improver.ensemble_calibration.ensemble_calibration module

This module defines all the “plugins” specific to ensemble calibration.

class improver.ensemble_calibration.ensemble_calibration.ApplyCoefficientsFromEnsembleCalibration(predictor_of_mean_flag='mean')[source]

Bases: improver.BasePlugin

Class to apply the optimised EMOS coefficients to future dates.

__init__(predictor_of_mean_flag='mean')[source]

Create an ensemble calibration plugin that, for Nonhomogeneous Gaussian Regression, applies coefficients estimated from historical forecasts to the current forecast.

Parameters

predictor_of_mean_flag (str) – String to specify the input to calculate the calibrated mean. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

_get_calibrated_forecast_predictors_mean(optimised_coeffs)[source]

Function to get calibrated forecast_predictors when the predictor of mean used is the ensemble mean.

Parameters

optimised_coeffs (dict) – A dictionary containing the calibration coefficient names as keys with their corresponding values.

Returns

tuple containing:
predicted_mean (numpy.ndarray):

Calibrated mean values in a flattened array.

forecast_predictor (iris.cube.Cube):

The forecast predictors, mean values taken by collapsing the realization coordinate.

Return type

(tuple)
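
As a minimal sketch of the “mean” case (not the plugin's actual code), the calibrated mean can be pictured as alpha + beta * ensemble_mean evaluated at every point. The helper name calibrated_mean_from_ensemble_mean and the nested-list layout realizations[r][p] are assumptions made for illustration; the plugin itself works on iris cubes.

```python
from statistics import fmean


def calibrated_mean_from_ensemble_mean(optimised_coeffs, realizations):
    """Return (predicted_mean, ensemble_mean) as flat lists (illustrative only).

    realizations[r][p] holds the value of realization r at flattened point p.
    """
    alpha = optimised_coeffs["alpha"]
    beta = optimised_coeffs["beta"]
    # Collapse the realization dimension: one ensemble mean per point.
    ensemble_mean = [fmean(point_values) for point_values in zip(*realizations)]
    # Apply the linear correction alpha + beta * ensemble_mean point by point.
    predicted_mean = [alpha + beta * value for value in ensemble_mean]
    return predicted_mean, ensemble_mean


coeffs = {"gamma": 0.0, "delta": 1.0, "alpha": 0.5, "beta": 1.0}
realizations = [[280.0, 284.0], [282.0, 286.0]]  # two realizations, two points
predicted_mean, ensemble_mean = calibrated_mean_from_ensemble_mean(
    coeffs, realizations
)
```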

_get_calibrated_forecast_predictors_realizations(optimised_coeffs, forecast_vars)[source]

Function to get calibrated forecast_predictors when the predictor of mean is the mean of each distinct realization. The domain mean in a given realization has been used to generate calibration coefficients, such that each realization can be calibrated separately. These calibrated realizations are then collapsed to give mean values at each point in the domain.

Parameters
  • optimised_coeffs (dict) – A dictionary containing the calibration coefficient names as keys with their corresponding values.

  • forecast_vars (iris.cube.Cube) – A cube of forecast predictor variance calculated across realizations.

Returns

tuple containing:
predicted_mean (numpy.ndarray):

Calibrated mean values in a flattened array.

forecast_predictor (iris.cube.Cube):

The forecast predictors, mean values taken by collapsing the realization coordinate.

Return type

(tuple)

static _merge_calibrated_and_uncalibrated_regions(original_data, calibrated_data, mask)[source]

If a mask has been provided to this plugin, this function acts to combine calibrated data and uncalibrated data. Those regions where the mask=0 will be populated with uncalibrated data. Those regions where the mask=1 will retain calibrated data. The calibrated data cube will be modified in situ.

Note that this can be achieved straightforwardly with fancy indexing but there is a need to slice the data to avoid overflowing available memory.

Parameters
  • original_data (numpy.ndarray) – The uncalibrated predictor or variance that will populate regions in which the mask=0.

  • calibrated_data (numpy.ndarray) – The calibrated predictor or variance data array that will be modified in situ. Those regions of the array that correspond with indices at which the mask=0 will be replaced with data from the original_data array.

  • mask (numpy.ndarray) – A mask determining which regions should be returned with calibrated data (1) and which regions should be returned with uncalibrated data (0).
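
The slice-by-slice merge described above can be sketched as follows. This is an illustration using plain nested lists rather than numpy arrays, and the function name merge_calibrated_and_uncalibrated is hypothetical; it only shows the idea of overwriting the calibrated data in situ wherever the mask is 0, one 2D slice at a time.

```python
def merge_calibrated_and_uncalibrated(original_data, calibrated_data, mask):
    """Overwrite calibrated_data with original_data wherever mask == 0.

    original_data and calibrated_data are lists of 2D slices (lists of rows);
    mask is a single 2D slice shared by all of them. Nothing is returned:
    calibrated_data is modified in place.
    """
    for original_slice, calibrated_slice in zip(original_data, calibrated_data):
        # Handle one slice at a time to keep the working set small.
        for row, mask_row in enumerate(mask):
            for col, mask_value in enumerate(mask_row):
                if mask_value == 0:
                    calibrated_slice[row][col] = original_slice[row][col]


original = [[[1.0, 2.0], [3.0, 4.0]]]
calibrated = [[[10.0, 20.0], [30.0, 40.0]]]
mask = [[1, 0], [0, 1]]
merge_calibrated_and_uncalibrated(original, calibrated, mask)
```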

_spatial_domain_match()[source]

Check that the domain of the current forecast and coefficients cube match.

Raises

ValueError – If the domain information of the current_forecast and coefficients_cube do not match.

static calibrate_forecast_data(optimised_coeffs, predicted_mean, forecast_predictor, forecast_var)[source]

Create a calibrated_forecast_predictor by reshaping the predicted mean to the original domain dimensions. Apply the calibration coefficients to the forecast data variance. Return both to give calibrated mean and variance in the original domain dimensions.

Parameters
  • optimised_coeffs (dict) – A dictionary containing the calibration coefficient names as keys with their corresponding values.

  • predicted_mean (numpy.ndarray) – Calibrated mean value.

  • forecast_predictor (iris.cube.Cube) – The forecast predictors, mean values taken by collapsing the realization coordinate.

  • forecast_var (iris.cube.Cube) – A cube of forecast predictor variance calculated across realizations.

Returns

tuple containing:
calibrated_forecast_predictor (iris.cube.Cube):

Cube containing the calibrated version of the ensemble predictor, either the ensemble mean or the ensemble realizations.

calibrated_forecast_var (iris.cube.Cube):

Cube containing the calibrated version of the ensemble variance, either the ensemble mean or the ensemble realizations.

Return type

(tuple)
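
A minimal sketch of the two steps described for calibrate_forecast_data, assuming the variance is calibrated as gamma + delta * ensemble_variance (the form quoted in this module's equations; the real implementation may transform the coefficients differently). The function name calibrate_mean_and_variance, the explicit shape argument and the nested-list data are hypothetical stand-ins for the iris cubes used by the plugin.

```python
def calibrate_mean_and_variance(optimised_coeffs, predicted_mean, forecast_var, shape):
    """Reshape a flat calibrated mean to (rows, cols) and calibrate the variance."""
    rows, cols = shape
    if len(predicted_mean) != rows * cols:
        raise ValueError("predicted_mean does not fit the requested shape")
    # Restore the original 2D domain from the flattened mean values.
    calibrated_mean = [
        predicted_mean[r * cols:(r + 1) * cols] for r in range(rows)
    ]
    gamma = optimised_coeffs["gamma"]
    delta = optimised_coeffs["delta"]
    # Assumed form of the variance correction: gamma + delta * variance.
    calibrated_var = [[gamma + delta * v for v in row] for row in forecast_var]
    return calibrated_mean, calibrated_var


coeffs = {"gamma": 0.5, "delta": 2.0, "alpha": 0.0, "beta": 1.0}
mean_2d, var_2d = calibrate_mean_and_variance(
    coeffs, [1.0, 2.0, 3.0, 4.0], [[1.0, 2.0], [3.0, 4.0]], (2, 2)
)
```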

process(current_forecast, coefficients_cube, landsea_mask=None)[source]

Wrapping function to calculate the forecast predictor and forecast variance prior to applying coefficients to the current forecast.

Parameters
  • current_forecast (iris.cube.Cube) – The cube containing the current forecast.

  • coefficients_cube (iris.cube.Cube) – Cube containing the coefficients estimated using EMOS. The cube contains a coefficient_index dimension coordinate where the points of the coordinate are integer values and a coefficient_name auxiliary coordinate where the points of the coordinate are e.g. gamma, delta, alpha, beta.

  • landsea_mask (iris.cube.Cube or None) – The optional cube containing a land-sea mask. If provided, only land points are calibrated using the provided coefficients.

Returns

tuple containing:
calibrated_forecast_predictor (iris.cube.Cube):

Cube containing the calibrated version of the ensemble predictor, either the ensemble mean or the ensemble realizations.

calibrated_forecast_variance (iris.cube.Cube):

Cube containing the calibrated version of the ensemble variance, either the ensemble mean or the ensemble realizations.

Return type

(tuple)

class improver.ensemble_calibration.ensemble_calibration.ContinuousRankedProbabilityScoreMinimisers(tolerance=0.01, max_iterations=1000)[source]

Bases: object

Minimise the Continuous Ranked Probability Score (CRPS)

Calculate the optimised coefficients for minimising the CRPS based on assuming a particular probability distribution for the phenomenon being calibrated.

The number of coefficients that will be optimised depends upon the initial guess.

Minimisation is performed using the Nelder-Mead algorithm, capped at max_iterations iterations (1000 by default) to limit the computational expense. Note that the BFGS algorithm was initially trialled, but its results did not agree with comparable results generated in R.

BAD_VALUE = 999999.0
TOLERATED_PERCENTAGE_CHANGE = 5
__init__(tolerance=0.01, max_iterations=1000)[source]

Initialise class for performing minimisation of the Continuous Ranked Probability Score (CRPS).

Parameters
  • tolerance (float) – The tolerance for the Continuous Ranked Probability Score (CRPS) calculated by the minimisation. The CRPS is in the units of the variable being calibrated. The tolerance is therefore representative of how close to the actual value we are aiming to forecast for a particular variable. Once multiple iterations result in a CRPS equal to the same value within the specified tolerance, the minimisation will terminate.

  • max_iterations (int) – The maximum number of iterations allowed until the minimisation has converged to a stable solution. If the maximum number of iterations is reached, but the minimisation has not yet converged to a stable solution, then the available solution is used anyway, and a warning is raised. If predictor_of_mean_flag is “realizations”, then the number of iterations may need to be increased, as there will be more coefficients to solve for.

calculate_normal_crps(initial_guess, forecast_predictor, truth, forecast_var, sqrt_pi, predictor_of_mean_flag)[source]

Calculate the CRPS for a normal distribution.

Scientific Reference: Gneiting, T. et al., 2005. Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation. Monthly Weather Review, 133(5), pp.1098-1118.

Parameters
  • initial_guess (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].

  • forecast_predictor (numpy.ndarray) – Data to be used as the predictor, either the ensemble mean or the ensemble realizations.

  • truth (numpy.ndarray) – Data to be used as truth.

  • forecast_var (numpy.ndarray) – Ensemble variance data.

  • sqrt_pi (numpy.ndarray) – Square root of Pi

  • predictor_of_mean_flag (str) – String to specify the input to calculate the calibrated mean. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

Returns

CRPS for the current set of coefficients. This CRPS is a mean value across all points.

Return type

float
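
For orientation, the closed-form CRPS of a normal distribution given in Gneiting et al. (2005) can be sketched in plain Python as below. The function names normal_crps and mean_normal_crps are hypothetical, and taking the calibrated variance as gamma + delta * ensemble_variance follows the equations quoted in this module rather than the plugin's source, so treat this as an illustration only.

```python
import math


def normal_crps(mu, sigma, truth):
    """Closed-form CRPS of N(mu, sigma**2) for a single observation."""
    z = (truth - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_normal_crps(coeffs, ensemble_means, truths, ensemble_vars):
    """Mean CRPS over all points for coefficients [gamma, delta, alpha, beta]."""
    gamma, delta, alpha, beta = coeffs
    scores = []
    for mean, truth, var in zip(ensemble_means, truths, ensemble_vars):
        mu = alpha + beta * mean
        # Assumed variance model; requires gamma + delta * var > 0.
        sigma = math.sqrt(gamma + delta * var)
        scores.append(normal_crps(mu, sigma, truth))
    return sum(scores) / len(scores)


score = mean_normal_crps([0.0, 1.0, 0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

With a standard normal forecast and a truth of zero, the score reduces to 2/sqrt(2*pi) - 1/sqrt(pi), roughly 0.2337.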

calculate_truncated_normal_crps(initial_guess, forecast_predictor, truth, forecast_var, sqrt_pi, predictor_of_mean_flag)[source]

Calculate the CRPS for a truncated normal distribution with zero as the lower bound.

Scientific Reference: Thorarinsdottir, T.L. & Gneiting, T., 2010. Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. Journal of the Royal Statistical Society. Series A: Statistics in Society, 173(2), pp.371-388.

Parameters
  • initial_guess (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].

  • forecast_predictor (numpy.ndarray) – Data to be used as the predictor, either the ensemble mean or the ensemble realizations.

  • truth (numpy.ndarray) – Data to be used as truth.

  • forecast_var (numpy.ndarray) – Ensemble variance data.

  • sqrt_pi (numpy.ndarray) – Square root of Pi

  • predictor_of_mean_flag (str) – String to specify the input to calculate the calibrated mean. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

Returns

CRPS for the current set of coefficients. This CRPS is a mean value across all points.

Return type

float
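
The closed-form CRPS for a normal distribution truncated at zero, as given in Thorarinsdottir & Gneiting (2010), can be sketched for a single point as below. The helper names are hypothetical and the snippet works on scalars rather than arrays; it is an illustration of the formula, not the plugin's code.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_crps(mu, sigma, truth):
    """CRPS of a normal(mu, sigma**2) truncated below at zero."""
    z = (truth - mu) / sigma
    cdf_ratio = _cdf(mu / sigma)  # probability mass above the lower bound
    bracket = (
        z * cdf_ratio * (2.0 * _cdf(z) + cdf_ratio - 2.0)
        + 2.0 * _pdf(z) * cdf_ratio
        - _cdf(math.sqrt(2.0) * mu / sigma) / math.sqrt(math.pi)
    )
    return sigma / cdf_ratio ** 2 * bracket


half_normal_score = truncated_normal_crps(0.0, 1.0, 0.0)
far_from_bound_score = truncated_normal_crps(100.0, 1.0, 100.0)
```

When mu is many standard deviations above zero the truncation has no practical effect, so far_from_bound_score matches the untruncated normal CRPS.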

process(initial_guess, forecast_predictor, truth, forecast_var, predictor_of_mean_flag, distribution)[source]

Function to pass a given function to the scipy minimize function to estimate optimised values for the coefficients.

If the predictor_of_mean_flag is the ensemble mean, this function estimates values for alpha, beta, gamma and delta based on the equation: N(alpha + beta * ensemble_mean, gamma + delta * ensemble_variance), where N is a chosen distribution.

If the predictor_of_mean_flag is the ensemble realizations, this function estimates values for alpha, beta, gamma and delta based on the equation:

\[N(alpha + beta0 * realization0 + beta1 * realization1, gamma + delta * ensemble\_variance)\]

where N is a chosen distribution and the number of beta terms depends on the number of realizations provided.

Parameters
  • initial_guess (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].

  • forecast_predictor (iris.cube.Cube) – Cube containing the fields to be used as the predictor, either the ensemble mean or the ensemble realizations.

  • truth (iris.cube.Cube) – Cube containing the field, which will be used as truth.

  • forecast_var (iris.cube.Cube) – Cube containing the ensemble variance field.

  • predictor_of_mean_flag (str) – String to specify the input to calculate the calibrated mean. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

  • distribution (str) – String used to access the appropriate function for use in the minimisation within self.minimisation_dict.

Returns

List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].

Return type

list of float

Raises

KeyError – If the distribution is not supported.

Warns

Warning – If the minimisation did not converge.
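
A rough sketch of passing a CRPS function to scipy.optimize.minimize with the Nelder-Mead method is given below. It assumes numpy and scipy are installed. The objective mean_normal_crps, the PENALTY constant, the sample arrays, and the mapping of tolerance to tol and max_iterations to options["maxiter"] are all assumptions made for illustration, not a description of the plugin's internals.

```python
import warnings

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

PENALTY = 999999.0  # large score returned for invalid coefficients (assumption)


def mean_normal_crps(coeffs, predictor, truth, variance):
    """Mean normal CRPS for coefficients [gamma, delta, alpha, beta]."""
    gamma, delta, alpha, beta = coeffs
    calibrated_var = gamma + delta * variance
    if np.any(calibrated_var <= 0):
        return PENALTY  # steer the search away from non-positive variances
    mu = alpha + beta * predictor
    sigma = np.sqrt(calibrated_var)
    z = (truth - mu) / sigma
    crps = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
    return float(np.mean(crps))


# Synthetic data: the "truth" is slightly warmer than the ensemble mean.
predictor = np.array([280.0, 282.0, 285.0, 287.0, 290.0])
truth = np.array([281.2, 283.9, 286.1, 288.8, 291.5])
variance = np.array([1.0, 1.5, 0.8, 1.2, 2.0])
initial_guess = [0.0, 1.0, 0.0, 1.0]

result = minimize(
    mean_normal_crps,
    initial_guess,
    args=(predictor, truth, variance),
    method="Nelder-Mead",
    tol=0.01,
    options={"maxiter": 1000},
)
if not result.success:
    warnings.warn("Minimisation did not converge.")
optimised_coeffs = list(result.x)  # ordered [gamma, delta, alpha, beta]
```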

class improver.ensemble_calibration.ensemble_calibration.EstimateCoefficientsForEnsembleCalibration(distribution, current_cycle, desired_units=None, predictor_of_mean_flag='mean', tolerance=0.01, max_iterations=1000)[source]

Bases: improver.BasePlugin

Class focussing on estimating the optimised coefficients for ensemble calibration.

ESTIMATE_COEFFICIENTS_FROM_LINEAR_MODEL_FLAG = True
__init__(distribution, current_cycle, desired_units=None, predictor_of_mean_flag='mean', tolerance=0.01, max_iterations=1000)[source]

Create an ensemble calibration plugin that, for Nonhomogeneous Gaussian Regression, calculates coefficients based on historical forecasts, ready to be applied to the current forecast.

Parameters
  • distribution (str) – Name of distribution. Assume that the current forecast can be represented using this distribution.

  • current_cycle (str) – The current cycle in YYYYMMDDTHHMMZ format e.g. 20171122T0100Z. This is used to create a forecast_reference_time coordinate on the resulting EMOS coefficients cube.

  • desired_units (str or cf_units.Unit) – The unit that you would like the calibration to be undertaken in. The current forecast, historical forecast and truth will be converted as required.

  • predictor_of_mean_flag (str) – String to specify the input to calculate the calibrated mean. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

  • tolerance (float) – The tolerance for the Continuous Ranked Probability Score (CRPS) calculated by the minimisation. The CRPS is in the units of the variable being calibrated. The tolerance is therefore representative of how close to the actual value we are aiming to forecast for a particular variable. Once multiple iterations result in a CRPS equal to the same value within the specified tolerance, the minimisation will terminate.

  • max_iterations (int) – The maximum number of iterations allowed until the minimisation has converged to a stable solution. If the maximum number of iterations is reached, but the minimisation has not yet converged to a stable solution, then the available solution is used anyway, and a warning is raised. If predictor_of_mean_flag is “realizations”, then the number of iterations may need to be increased, as there will be more coefficients to solve for.

Raises

ValueError – If the given distribution is not valid.

Warns

ImportWarning – If the statsmodels module can’t be imported.
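
The YYYYMMDDTHHMMZ format expected for current_cycle can be parsed with the standard library as sketched below. The helper name parse_cycle is hypothetical, and treating the trailing Z as UTC is an assumption; how the plugin itself converts the string is not shown here.

```python
from datetime import datetime, timezone


def parse_cycle(current_cycle):
    """Parse a cycle string such as '20171122T0100Z' into an aware datetime."""
    # The literal "Z" in the format string must match the trailing character.
    parsed = datetime.strptime(current_cycle, "%Y%m%dT%H%MZ")
    return parsed.replace(tzinfo=timezone.utc)


cycle_time = parse_cycle("20171122T0100Z")
```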

static _filter_non_matching_cubes(historic_forecast, truth)[source]

Provide filtering for the historic forecast and truth to make sure that these contain matching validity times. This ensures that any mismatch between the historic forecasts and truth is dealt with.

Parameters
  • historic_forecast (iris.cube.Cube) – Cube of historic forecasts that potentially contains a mismatch compared to the truth.

  • truth (iris.cube.Cube) – Cube of truth that potentially contains a mismatch compared to the historic forecasts.

Returns

tuple containing:
matching_historic_forecasts (iris.cube.Cube):

Cube of historic forecasts where any mismatches with the truth cube have been removed.

matching_truths (iris.cube.Cube):

Cube of truths where any mismatches with the historic_forecasts cube have been removed.

Return type

(tuple)

Raises

ValueError – The filtering has found no matches in validity time between the historic forecasts and the truths.
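
The filtering idea can be sketched without iris by keying each input on its validity time and keeping only the times present in both. The function name filter_non_matching and the dict-based inputs are hypothetical stand-ins for the cubes handled by the plugin.

```python
def filter_non_matching(historic_forecasts, truths):
    """Keep only entries whose validity time appears in both inputs.

    Both arguments are dicts mapping a validity time to a data value.
    """
    common_times = sorted(set(historic_forecasts) & set(truths))
    if not common_times:
        raise ValueError(
            "No matching validity times between historic forecasts and truths."
        )
    matching_forecasts = {t: historic_forecasts[t] for t in common_times}
    matching_truths = {t: truths[t] for t in common_times}
    return matching_forecasts, matching_truths


forecasts = {"20171110T0400Z": 280.1, "20171111T0400Z": 281.3, "20171112T0400Z": 279.8}
truths = {"20171111T0400Z": 281.0, "20171112T0400Z": 280.2, "20171113T0400Z": 282.5}
matching_forecasts, matching_truths = filter_non_matching(forecasts, truths)
```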

compute_initial_guess(truth, forecast_predictor, predictor_of_mean_flag, estimate_coefficients_from_linear_model_flag, no_of_realizations=None)[source]

Function to compute initial guess of the alpha, beta, gamma and delta components of the EMOS coefficients by linear regression of the forecast predictor and the truth, if requested. Otherwise, default values for the coefficients will be used.

If the predictor_of_mean_flag is “mean”, then the order of the initial_guess is [gamma, delta, alpha, beta]. Otherwise, if the predictor_of_mean_flag is “realizations” then the order of the initial_guess is [gamma, delta, alpha, beta0, beta1, beta2], where the number of beta variables will correspond to the number of realizations. In this example initial guess with three beta variables, there will correspondingly be three realizations.

The coefficients relate to adjustments to the ensemble mean or the ensemble realizations, and adjustments to the ensemble variance:

\[alpha + beta * ensemble\_mean\]

or

\[alpha + beta0 * realization0 + beta1 * realization1\]

and

\[gamma + delta * ensemble\_variance\]

The default values for the initial guesses, in [gamma, delta, alpha, beta] ordering, are:

  • For the ensemble mean, the default initial guess [0, 1, 0, 1] assumes that the raw forecast is skilful and the expected adjustments are small.

  • For the ensemble realizations, the default initial guess is effectively [0, 1, 0, 1/3., 1/3., 1/3.], such that each realization is assumed to have equal weight.

If linear regression is enabled, the alpha and beta coefficients associated with the ensemble mean or ensemble realizations are modified based on the results from the linear regression fit.

Parameters
  • truth (iris.cube.Cube) – Cube containing the field, which will be used as truth.

  • forecast_predictor (iris.cube.Cube) – Cube containing the fields to be used as the predictor, either the ensemble mean or the ensemble realizations.

  • predictor_of_mean_flag (str) – String to specify the input to calculate the calibrated mean. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

  • estimate_coefficients_from_linear_model_flag (bool) – Flag whether coefficients should be estimated from the linear regression, or static estimates should be used.

  • no_of_realizations (int) – Number of realizations, if ensemble realizations are to be used as predictors. Default is None.

Returns

List of coefficients to be used as initial guess. Order of coefficients is [gamma, delta, alpha, beta].

Return type

list of float
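
The logic for choosing an initial guess can be sketched as below. The function name initial_guess_sketch is hypothetical, the inputs are flat lists rather than cubes, and the regression shown is a plain least-squares fit for the “mean” case only; the regression used for the “realizations” case is deliberately left out.

```python
def initial_guess_sketch(truth, predictor, predictor_of_mean_flag,
                         use_linear_model, no_of_realizations=None):
    """Return an initial guess ordered [gamma, delta, alpha, beta...]."""
    if predictor_of_mean_flag == "mean":
        guess = [0.0, 1.0, 0.0, 1.0]  # default: assume a skilful raw forecast
        if use_linear_model:
            # Ordinary least squares of truth on the ensemble mean.
            n = len(predictor)
            mean_x = sum(predictor) / n
            mean_y = sum(truth) / n
            sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, truth))
            sxx = sum((x - mean_x) ** 2 for x in predictor)
            beta = sxy / sxx
            alpha = mean_y - beta * mean_x
            guess[2], guess[3] = alpha, beta
        return guess
    if predictor_of_mean_flag == "realizations":
        if not no_of_realizations:
            raise ValueError("no_of_realizations is required for realizations")
        # Default: every realization carries equal weight.
        return [0.0, 1.0, 0.0] + [1.0 / no_of_realizations] * no_of_realizations
    raise ValueError(f"Unsupported predictor: {predictor_of_mean_flag}")


default_guess = initial_guess_sketch([], [], "realizations", False, 3)
fitted_guess = initial_guess_sketch(
    [3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0], "mean", True
)
```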

create_coefficients_cube(optimised_coeffs, historic_forecast)[source]

Create a cube for storing the coefficients computed using EMOS.

Examples

For a cube containing coefficients calculated using Ensemble Model Output Statistics:

emos_coefficients / (1)             (coefficient_index: 4)
    Dimension coordinates:
         coefficient_index                           x
    Auxiliary coordinates:
         coefficient_name                            x
    Scalar coordinates:
         forecast_period: 14400 seconds
         forecast_reference_time: 2017-11-10 00:00:00
         time: 2017-11-10 04:00:00
    Attributes:
         diagnostic_standard_name: air_temperature
         mosg__model_configuration: uk_det

An example of the coefficient_index coordinate is:

DimCoord(array([0, 1, 2, 3]), standard_name=None, units=Unit('1'), long_name='coefficient_index')

An example of the coefficient_name coordinate is:

AuxCoord(array(['gamma', 'delta', 'alpha', 'beta'], dtype='<U5'), standard_name=None, units=Unit('no_unit'), long_name='coefficient_name')

Parameters
  • optimised_coeffs (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].

  • historic_forecast (iris.cube.Cube) – The cube containing the historic forecast.

Returns

Cube constructed using the coefficients provided and using metadata from the historic_forecast cube. The cube contains a coefficient_index dimension coordinate where the points of the coordinate are integer values and a coefficient_name auxiliary coordinate where the points of the coordinate are e.g. gamma, delta, alpha, beta.

Return type

iris.cube.Cube

Raises

ValueError – If the number of coefficients in the optimised_coeffs does not match the expected number.

static mask_cube(cube, landsea_mask)[source]

Mask the input cube using the given landsea_mask. Sea points are filled with NaNs and masked.

Parameters
  • cube (iris.cube.Cube) – A cube to be masked, on the same grid as the landsea_mask. The last two dimensions on this cube must match the dimensions in the landsea_mask cube.

  • landsea_mask (iris.cube.Cube) – A cube containing a land-sea mask. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Raises

IndexError – If the cube and landsea_mask shapes are not compatible.

process(historic_forecast, truth, landsea_mask=None)[source]

Using Nonhomogeneous Gaussian Regression/Ensemble Model Output Statistics, estimate the required coefficients from historical forecasts.

The main steps of this method are:

  1. Check that the predictor_of_mean_flag is valid.

  2. Filter the historic forecasts and truth to ensure that these inputs match in validity time.

  3. Apply unit conversion to ensure that the historic forecasts and truth have the desired units for calibration.

  4. Calculate the variance of the historic forecasts. If the chosen predictor is the mean, also calculate the mean of the historic forecasts.

  5. If a land-sea mask is provided then mask out sea points in the truth and predictor from the historic forecasts.

  6. Calculate initial guess at coefficient values by performing a linear regression, if requested, otherwise default values are used.

  7. Perform minimisation.

Parameters
  • historic_forecast (iris.cube.Cube) – The cube containing the historical forecasts used for calibration.

  • truth (iris.cube.Cube) – The cube containing the truth used for calibration.

  • landsea_mask (iris.cube.Cube) – The optional cube containing a land-sea mask. If provided, only land points are used to calculate the coefficients. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Returns

Cube containing the coefficients estimated using EMOS. The cube contains a coefficient_index dimension coordinate and a coefficient_name auxiliary coordinate.

Return type

iris.cube.Cube

Raises

ValueError – If the units of the historic and truth cubes do not match.