improver.calibration.ensemble_calibration module¶

This module defines all the “plugins” specific for Ensemble Model Output Statistics (EMOS).

Ensemble Model Output Statistics (EMOS)¶

Ensemble Model Output Statistics (EMOS), otherwise known as Nonhomogeneous Gaussian Regression (NGR), is a technique for calibrating an ensemble forecast.

Estimating EMOS coefficients¶

Following Gneiting et al., 2005, Ensemble Model Output Statistics for a normal distribution can be represented by the equation:

\[\mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})\]

where the location parameter is a bias-corrected weighted average of the ensemble member forecasts, or alternatively, where the location parameter is a bias-corrected ensemble mean:

\[\mathcal{N}(a + b\bar{X}, c + dS^{2})\]

If a different distribution is required, for example, using a truncated normal distribution for wind speed, then the equations remain the same, apart from updating the distribution chosen. The preferred distribution will depend upon the variable being calibrated.

The a, b, c and d coefficients within the equations above are computed by minimising the Continuous Ranked Probability Score (CRPS), in terms of \(\alpha, \beta, \gamma\) and \(\delta\) (explained below). The distribution is accounted for through the formulation of the CRPS that is minimised e.g. please see Gneiting et al., 2005 for an example using a normal distribution and Thorarinsdottir and Gneiting, 2010 for an example using a truncated normal distribution.

Distribution description¶

A normal (Gaussian) distribution is often represented using the syntax:

\[\mathcal{N}(\mu,\,\sigma^{2})\]

where \(\mu\) is mean and \(\sigma^{2}\) is the variance. The normal distribution is a special case, where \(\mu\) can be interpreted as both the mean and the location parameter and \(\sigma^{2}\) can be interpreted as both the variance and the scale parameter. For an alternative distribution, such as a truncated normal distribution that has been truncated to lie within 0 and infinity, the distribution can be represented as:

\[\mathcal{N^0}(\mu,\,\sigma^{2})\]

In this case, the \(\mu\) is strictly interpreted as the location parameter and \(\sigma^{2}\) is strictly interpreted as the scale parameter.

What is the location parameter?¶

The location parameter indicates the shift in the distribution from the “centre” of the standard normal.

What is the scale parameter?¶

The scale parameter indicates the width in the distribution. If the scale parameter is large, then the distribution will be broader. If the scale is smaller, then the distribution will be narrower.

Estimating EMOS coefficients using the ensemble mean¶

If the predictor is the ensemble mean, coefficients are estimated as \(\alpha, \beta, \gamma\) and \(\delta\) based on the equation:

\[\mathcal{N}(a + \bar{X}, c + dS^{2})\]

where N is a chosen distribution and values of a, b, c and d are solved in the format of \(\alpha, \beta, \gamma\) and \(\delta\), see the equations below.

\[a = \alpha\]

\[b = \beta\]

\[c = \gamma^2\]

\[d = \delta^2\]

The \(\gamma\) and \(\delta\) values are squared to ensure c and d are positive and therefore more interpretable.

Estimating EMOS coefficients using the ensemble realizations¶

If the predictor is the ensemble realizations, coefficients are estimated for \(\alpha, \beta, \gamma\) and \(\delta\) based on the equation:

\[\mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})\]

where N is a chosen distribution, the values of a, b, c and d relate to alpha, beta, gamma and delta through the equations above with the exception that \(b=\beta^2\), and the number of beta terms depends on the number of realizations provided. The beta, gamma, and delta values are squared to ensure that b, c and d are positive values and therefore are more easily interpretable. Specifically for the b term, the squaring ensures that the the b values can be interpreted as a weighting for each realization.

Applying EMOS coefficients¶

The EMOS coefficients represent adjustments to the ensemble mean and ensemble variance, in order to generate the location and scale parameters that, for the chosen distribution, minimise the CRPS. The coefficients can therefore be used to construct the location parameter, \(\mu\), and scale parameter, \(\sigma^{2}\), for the calibrated forecast from today’s ensemble mean, or ensemble realizations, and the ensemble variance.

\[ \begin{align}\begin{aligned}\mu = a + b\bar{X}\\\sigma^{2} = c + dS^{2}\end{aligned}\end{align} \]

Note here that this procedure holds whether the distribution is normal, i.e. where the application of the EMOS coefficients to the raw ensemble mean results in a calibrated location parameter that is equivalent to a calibrated ensemble mean (e.g. for screen temperature), and where the distribution is e.g. truncated normal (e.g. for wind speed). For a truncated normal distribution, the result of applying the EMOS coefficients to an uncalibrated forecast is a location parameter and scale parameter describing the calibrated distribution.

class improver.calibration.ensemble_calibration.ApplyCoefficientsFromEnsembleCalibration(predictor='mean')[source]¶

Bases: improver.BasePlugin

Class to apply the optimised EMOS coefficients to the current forecast.

__init__(predictor='mean')[source]¶

Create a plugin that uses the coefficients created using EMOS from historical forecasts and corresponding truths and applies these coefficients to the current forecast to generate a location and scale parameter that represents the calibrated distribution.

Parameters: predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

_abc_cache = <_weakrefset.WeakSet object>¶

_abc_negative_cache = <_weakrefset.WeakSet object>¶

_abc_negative_cache_version = 213¶

_abc_registry = <_weakrefset.WeakSet object>¶

_calculate_location_parameter_from_mean(optimised_coeffs)[source]¶

Function to calculate the location parameter when the ensemble mean at each grid point is the predictor.

Further information is available in the module level docstring.

Parameters: optimised_coeffs (dict) – A dictionary containing the calibration coefficient names as keys with their corresponding values.
Returns: Location parameter calculated using the ensemble mean as the predictor.
Return type: numpy.ndarray

_calculate_location_parameter_from_realizations(optimised_coeffs)[source]¶

Function to calculate the location parameter when the ensemble realizations are the predictor.

Further information is available in the module level docstring.

Parameters: optimised_coeffs (dict) – A dictionary containing the calibration coefficient names as keys with their corresponding values.
Returns: Location parameter calculated using the ensemble realizations as the predictor.
Return type: numpy.ndarray

_calculate_scale_parameter(optimised_coeffs)[source]¶

Calculation of the scale parameter using the ensemble variance adjusted using the gamma and delta coefficients calculated by EMOS.

Further information is available in the module level docstring.

Parameters: optimised_coeffs (dict) – A dictionary containing the calibration coefficient names as keys with their corresponding values.
Returns: Scale parameter for defining the distribution of the calibrated forecast.
Return type: numpy.ndarray

_create_output_cubes(location_parameter, scale_parameter)[source]¶

Creation of output cubes containing the location and scale parameters.

Parameters

location_parameter (numpy.ndarray) – Location parameter of the calibrated distribution.
scale_parameter (numpy.ndarray) – Scale parameter of the calibrated distribution.

Returns

tuple containing:

location_parameter_cube (iris.cube.Cube):: Location parameter of the calibrated distribution with associated metadata.
scale_parameter_cube (iris.cube.Cube):: Scale parameter of the calibrated distribution with associated metadata.

Return type

(tuple)

static _merge_calibrated_and_uncalibrated_regions(original_data, calibrated_data, mask)[source]¶

If a mask has been provided to this plugin, this function acts to combine calibrated data and uncalibrated data. Those regions where the mask=0 will be populated with uncalibrated data. Those regions where the mask=1 will retain calibrated data. The calibrated data cube will be modified in situ.

Note that this can be achieved straightforwardly with fancy indexing but there is a need to slice the data to avoid overflowing available memory.

Parameters

original_data (numpy.ndarray) – The uncalibrated predictor or variance that will populate regions in which the mask=0.
calibrated_data (numpy.ndarray) – The location parameter or scale parameter data array that will be modified in situ. Those regions of the array that correspond with indices at which the mask=0 will be replaced with data from the original_data array.
mask (numpy.ndarray) – A mask determining which regions should be returned with calibrated data (1) and which regions should be returned with uncalibrated data (0).

_spatial_domain_match()[source]¶

Check that the domain of the current forecast and coefficients cube match.

Raises: ValueError – If the domain information of the current_forecast and coefficients_cube do not match.

process(current_forecast, coefficients_cube, landsea_mask=None)[source]¶

Apply the EMOS coefficients to the current forecast, in order to generate location and scale parameters for creating the calibrated distribution.

Parameters

current_forecast (iris.cube.Cube) – The cube containing the current forecast.
coefficients_cube (iris.cube.Cube) – Cube containing the coefficients estimated using EMOS. The cube contains a coefficient_index dimension coordinate where the points of the coordinate are integer values and a coefficient_name auxiliary coordinate where the points of the coordinate are e.g. gamma, delta, alpha, beta.
landsea_mask (iris.cube.Cube or None) – The optional cube containing a land-sea mask. If provided, only land points are calibrated using the provided coefficients.

Returns

tuple containing:

location_parameter_cube (iris.cube.Cube):: Cube containing the location parameter of the calibrated distribution calculated using either the ensemble mean or the ensemble realizations. The location parameter represents the point at which a resulting PDF would be centred.
scale_parameter_cube (iris.cube.Cube):: Cube containing the scale parameter of the calibrated distribution calculated using either the ensemble mean or the ensemble realizations. The scale parameter represents the statistical dispersion of the resulting PDF, so a larger scale parameter will result in a broader PDF.

Return type

(tuple)

class improver.calibration.ensemble_calibration.ContinuousRankedProbabilityScoreMinimisers(tolerance=0.01, max_iterations=1000)[source]¶

Bases: object

Minimise the Continuous Ranked Probability Score (CRPS)

Calculate the optimised coefficients for minimising the CRPS based on assuming a particular probability distribution for the phenomenon being minimised.

The number of coefficients that will be optimised depend upon the initial guess.

Minimisation is performed using the Nelder-Mead algorithm for 200 iterations to limit the computational expense. Note that the BFGS algorithm was initially trialled but had a bug in comparison to comparative results generated in R.

BAD_VALUE = 999999.0¶

TOLERATED_PERCENTAGE_CHANGE = 5¶

__init__(tolerance=0.01, max_iterations=1000)[source]¶

Initialise class for performing minimisation of the Continuous Ranked Probability Score (CRPS).

Parameters

tolerance (float) – The tolerance for the Continuous Ranked Probability Score (CRPS) calculated by the minimisation. The CRPS is in the units of the variable being calibrated. The tolerance is therefore representative of how close to the actual value are we aiming to forecast for a particular variable. Once multiple iterations result in a CRPS equal to the same value within the specified tolerance, the minimisation will terminate.
max_iterations (int) – The maximum number of iterations allowed until the minimisation has converged to a stable solution. If the maximum number of iterations is reached, but the minimisation has not yet converged to a stable solution, then the available solution is used anyway, and a warning is raised. If the predictor_of_mean is “realizations”, then the number of iterations may require increasing, as there will be more coefficients to solve for.

calculate_normal_crps(initial_guess, forecast_predictor, truth, forecast_var, sqrt_pi, predictor)[source]¶

Calculate the CRPS for a normal distribution.

Scientific Reference: Gneiting, T. et al., 2005. Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation. Monthly Weather Review, 133(5), pp.1098-1118.

Parameters

initial_guess (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].
forecast_predictor (numpy.ndarray) – Data to be used as the predictor, either the ensemble mean or the ensemble realizations.
truth (numpy.ndarray) – Data to be used as truth.
forecast_var (numpy.ndarray) – Ensemble variance data.
sqrt_pi (numpy.ndarray) – Square root of Pi
predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

Returns

CRPS for the current set of coefficients. This CRPS is a mean value across all points.

Return type

float

calculate_truncated_normal_crps(initial_guess, forecast_predictor, truth, forecast_var, sqrt_pi, predictor)[source]¶

Calculate the CRPS for a truncated normal distribution with zero as the lower bound.

Scientific Reference: Thorarinsdottir, T.L. & Gneiting, T., 2010. Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. Journal of the Royal Statistical Society. Series A: Statistics in Society, 173(2), pp.371-388.

Parameters

initial_guess (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].
forecast_predictor (numpy.ndarray) – Data to be used as the predictor, either the ensemble mean or the ensemble realizations.
truth (numpy.ndarray) – Data to be used as truth.
forecast_var (numpy.ndarray) – Ensemble variance data.
sqrt_pi (numpy.ndarray) – Square root of Pi
predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.

Returns

CRPS for the current set of coefficients. This CRPS is a mean value across all points.

Return type

float

process(initial_guess, forecast_predictor, truth, forecast_var, predictor, distribution)[source]¶

Function to pass a given function to the scipy minimize function to estimate optimised values for the coefficients.

Further information is available in the module level docstring.

Parameters

initial_guess (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].
forecast_predictor (iris.cube.Cube) – Cube containing the fields to be used as the predictor, either the ensemble mean or the ensemble realizations.
truth (iris.cube.Cube) – Cube containing the field, which will be used as truth.
forecast_var (iris.cube.Cube) – Cube containing the field containing the ensemble variance.
predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.
distribution (str) – String used to access the appropriate function for use in the minimisation within self.minimisation_dict.

Returns

List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].

Return type

list of float

Raises

KeyError – If the distribution is not supported.

Warns

Warning – If the minimisation did not converge.

class improver.calibration.ensemble_calibration.EstimateCoefficientsForEnsembleCalibration(distribution, current_cycle, desired_units=None, predictor='mean', tolerance=0.01, max_iterations=1000)[source]¶

Bases: improver.BasePlugin

Class focussing on estimating the optimised coefficients for ensemble calibration.

ESTIMATE_COEFFICIENTS_FROM_LINEAR_MODEL_FLAG = True¶

__init__(distribution, current_cycle, desired_units=None, predictor='mean', tolerance=0.01, max_iterations=1000)[source]¶

Create an ensemble calibration plugin that, for Nonhomogeneous Gaussian Regression, calculates coefficients based on historical forecasts and applies the coefficients to the current forecast.

Further information is available in the module level docstring.

Parameters

distribution (str) – Name of distribution. Assume that a calibrated version of the current forecast could be represented using this distribution.
current_cycle (str) – The current cycle in YYYYMMDDTHHMMZ format e.g. 20171122T0100Z. This is used to create a forecast_reference_time coordinate on the resulting EMOS coefficients cube.
desired_units (str or cf_units.Unit) – The unit that you would like the calibration to be undertaken in. The current forecast, historical forecast and truth will be converted as required.
predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.
tolerance (float) – The tolerance for the Continuous Ranked Probability Score (CRPS) calculated by the minimisation. The CRPS is in the units of the variable being calibrated. The tolerance is therefore representative of how close to the actual value are we aiming to forecast for a particular variable. Once multiple iterations result in a CRPS equal to the same value within the specified tolerance, the minimisation will terminate.
max_iterations (int) – The maximum number of iterations allowed until the minimisation has converged to a stable solution. If the maximum number of iterations is reached, but the minimisation has not yet converged to a stable solution, then the available solution is used anyway, and a warning is raised. If the predictor_of_mean is “realizations”, then the number of iterations may require increasing, as there will be more coefficients to solve for.

Raises

ValueError – If the given distribution is not valid.

Warns

ImportWarning – If the statsmodels module can’t be imported.

_abc_cache = <_weakrefset.WeakSet object>¶

_abc_negative_cache = <_weakrefset.WeakSet object>¶

_abc_negative_cache_version = 213¶

_abc_registry = <_weakrefset.WeakSet object>¶

static _filter_non_matching_cubes(historic_forecast, truth)[source]¶

Provide filtering for the historic forecast and truth to make sure that these contain matching validity times. This ensures that any mismatch between the historic forecasts and truth is dealt with.

Parameters

historic_forecast (iris.cube.Cube) – Cube of historic forecasts that potentially contains a mismatch compared to the truth.
truth (iris.cube.Cube) – Cube of truth that potentially contains a mismatch compared to the historic forecasts.

Returns

tuple containing:

matching_historic_forecasts (iris.cube.Cube):: Cube of historic forecasts where any mismatches with the truth cube have been removed.
matching_truths (iris.cube.Cube):: Cube of truths where any mismatches with the historic_forecasts cube have been removed.

Return type

(tuple)

Raises

ValueError – The filtering has found no matches in validity time between the historic forecasts and the truths.

compute_initial_guess(truth, forecast_predictor, predictor, estimate_coefficients_from_linear_model_flag, no_of_realizations=None)[source]¶

Function to compute initial guess of the alpha, beta, gamma and delta components of the EMOS coefficients by linear regression of the forecast predictor and the truth, if requested. Otherwise, default values for the coefficients will be used.

If the predictor is “mean”, then the order of the initial_guess is [gamma, delta, alpha, beta]. Otherwise, if the predictor is “realizations” then the order of the initial_guess is [gamma, delta, alpha, beta0, beta1, beta2], where the number of beta variables will correspond to the number of realizations. In this example initial guess with three beta variables, there will correspondingly be three realizations.

The default values for the initial guesses are in [gamma, delta, alpha, beta] ordering:

For the ensemble mean, the default initial guess: [0, 1, 0, 1] assumes that the raw forecast is skilful and the expected adjustments are small.
For the ensemble realizations, the default initial guess is effectively: [0, 1, 0, 1/3., 1/3., 1/3.], such that each realization is assumed to have equal weight.

If linear regression is enabled, the alpha and beta coefficients associated with the ensemble mean or ensemble realizations are modified based on the results from the linear regression fit.

Parameters

truth (iris.cube.Cube) – Cube containing the field, which will be used as truth.
forecast_predictor (iris.cube.Cube) – Cube containing the fields to be used as the predictor, either the ensemble mean or the ensemble realizations.
predictor (str) – String to specify the form of the predictor used to calculate the location parameter when estimating the EMOS coefficients. Currently the ensemble mean (“mean”) and the ensemble realizations (“realizations”) are supported as the predictors.
estimate_coefficients_from_linear_model_flag (bool) – Flag whether coefficients should be estimated from the linear regression, or static estimates should be used.
no_of_realizations (int) – Number of realizations, if ensemble realizations are to be used as predictors. Default is None.

Returns

List of coefficients to be used as initial guess. Order of coefficients is [gamma, delta, alpha, beta].

Return type

list of float

create_coefficients_cube(optimised_coeffs, historic_forecast)[source]¶

Create a cube for storing the coefficients computed using EMOS.

Examples

For a cube containing coefficients calculated using Ensemble Model Output Statistics:

emos_coefficients / (1)             (coefficient_index: 4)
    Dimension coordinates:
         coefficient_index                           x
    Auxiliary coordinates:
         coefficient_name                            x
    Scalar coordinates:
         forecast_period: 14400 seconds
         forecast_reference_time: 2017-11-10 00:00:00
         time: 2017-11-10 04:00:00
    Attributes:
         diagnostic_standard_name: air_temperature
         mosg__model_configuration: uk_det

An example of the coefficient_index coordinate is:

DimCoord(array([0, 1, 2, 3]), standard_name=None, units=Unit('1'), long_name='coefficient_index')

An example of the coefficient_name coordinate is:

AuxCoord(array(['gamma', 'delta', 'alpha', 'beta'], dtype='<U5'), standard_name=None, units=Unit('no_unit'), long_name='coefficient_name')

Parameters

optimised_coeffs (list) – List of optimised coefficients. Order of coefficients is [gamma, delta, alpha, beta].
historic_forecast (iris.cube.Cube) – The cube containing the historic forecast.

Returns

Cube constructed using the coefficients provided and using metadata from the historic_forecast cube. The cube contains a coefficient_index dimension coordinate where the points of the coordinate are integer values and a coefficient_name auxiliary coordinate where the points of the coordinate are e.g. gamma, delta, alpha, beta.

Return type

iris.cube.Cube

Raises

ValueError – If the number of coefficients in the optimised_coeffs does not match the expected number.

static mask_cube(cube, landsea_mask)[source]¶

Mask the input cube using the given landsea_mask. Sea points are filled with nans and masked.

Parameters

cube (iris.cube.Cube) – A cube to be masked, on the same grid as the landsea_mask. The last two dimensions on this cube must match the dimensions in the landsea_mask cube.
landsea_mask (iris.cube.Cube) – A cube containing a land-sea mask. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Raises

IndexError – if the cube and landsea_mask shapes are not compatible.

process(historic_forecast, truth, landsea_mask=None)[source]¶

Using Nonhomogeneous Gaussian Regression/Ensemble Model Output Statistics, estimate the required coefficients from historical forecasts.

The main contents of this method is:

Check that the predictor is valid.
Filter the historic forecasts and truth to ensure that these inputs match in validity time.
Apply unit conversion to ensure that the historic forecasts and truth have the desired units for calibration.
Calculate the variance of the historic forecasts. If the chosen predictor is the mean, also calculate the mean of the historic forecasts.
If a land-sea mask is provided then mask out sea points in the truth and predictor from the historic forecasts.
Calculate initial guess at coefficient values by performing a linear regression, if requested, otherwise default values are used.
Perform minimisation.

Parameters

historic_forecast (iris.cube.Cube) – The cube containing the historical forecasts used for calibration.
truth (iris.cube.Cube) – The cube containing the truth used for calibration.
landsea_mask (iris.cube.Cube) – The optional cube containing a land-sea mask. If provided, only land points are used to calculate the coefficients. Within the land-sea mask cube land points should be specified as ones, and sea points as zeros.

Returns

Cube containing the coefficients estimated using EMOS. The cube contains a coefficient_index dimension coordinate and a coefficient_name auxiliary coordinate.

Return type

iris.cube.Cube

Raises

ValueError – If either the historic_forecast or truth cubes were not passed in.
ValueError – If the units of the historic and truth cubes do not match.