Uncertainty in drought indicators for parametric insurance

Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch

6.1. Uncertainty in drought indicators for parametric insurance#

Production date: 2026-02-24.

Dataset version: 1.0.

Produced by: Enis Gerxhalija, Olivier Burggraaff (National Physical Laboratory).

🌍 Use case: Parametric insurance for agriculture using reanalysis-based drought indicators#

❓ Quality assessment question#

What is the spread in drought indicators in the ERA5–Drought ensemble and how does this propagate into parametric insurance payouts?
Can the ensemble in ERA5–Drought provide additional information for parametric insurance, compared to only using the deterministic reanalysis?

Drought and wet periods have far-reaching environmental, societal, and economic impacts. In the United Kingdom, the record-breaking hot and dry spring and summer of 2025 caused harvest losses worth more than £800 million [ECIU+25]. An 18-month drought in Brazil in 2023–24, the most severe since monitoring began in 1954, left 720 health centres in affected areas non-operational [UNICEF+24]. Extreme rainfall in Spain in 2024 killed hundreds of people and caused billions of euros in damage [Franch-Pardo+25]. Climate change is thought to be the primary driver behind the increase in drought and wet periods since the 1950s [IPCC+23], and this trend is expected to continue.

To mitigate the economic impacts of drought events, agricultural companies and governments increasingly turn to parametric insurance policies, a type of data-driven insurance offering pre-set payouts when a trigger event happens [Lin+20]. Agricultural parametric insurance policies are based on variables including temperature, humidity, and precipitation, with risks estimated from typical climatology and crop growth models [Prokopchuk+20, World Bank Group+18].

For example, some insurance products offer payouts based on drought indicators such as the Standardised Precipitation Index (SPI) [McKee+93] and the Standardised Precipitation-Evapotranspiration Index (SPEI) [Vicente-Serrano+10]. Both operate on the same principle, namely quantifying the amount of precipitation (and evapotranspiration for SPEI) over a given time frame at a given location relative to its historical climatology. For instance, an SPI value of –1 corresponds to precipitation that is 1 standard deviation below the mean for that site and time frame. This probabilistic approach lends itself to statements on the occurrence rate of extreme events [McKee+93]. As such, these indicators are suitable bases for agricultural parametric insurance products that pay out below a certain (aggregated) SPI or SPEI value. An example product proposed in [World Bank Group+18] is based on the 4-month sum of SPI-6, where SPI-6 is an indicator of precipitation accumulated over 6-month periods, and pays out when this aggregated indicator drops below –5 for a given year.
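The link between index values and occurrence rates can be illustrated with a minimal sketch. It assumes that SPI/SPEI values follow a standard normal distribution at a given site, which is how the indices are constructed; the monthly SPI-6 values and the helper names below are made up for illustration and are not taken from the notebook.

```python
from statistics import NormalDist

# SPI and SPEI are constructed to follow a standard normal distribution
# at each site, so occurrence rates follow from the normal CDF.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probability_at_or_below(index_value: float) -> float:
    """Climatological probability of an SPI/SPEI value at or below `index_value`."""
    return STANDARD_NORMAL.cdf(index_value)


def aggregate_index(monthly_values: list[float]) -> float:
    """Sum monthly index values, e.g. four months of SPI-6."""
    return sum(monthly_values)


# Hypothetical SPI-6 values for four consecutive months (not real data)
spi6_example = [-1.2, -1.6, -1.4, -1.1]
aggregated = aggregate_index(spi6_example)
TRIGGER = -5.0  # trigger value of the example product described above

print(f"P(SPI <= -1) = {probability_at_or_below(-1.0):.3f}")
print(f"P(SPI <= -2) = {probability_at_or_below(-2.0):.3f}")
print(f"Aggregated SPI-6 = {aggregated:.1f}, pays out: {aggregated < TRIGGER}")
```

Note that the 4-month sum is not itself standard normal, because consecutive SPI-6 values share most of their accumulation window and are therefore strongly correlated; the occurrence rate of the aggregated trigger has to be estimated from the historical record rather than from the normal CDF.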

ECMWF now provide SPI and SPEI indices derived from their fifth-generation reanalysis ERA5 in the ERA5–Drought dataset [Keune+25], available from the Climate Data Store (CDS). ERA5 combines meteorological observations and model output through data assimilation on a global ~31 km grid going back to 1940 [Soci+24, Hersbach+20]. It is a well-established dataset, widely used across many sectors including parametric insurance [Evenflow+24]; quality assessments for ERA5 itself can be found in Reanalysis. The derived dataset ERA5–Drought can be a valuable resource for applications in many sectors, since it can be used out of the box, freeing users from the need to find and process the underpinning meteorological data themselves. ERA5–Drought provides monthly SPI and SPEI for 7 different accumulation periods, interpolated to a 0.25° × 0.25° grid.

A key strength of ERA5 and ERA5–Drought is the inclusion of an ensemble that can be used to estimate (part of) the uncertainty in the data. Its 10 members are generated by running the ERA5 data assimilation and forecast model at a coarser resolution, 9 of them with perturbed inputs representing the input data uncertainty, and one control member with the same inputs as the higher-resolution deterministic reanalysis to provide a representative comparison. ECMWF’s ensemble system EDA is described in detail in [Isaksen+10]; its implementation for ERA5 in [Hersbach+20]. The latter writes that “[t]he ERA5 EDA spread among the ten ensemble members can be interpreted as a measure for the uncertainty in the [higher-resolution reanalysis] estimates” and “should mainly be used as a guide for the quality of representing the correct synoptic situation at a given time, rather than for long-term and/or large-scale averages.” Consequently, while the ensemble spread should not be considered an uncertainty in the metrological sense, it can provide useful information on part of the uncertainty in the data. In ERA5–Drought, SPI and SPEI introduce additional complexity by accumulating over time, regridding in space, and using a 30-year climatology reference window. For a metrological guide to uncertainty propagation with ensemble/Monte Carlo methods, we refer the reader to [JCGM+08].
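As a rough illustration of what "ensemble spread" means in practice, the sketch below computes two common spread measures across members for each time step. The data are synthetic (a sine wave plus random noise standing in for 10 members of monthly SPI), and the variable names are assumptions rather than names from the notebook or from ERA5–Drought.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for an ensemble: 10 members x 24 months of SPI values.
# Real data would come from the ERA5-Drought ensemble instead.
n_members, n_months = 10, 24
signal = np.sin(np.linspace(0, 4 * np.pi, n_months))
members = signal + rng.normal(scale=0.2, size=(n_members, n_months))

# Spread across members (axis 0) at each time step
ensemble_mean = members.mean(axis=0)
spread_std = members.std(axis=0, ddof=1)  # sample standard deviation
spread_range = members.max(axis=0) - members.min(axis=0)  # max minus min

print(f"Mean spread (std):   {spread_std.mean():.2f}")
print(f"Mean spread (range): {spread_range.mean():.2f}")
```

With only 10 members, the standard deviation and especially the range are noisy estimates, which is one more reason to read the spread as indicative rather than as a formal uncertainty.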

This quality assessment examines the uncertainty in SPI and SPEI values from the ERA5–Drought (Monthly drought indices from 1940 to present derived from ERA5 reanalysis) dataset with a focus on parametric insurance for the agriculture sector. We investigate the ensemble spread in both indicators for one example site (Central Java, Indonesia) and propagate the reanalysis and ensemble into an example insurance product. In doing so, we test how well the ensemble and deterministic reanalysis agree, whether the ensemble spread is sufficiently small for these data to underpin an insurance product, and how the ensemble can provide additional information for creating and invoking insurance policies.

📢 Quality assessment statement#

These are the key outcomes of this assessment:

The ensemble in ERA5–Drought provides useful information on the spread, and thus the uncertainty, in SPI/SPEI drought indicators, which propagates into derived products such as parametric insurance payouts. The ensemble spread should be treated as a first-order estimate for the uncertainty, not a metrological uncertainty budget.
The ensemble and deterministic reanalysis generally agree well, but sometimes disagree, likely due to the difference in spatial resolution of the underpinning forecast model.
Following trends in ERA5, the spread in ERA5–Drought SPI/SPEI has decreased over time, and this is also true for derived products.
The behaviour of different ensemble members can be integrated into parametric insurance products, for example to provide smoother on-ramps.
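One possible reading of a "smoother on-ramp" is sketched below: instead of a payout that switches on when a single value crosses the trigger, the payout scales with the fraction of ensemble members that cross it. This is an illustrative assumption, not the scheme used in this assessment or in [World Bank Group+18]; the function name and example values are hypothetical.

```python
def ensemble_on_ramp(aggregated_spi_members: list[float], *,
                     trigger: float = -5.0, full_payout: float = 1.0) -> float:
    """Scale the payout by the fraction of members below the trigger (hypothetical scheme)."""
    below_trigger = [value < trigger for value in aggregated_spi_members]
    fraction_below = sum(below_trigger) / len(below_trigger)
    return full_payout * fraction_below


# Made-up aggregated SPI-6 values for 10 members in one year
members_example = [-6.1, -5.4, -5.2, -4.9, -4.8, -4.7, -4.5, -4.4, -4.2, -4.0]
print(ensemble_on_ramp(members_example))  # 3 of 10 members are below the trigger
```

A scheme like this pays out partially in borderline years where the members disagree, rather than all or nothing depending on a single time series.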

📋 Methodology#

This quality assessment examines the uncertainty in SPI and SPEI values from the ERA5–Drought (Monthly drought indices from 1940 to present derived from ERA5 reanalysis) dataset with a focus on parametric insurance for the agriculture sector.

This notebook provides and runs through Python code for downloading the deterministic reanalysis and ensemble for SPI and SPEI from ERA5–Drought with the included quality flags, analysing the spread in the ensemble and its agreement with the deterministic reanalysis, and calculating indemnity payouts for an example parametric insurance product [World Bank Group+18]. This calculation is performed individually for each ensemble member, thus propagating the distribution of meteorological data (ERA5) through SPI/SPEI (ERA5–Drought) into the derived product (indemnity payout). As noted above, this is not a full metrological uncertainty analysis and should be considered a first-order estimate.
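The member-by-member propagation described above can be sketched as follows. The derived quantity here (shortfall below a trigger of –5) is a hypothetical stand-in for the indemnity function, and the aggregated SPI values are synthetic; only the pattern of "apply to each member, then summarise" reflects the approach in the text.

```python
import numpy as np


def derived_quantity(aggregated_spi: np.ndarray) -> np.ndarray:
    """Hypothetical derived product: shortfall below a trigger of -5 (zero otherwise)."""
    return np.clip(-5.0 - aggregated_spi, 0.0, None)


# Synthetic aggregated SPI: 10 members (rows) x 5 years (columns)
rng = np.random.default_rng(seed=1)
aggregated_spi = rng.normal(loc=-4.0, scale=2.0, size=(10, 5))

# Propagate each member individually, then summarise across members
derived = derived_quantity(aggregated_spi)
median = np.median(derived, axis=0)
p05, p95 = np.percentile(derived, [5, 95], axis=0)

# For comparison: applying the function to the ensemble mean instead
derived_of_mean = derived_quantity(aggregated_spi.mean(axis=0))

for year in range(derived.shape[1]):
    print(f"Year {year}: median {median[year]:.2f}, "
          f"5-95% {p05[year]:.2f} to {p95[year]:.2f}, "
          f"from ensemble mean {derived_of_mean[year]:.2f}")
```

Because the derived quantity is non-linear (it is zero above the trigger), applying it to the ensemble mean generally gives a different answer from summarising the per-member results, which is why each member is propagated separately.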

This notebook is set up so that the test site, dates, and definition of the insurance product are easily customised, so you can download it and apply it to your preferred domain.

1. Code setup

Import all required libraries.
Define helper functions.

2. Download data

Download SPI and SPEI from ERA5–Drought.
Download quality flags from ERA5–Drought.
Pre-process data and quality flags.

3. Time series analysis

Analyse SPI and SPEI time series with and without quality flags.

4. Parametric insurance

Define indemnity function.
Calculate indemnity based on ERA5–Drought data (deterministic reanalysis and ensemble).
Analyse differences and spread in indemnity payouts.
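To make the idea of an indemnity function concrete, here is a minimal sketch. Only the trigger of –5 on the aggregated SPI-6 comes from the product described above; the linear payout between trigger and exit value, the exit value itself, and the sum insured are assumptions made for illustration and may differ from the function defined in the notebook.

```python
def indemnity(aggregated_spi: float, *, trigger: float = -5.0,
              exit_value: float = -10.0, sum_insured: float = 1_000_000.0) -> float:
    """
    Hypothetical indemnity: zero at or above the trigger, then increasing linearly
    until the full sum insured is paid at the exit value.
    """
    if aggregated_spi >= trigger:
        return 0.0
    fraction = (trigger - aggregated_spi) / (trigger - exit_value)
    return sum_insured * min(fraction, 1.0)


for value in (-4.0, -5.0, -7.5, -12.0):
    print(f"Aggregated SPI-6 = {value:5.1f} -> indemnity = {indemnity(value):,.0f}")
```

Applying the same function to the deterministic reanalysis and to each ensemble member gives one payout per time series, whose differences and spread can then be analysed.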

📈 Analysis and results#

1. Code setup#

Note

This notebook uses earthkit (earthkit-data) for downloading data. Because earthkit is in active development, some functionality may change after this notebook is published. If any part of the code stops functioning, please raise an issue on our GitHub repository so it can be fixed.

Import required libraries#

In this section, we import all the relevant packages needed for running the notebook.

Helper functions#

This section defines some functions and variables used in the following analysis, allowing code cells in later sections to be shorter and ensuring consistency.

Data downloading & (pre-)processing#

The following functions handle downloading data in specific circumstances, e.g. a geographical or temporal subset:

Categorising SPI and SPEI#

The following cell defines categories for SPI and SPEI values, e.g. “severe drought”:
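A categorisation of this kind might look like the sketch below. The thresholds (±1, ±1.5, ±2) are the commonly used SPI class boundaries; the labels, names, and structure are assumptions and may differ from the `CATEGORIES` definition used elsewhere in this notebook.

```python
from bisect import bisect_right

# Commonly used class boundaries on the magnitude of SPI/SPEI (assumed here)
THRESHOLDS = [1.0, 1.5, 2.0]
LABELS_DRY = ["near normal", "moderate drought", "severe drought", "extreme drought"]
LABELS_WET = ["near normal", "moderately wet", "very wet", "extremely wet"]


def categorise(index_value: float) -> str:
    """Return a descriptive category for one SPI/SPEI value."""
    # Number of thresholds reached by the magnitude: 0 (near normal) to 3 (extreme)
    level = bisect_right(THRESHOLDS, abs(index_value))
    labels = LABELS_DRY if index_value < 0 else LABELS_WET
    return labels[level]


for value in (-2.3, -1.7, -1.0, 0.3, 1.6):
    print(f"{value:5.1f}: {categorise(value)}")
```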

Quality flags#

The following functions apply the quality flags included in ERA5–Drought, namely the probability of zero precipitation and the Shapiro–Wilk normality test:
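In essence, applying these flags means masking index values that fail either check. The sketch below assumes the two flags are available as arrays aligned with the index values; the function name, argument names, and the 0.1 threshold on the probability of zero precipitation are placeholders for illustration, not values confirmed by the dataset documentation.

```python
import numpy as np


def apply_quality_flags(index, prob_zero_precip, passes_normality, *,
                        max_prob_zero: float = 0.1) -> np.ndarray:
    """
    Replace index values with NaN where the probability of zero precipitation
    exceeds `max_prob_zero` (placeholder threshold) or the normality test failed.
    """
    index = np.asarray(index, dtype=float)
    few_zeros = np.asarray(prob_zero_precip, dtype=float) <= max_prob_zero
    is_normal = np.asarray(passes_normality, dtype=bool)
    return np.where(few_zeros & is_normal, index, np.nan)


# Made-up example: three months of SPI with their flags
flagged = apply_quality_flags([-1.0, 0.5, 2.0],
                              prob_zero_precip=[0.0, 0.3, 0.05],
                              passes_normality=[True, True, False])
print(flagged)  # only the first value passes both checks
```

Masking with NaN rather than dropping values keeps the time axis intact, so flagged and unflagged time series can still be compared directly.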

Visualisation#

The following cell contains some base helper functions (e.g. displaying in Jupyter Notebook or Jupyter Book style, adding textboxes with consistent formatting, etc.):

The following functions display time series of the deterministic reanalysis and ensemble members:

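# Imports used by the visualisation helpers below (repeated here so the cell is self-contained)
from typing import Optional

import matplotlib.patheffects as path_effects
import matplotlib.pyplot as plt
import xarray as xr
from matplotlib.lines import Line2D

# Constants for visualisation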
kwargs_reanalysis = {"color": "yellow",
                     "linewidth": 1,
                     "path_effects": [path_effects.Stroke(linewidth=2, foreground="black", alpha=0.7), path_effects.Normal(), ],
                    }
kwargs_ensemble_members = {"color": "black",
                           "alpha": 1,
                           "linewidth": 1,
                          }

# Plot members as individual lines
def plot_ensemble_members(data_ensemble: xr.Dataset, var: str, *,
                          data_reanalysis: Optional[xr.Dataset]=None,
                          time_dimension: str="time", ensemble_dimension: str="realization",
                          title: Optional[str]="",
                          glue_label: Optional[str]=None) -> None:
    """ Display ensemble members (and optionally deterministic reanalysis) time series for one variable `var`. """
    # Find indicator name and colours from `var`
    var_label = INDEXES_NAMED[var]
    indicator = var_label.split("-")[0]  # e.g. SPI-6 -> SPI
    var_categories = CATEGORIES[indicator]

    # Setup: Figure
    fig, ax = plt.subplots(figsize=(8, 4), layout="constrained")

    # Display individual members
    data_ensemble[var].plot.line(ax=ax, x=time_dimension, **kwargs_ensemble_members, add_legend=False)

    # Create dummy lines for legend
    lines_for_legend = [Line2D([0], [0], **kwargs_ensemble_members), ]
    labels_for_legend = ["Ensemble", ]

    # Optional: Display reanalysis
    if data_reanalysis is not None:
        (line_reanalysis,) = data_reanalysis[var].plot.line(ax=ax, x=time_dimension, **kwargs_reanalysis)
        lines_for_legend.append(line_reanalysis)
        labels_for_legend.append("Deterministic reanalysis")

    # Display categories and 0 line
    plot_index_categories(ax, var_categories)
    plot_zero_line(ax)

    # Decorate figure
    ax.set_title(title)
    ax.set_xlim(tight_xlims_year(data_ensemble[time_dimension]))
    ax.set_ylim(-8, 8)
    ax.set_xlabel("Time")
    ax.set_ylabel(var_label)
    ax.legend(lines_for_legend, labels_for_legend, loc="lower right")

    # Show result
    _glue_or_show(fig, glue_label)

The following functions display aggregated SPI-6 and the resulting indemnity, for both the deterministic reanalysis and the ensemble:

# Constants
COLOUR_INDEMNITY = "#AAC95E"

# Display indemnity and SPI
def plot_indemnity(indemnity_reanalysis: xr.DataArray, spi_reanalysis: xr.DataArray, *,
                   indemnity_ensemble: Optional[xr.DataArray]=None, spi_ensemble: Optional[xr.DataArray]=None,
                   time_dimension: str="year", ensemble_dimension: str="realization",
                   spi_trigger: Optional[float]=None,
                   title: Optional[str]="",
                   glue_label: Optional[str]=None) -> None:
    """
    In two panels, display the indemnity (top) and underlying aggregated SPI (bottom).
    Requires data from the deterministic reanalysis, optionally also displays ensemble.
    Ensemble indemnity is displayed as box plots showing the IQR (box) and the full range of members (whiskers); outliers are not drawn.
    """
    # Note: There is some copypasting between this function and `plot_ensemble_members` which should ideally be refactored.
    # Setup: Figure
    fig, axs = plt.subplots(nrows=2, sharex=True, figsize=(8, 8), layout="constrained")
    (ax_indemnity, ax_spi) = axs  # Done in two steps so axs[-1] can still be accessed for labels etc

    ## INDEMNITY
    # Normalise to millions of USD
    indemnity_reanalysis = indemnity_reanalysis / 1e6
    if indemnity_ensemble is not None:
        indemnity_ensemble = indemnity_ensemble / 1e6

    # Display deterministic indemnity
    indemnity_reanalysis.plot.scatter(ax=ax_indemnity, x=time_dimension,
                                      s=10, marker="D", zorder=2.1, **kwargs_reanalysis,
                                      label="Deterministic reanalysis",)

    # Optional: Display ensemble indemnity
    if indemnity_ensemble is not None:
        # Prepare data in matplotlib's preferred format
        years = indemnity_ensemble[time_dimension].values
        box_data = [indemnity_ensemble.sel({time_dimension: y}).dropna(ensemble_dimension).values for y in years]

        # Display
        box_ensemble = ax_indemnity.boxplot(box_data, positions=years,
                                            vert=True, showfliers=False, patch_artist=True, manage_ticks=False,
                                            whis=(0, 100),
                                            boxprops=   {"facecolor": COLOUR_INDEMNITY, "edgecolor": "black",},
                                            medianprops={"color": "black", "zorder": 2,},
                                            flierprops= {"marker": "o", "markersize": 4,
                                                         "markerfacecolor": "none", "markeredgecolor": COLOUR_INDEMNITY,
                                                         "linestyle": "none",
                                                        },
                                            label="Ensemble",
                                           )

    # Decorate panel
    ax_indemnity.set_xlim(tight_xlims_year(indemnity_reanalysis[time_dimension]))
    ax_indemnity.set_ylim(0, 100)
    # Fall back to USD if the data carry no `units` attribute
    indemnity_units = indemnity_reanalysis.attrs.get("units", "USD")
    ax_indemnity.set_ylabel(f"Indemnity [million {indemnity_units}]")
    ax_indemnity.legend(loc="upper right")

    ## SPI
    spi_label = INDEXES_NAMED[spi_reanalysis.name]  # SPI-6

    # Display SPI
    (line_reanalysis,) = spi_reanalysis.plot.line(ax=ax_spi, x=time_dimension, zorder=2.1, **kwargs_reanalysis)

    # Create dummy lines for legend
    lines_for_legend = [line_reanalysis, ]
    labels_for_legend = ["Deterministic reanalysis", ]

    # Optional: Display ensemble SPI
    if spi_ensemble is not None:
        # Display individual ensemble members
        spi_ensemble.plot.line(ax=ax_spi, x=time_dimension, **kwargs_ensemble_members, add_legend=False)

        # Add dummy item to legend
        lines_for_legend.append(Line2D([0], [0], **kwargs_ensemble_members))
        labels_for_legend.append("Ensemble")

    # Decorate panel
    plot_zero_line(ax_spi)
    if spi_trigger is not None:  # Shade the payout region only when a trigger is given
        ax_spi.axhspan(-100, spi_trigger, facecolor=COLOUR_INDEMNITY, edgecolor="none", alpha=0.2)
    ax_spi.set_ylim(-20, 20)
    ax_spi.set_ylabel(f"Aggregated {spi_label}")
    ax_spi.legend(lines_for_legend, labels_for_legend, loc="lower right")

    # Decorate figure
    for ax in axs:
        ax.set_title("")
    fig.suptitle(title)
    axs[0].set_xlabel("")
    axs[-1].set_xlabel("Time")  # axs[-1] -> Always bottom panel, even if indemnity/spi panels are switched
    fig.align_ylabels()

    # Show result
    _glue_or_show(fig, glue_label)

ℹ️ If you want to know more#

Key resources#

The CDS catalogue entries for the data used were:

Monthly drought indices from 1940 to present derived from ERA5 reanalysis: derived-drought-historical-monthly

Code libraries used:

earthkit
- earthkit-data
- earthkit-plots

More about drought indicators:

More about parametric insurance:

More about ERA5:

More about uncertainty in ensemble reanalyses:

References#

[ECIU+25] Energy & Climate Intelligence Unit, ‘Estimated financial losses faced by UK farmers due dry weather impacts on key arable crops’, Energy & Climate Intelligence Unit, London, United Kingdom, Dec. 2025.

[Evenflow+24] Evenflow, ‘The value generated by ERA5’, Copernicus Climate Change Service (C3S), Bonn, Germany, Dec. 2024. [Online]. Available: https://climate.copernicus.eu/sites/default/files/2024-12/Value-generated-by-ERA5-full-report.pdf

[Franch-Pardo+25] I. Franch-Pardo, P. A. F. Puig, and A. Cerdà, ‘Geospatial Technologies in Crisis Response: Analyzing the 2024 Floods in Valencia, Spain’, European Journal of Geography, vol. 16, no. 2, pp. 286–297, Aug. 2025, doi: 10.48088/ejg.i.fra.16.2.286.297.

[Hersbach+20] H. Hersbach et al., ‘The ERA5 global reanalysis’, Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, May 2020, doi: 10.1002/qj.3803.

[IPCC+23] IPCC, ‘Climate Change 2023: Synthesis Report. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change’, Intergovernmental Panel on Climate Change (IPCC), Geneva, Switzerland, Jul. 2023. doi: 10.59327/IPCC/AR6-9789291691647.

[Isaksen+10] L. Isaksen et al., ‘Ensemble of data assimilations at ECMWF’, European Centre for Medium-Range Weather Forecasts, Reading, UK, 636, Dec. 2010. doi: 10.21957/obke4k60.

[JCGM+08] JCGM, ‘Evaluation of measurement data — Supplement 1 to the “Guide to the expression of uncertainty in measurement” — Propagation of distributions using a Monte Carlo method’, Joint Committee for Guides in Metrology, Sevres, Paris, France, 101:2008, 2008. doi: 10.59161/JCGM101-2008.

[Keune+25] J. Keune, F. Di Giuseppe, C. Barnard, E. Damasio da Costa, and F. Wetterhall, ‘ERA5–Drought: Global drought indices based on ECMWF reanalysis’, Scientific Data, vol. 12, p. 616, Apr. 2025, doi: 10.1038/s41597-025-04896-y.

[Lin+20] X. Lin and W. J. Kwon, ‘Application of parametric insurance in principle-compliant and innovative ways’, Risk Management and Insurance Review, vol. 23, no. 2, pp. 121–150, May 2020, doi: 10.1111/rmir.12146.

[McKee+93] T. B. McKee, N. J. Doesken, and J. Kleist, ‘The relationship of drought frequency and duration to time scales’, in Eighth Conference on Applied Climatology, Anaheim, California, USA, Jan. 1993.

[Prokopchuk+20] O. Prokopchuk, I. Prokopchuk, G. Mentel, and Y. Bilan, ‘Parametric Insurance as Innovative Development Factor of the Agricultural Sector of Economy’, AGRIS on-line Papers in Economics and Informatics, vol. 12, no. 3, pp. 69–86, Sep. 2020, doi: 10.22004/ag.econ.320076.

[Shapiro+65] S. S. Shapiro and M. B. Wilk, ‘An analysis of variance test for normality (complete samples)’, Biometrika, vol. 52, no. 3–4, pp. 591–611, Dec. 1965, doi: 10.1093/biomet/52.3-4.591.

[Soci+24] C. Soci et al., ‘The ERA5 global reanalysis from 1940 to 2022’, Quarterly Journal of the Royal Meteorological Society, vol. 150, no. 764, pp. 4014–4048, Jul. 2024, doi: 10.1002/qj.4803.

[UNICEF+24] UNICEF, ‘Latin America and Caribbean Region Flash Update No. 2 (Climate-related crisis in the Amazon Region)’, UNICEF, Nov. 2024.

[Vicente-Serrano+10] S. M. Vicente-Serrano, S. Beguería, and J. I. López-Moreno, ‘A Multiscalar Drought Index Sensitive to Global Warming: The Standardized Precipitation Evapotranspiration Index’, Journal of Climate, vol. 23, no. 7, pp. 1696–1718, Apr. 2010, doi: 10.1175/2009JCLI2909.1.

[World Bank Group+18] World Bank Group, ‘Developing Parametric Insurance for Weather Related Risks for Indonesia’, World Bank, Washington, D.C., USA, Jan. 2018. doi: 10.1596/29784.
