logo

Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch

6.1. Uncertainty in drought indicators for national monitoring#

Production date: 2026-02-26.

Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch.

Dataset version: 1.0.

Produced by: Enis Gerxhalija, Olivier Burggraaff (National Physical Laboratory).

🌍 Use case: Monitoring drought on a national scale using reanalysis-based drought indicators#

❓ Quality assessment question#

  • What is the ensemble spread for drought indicators in the ERA5–Drought dataset and how does this propagate into drought monitoring levels?

  • Can the ensemble in ERA5–Drought provide additional information for operational drought monitoring, compared to only using the deterministic reanalysis?

Drought and wet periods have far-reaching environmental, societal, and economic impacts. In the United Kingdom, the record-breaking hot and dry spring and summer of 2025 caused harvest losses worth more than £800 million [ECIU+25]. An 18-month drought in Brazil in 2023–24, the most severe since monitoring began in 1954, led to 720 health centres in affected areas becoming non-operational [UNICEF+24]. Extreme rainfall killed hundreds of people and caused billions of € in damages in Spain in 2024 [Franch-Pardo+25]. Climate change is thought to be the primary driver behind the increase in drought and wet periods since the 1950s [IPCC+23], and this trend is expected to continue into the future.

Given these impacts, monitoring drought and wet periods is vital. Several quantitative proxies have been developed to aid in this monitoring, generally based on a combination of total precipitation, soil moisture, evapotranspiration, and surface (air) temperature. Two widely-employed indices are the Standardised Precipitation Index (SPI) [McKee+93] and Standardised Precipitation-Evapotranspiration Index (SPEI) [Vicente-Serrano+10]. Both operate on the same principle, namely quantifying the amount of precipitation over a given time frame at a given location relative to its historical climatology. For example, an SPI value of –1 corresponds to a precipitation that is 1 standard deviation below the mean, for that site and time frame. This probabilistic approach lends itself to statements on the occurrence rate of extreme events [McKee+93]. SPEI expands on SPI by including not just gains from precipitation, but also losses due to evapotranspiration. SPI and SPEI can be evaluated over different accumulation periods to probe phenomena at different time scales [Keune+25, EDO+25]:

  • 1, 3 months: soil moisture, flow in smaller creeks.

  • 6, 12 months: reservoir storage, stream flow.

  • 24, 36, 48 months: reservoir and groundwater recharge.

ECMWF now provide SPI and SPEI indices derived from their fifth-generation reanalysis ERA5, which assimilates meteorological data and models on a global ~31 km grid going back to 1940 [Soci+24, Hersbach+20], in the ERA5–Drought dataset [Keune+25], available from the Climate Data Store (CDS). ERA5 is a well-established dataset, widely used across many sectors including drought monitoring and forecasting [Evenflow+24]; quality assessments for ERA5 itself can be found in Reanalysis. The derived dataset ERA5–Drought can be a valuable resource for applications in many sectors, since it can be used out of the box, freeing users from the need to find and process the underpinning meteorological data themselves. ERA5–Drought provides monthly SPI and SPEI for 7 different accumulation periods, interpolated to a 0.25° × 0.25° grid.

A key strength of ERA5 and ERA5–Drought is the inclusion of an ensemble that can be used to estimate (part of) the uncertainty in the data. Its 10 members are generated by running the ERA5 data assimilation and forecast model at a coarser resolution, 9 of them with perturbed inputs representing the input data uncertainty, and one control member with the same inputs as the higher-resolution deterministic reanalysis to provide a representative comparison. ECMWF’s ensemble system EDA is described in detail in [Isaksen+10]; its implementation for ERA5 in [Hersbach+20]. The latter writes that “[t]he ERA5 EDA spread among the ten ensemble members can be interpreted as a measure for the uncertainty in the [higher-resolution reanalysis] estimates” and “should mainly be used as a guide for the quality of representing the correct synoptic situation at a given time, rather than for long-term and/or large-scale averages.” Consequently, while the ensemble spread should not be considered an uncertainty in the metrological sense, it can provide useful information on part of the uncertainty in the data. In ERA5–Drought, SPI and SPEI introduce additional complexity by accumulating over time, regridding in space, and using a 30-year climatology reference window. For a metrological guide to uncertainty propagation with ensemble/Monte Carlo methods, we refer the reader to [JCGM+08].

This quality assessment examines the uncertainty in SPI and SPEI values from the ERA5–Drought (Monthly drought indices from 1940 to present derived from ERA5 reanalysis) dataset with a focus on national drought monitoring. We investigate the ensemble spread in SPI and SPEI across one country (Australia) as well as the agreement between the deterministic reanalysis and individual ensemble members in terms of declaring drought states (such as “extremely dry”). In doing so, we assess whether the ensemble spread is sufficiently small for these data to underpin a drought monitoring programme, and how the ensemble can provide additional information for such programmes.

📢 Quality assessment statement#

These are the key outcomes of this assessment

  • The ensemble in ERA5–Drought provides useful information on the spread and thus uncertainty in SPI/SPEI drought indicators. The ensemble spread should be treated as a first-order estimate for the uncertainty, not a metrological uncertainty budget.

  • The ensemble spread in drought indicators is generally small (≤0.24 for SPI and ≤0.15 for SPEI in this case study) for pixels that comply with the included quality flags.

  • In most cases, the majority of ensemble members agree with the deterministic reanalysis in declaring a drought level. Therefore, ERA5–Drought can be used confidently for monitoring drought. The ensemble can be used to measure said confidence.

  • Disagreements between ensemble members and the deterministic reanalysis are the result of propagated uncertainty from ERA5 combined with methodological factors, most prominently the difference in spatial resolution between ensemble and deterministic reanalysis.

  • The ensemble provides new opportunities for more nuanced monitoring of drought, compared to a binary classification based on the deterministic reanalysis alone. For example, triggers could be set based on the number of ensemble members that pass an SPI or SPEI threshold to capture a wider and smoother area of potential drought and provide early warnings.

📋 Methodology#

This quality assessment examines the uncertainty in SPI and SPEI values from the ERA5–Drought (Monthly drought indices from 1940 to present derived from ERA5 reanalysis) dataset with a focus on national drought monitoring.

This notebook provides and runs through Python code for downloading the deterministic reanalysis and ensemble for SPI and SPEI from ERA5–Drought with the included quality flags, analysing the spread in the ensemble and its agreement with the deterministic reanalysis, and assigning drought warning levels (e.g. “severe drought”) across a country (here Australia) during a known drought event. This calculation is performed individually for each ensemble member, thus propagating the distribution of meteorological data (ERA5) through SPI/SPEI (ERA5–Drought) into the derived product (drought level). As noted above, this is not a full metrological uncertainty analysis and should be considered a first-order estimate.

This notebook is set up so that the test country and dates are easily customised, so you can download it and apply it to your preferred domain.

1. Code setup

  • Import all required libraries.

  • Define helper functions.

2. Download data

  • Download SPI and SPEI from ERA5–Drought.

  • Download quality flags from ERA5–Drought.

  • Pre-process data.

3. Drought indicators

  • Analyse SPI and SPEI geospatially with and without quality flags.

4. Drought classification

  • Assign drought levels to the deterministic reanalysis and individual ensemble members.

  • Assess the uncertainty in drought levels by comparing the reanalysis and ensemble.

📈 Analysis and results#

1. Code setup#

Note

This notebook uses earthkit for downloading (earthkit-data) and visualising (earthkit-plots) data. Because earthkit is in active development, some functionality may change after this notebook is published. If any part of the code stops functioning, please raise an issue on our GitHub repository so it can be fixed.

Import required libraries#

In this section, we import all the relevant packages needed for running the notebook.

Hide code cell source

# Input / Output
import earthkit.data as ekd

# General data handling
import numpy as np
import pandas as pd
import xarray as xr

# Visualisation
import earthkit.plots as ekp
from earthkit.plots.styles import Style
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap, LinearSegmentedColormap
plt.rcParams["grid.linestyle"] = "--"
from IPython.display import display_html  # Display DataFrames with nicer formatting in notebooks

# Visualisation in Jupyter book -- automatically ignored otherwise
try:
    from myst_nb import glue
except ImportError:
    glue = None

# Type hints
from typing import Any, Iterable, Optional
from pandas.io.formats.style import Styler as pdStyler
from earthkit.plots.geo.domains import Domain
AnyDomain = (Domain | str)

Helper functions#

This section defines some functions and variables used in the following analysis, allowing code cells in later sections to be shorter and ensuring consistency.

Data (pre-)processing#

The following functions handle downloading data in specific circumstances, e.g. a geographical or temporal subset:

Hide code cell source

# Constants
MONTHS = range(1, 13)  # January to December (inclusive)
ENSEMBLE_DIMENSION = "realization"

# Download within a spatial domain
def domain_to_request(domain: ekp.geo.domains.Domain) -> dict:
    """ From an earthkit-plots domain, generate a request for earthkit-data / cdsapi. """
    bbox = domain.bbox.to_latlon_bbox()

    # Round
    north = int(np.ceil(bbox.north) + 1)
    south = int(np.floor(bbox.south) - 1)
    west = int(np.floor(bbox.west) - 1)
    east = int(np.ceil(bbox.east) + 1)
    
    area = [north, west, south, east]
    return {"area": area}

The following functions aid in processing quality flag data from ERA5–Drought into a standard format:

Hide code cell source

def preprocess_era5drought_qualityflag(data: dict[Any, ekd.FieldList], variable_out: Optional[str]=None, *,
                                       month_dimension: str="month", time_dimension: str="time") -> xr.Dataset:
    """ 
    Turn a dict of earthkit FieldLists with ERA5–Drought quality flags, organised by accumulation period, into one big xr.Dataset.
    Rename the variables according to the dict keys; optionally using `variable_out` (e.g. SPEI -> SPEI1, SPEI3, etc).
    Convert a 12-month time dimension (2020-01-01, 2020-02-01, ...) into a month dimension (1, 2, ...).
    """
    # Convert field lists to xarray Datasets
    data = {key: ds.to_xarray(compat="equals")
            for key, ds in data.items()}

    # Rename variables to include keys (e.g. accumulation periods)
    first_fieldlist = next(iter(data.values()))
    data_var = next(iter(first_fieldlist.data_vars))
    if not variable_out:  # If not manually specified, append keys to existing variable name
        variable_out = data_var
    
    data = [ds.rename_vars({data_var: f"{variable_out}{key}"})
            for key, ds in data.items()]

    # Merge into one
    data = xr.merge(data, compat="equals")

    # # Change time dimension to months
    # data = data.assign_coords({month_dimension: data[time_dimension].dt.month})
    # data = data.swap_dims({time_dimension: month_dimension})
    
    return data
Categorising SPI and SPEI#

The following cell defines categories for SPI and SPEI values, e.g. “severe drought”.

Hide code cell source

CATEGORIES_SPI = [  # Approximates the GDO scheme
    (   2.00, 100,     "Extremely wet",            "#7a007b"),  # deepest purple 
    (   1.50,   2.00,  "Severely wet",             "#af51c3"),  # dark purple
    (   1.00,   1.50,  "Moderately wet",           "#eaccf8"),  # medium purple
    (   0.00,   1.00,  "Near‑normal / mildly wet", "#ffffff"),  # white
    (  -1.00,   0.00,  "Near‑normal / mildly dry", "#ffffff"),  # white
    (  -1.50,  -1.00,  "Moderately dry",           "#fffc03"),  # yellow
    (  -2.00,  -1.50,  "Severely dry",             "#feaa00"),  # orange
    (-100,     -2.00,  "Extremely dry",            "#ff0100"),  # red
]

CATEGORIES_SPEI = [
    (   2.33, 100,    "Extremely wet",  "#01148b"),  # very dark navy
    (   1.65,   2.33, "Severely wet",   "#1871de"),  # strong blue
    (   1.28,   1.65, "Moderately wet", "#14acf4"),  # medium blue
    (   0.84,   1.28, "Mildly wet",     "#00f2fe"),  # cyan
    (  -0.84,   0.84, "Near-normal",    "#9afa93"),  # light green
    (  -1.28,  -0.84, "Mildly dry",     "#fdc403"),  # yellow
    (  -1.65,  -1.28, "Moderately dry", "#f2631d"),  # orange
    (  -2.33,  -1.65, "Severely dry",   "#df2929"),  # red
    (-100,     -2.33, "Extremely dry",  "#8c1b1a"),  # dark red
]

# ERA5–Drought provides data in former format (SPI6)
# But figures should be labelled in the latter (SPI-6)
INDEXES_NAMED = {f"{index}{period}": f"{index}-{period}"  # Mapping from e.g. SPI1 to SPI-1
                 for index in ["SPI", "SPEI"]             # ERA5–Drought provides data in former format (SPI1)
                 for period in range(1, 49)               # But figures should be labelled in the latter (SPI-1)
}
Quality flags#

The following functions apply the quality flags included in ERA5–Drought, such as the probability of zero precipitation and the Shapiro–Wilk normality test:

Hide code cell source

# Constants
P0_THRESHOLD = 0.1  # Threshold for masking based on p0 (probability of zero precipitation); from Keune+25

# Functions
def combine_ensemble_mask(data: xr.Dataset, *, ensemble_dimension: str=ENSEMBLE_DIMENSION) -> xr.Dataset:
    """
    Reduce an ensemble mask along the member (default: ENSEMBLE_DIMENSION, "realization") dimension,
    masking pixels where _any_ member is False.
    """
    return data.all(dim=ensemble_dimension)

def apply_quality_flags(data: xr.DataArray, *masks: xr.DataArray,
                        time_dimension: str="time") -> xr.DataArray:
    """
    Apply any number of monthly quality flag `masks` to the data array `data` (e.g. SPI6).
    Flagged values are set to NaN.
    """
    # 12‑month mask -> Month dimension
    masks = [mask.assign_coords(
                 month=(time_dimension, mask[time_dimension].dt.month.data)
            ).swap_dims({time_dimension: "month"}).drop_vars(time_dimension) for mask in masks]

    # Add month coordinate to `data`
    data = data.assign_coords(month=data[time_dimension].dt.month)

    # Broadcast masks to full time span
    masks = [mask.sel(month=data["month"]) for mask in masks]

    # Apply masks and return result
    for mask in masks:
        data = data.where(mask)
    return data
Statistics#

The following functions perform statistics on the ensemble datasets, both calculating the mean and standard deviation in ensemble members, and the drought severity category they fall in.

Hide code cell source

def ensemble_mean_and_spread(data: xr.Dataset, *, ensemble_dimension: str=ENSEMBLE_DIMENSION) -> tuple[xr.Dataset, xr.Dataset]:
    """ Calculate the mean and spread within an ensemble along the ENSEMBLE_DIMENSION. """
    # Calculate mean
    mean = data.mean(ensemble_dimension)

    # Calculate spread - Standard deviation with 0 degrees of freedom (i.e. divide by 10)
    # https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Ensemblemeansandstandarddeviations
    spread = data.std(ensemble_dimension, ddof=0)

    return mean, spread

The following functions count the number of (mis)matches between the deterministic reanalysis and invididual ensemble members and tabulate the result:

Hide code cell source

# Helper function
def count_how_often_met(counts: xr.Dataset, thresholds: Iterable[int]=range(1, 11)) -> pd.DataFrame:
    """
    Given a masked dataset of
    "how often does the ensemble meet a threshold if the reanalysis meets/doesn't meet the threshold"
    tabulate how often this is 0, >= 1, ...
    """
    # Columns in table: How often do n==0, >=1, ..., >=10 members meet the threshold?
    count_0 = {0: (counts == 0).sum()}  # 0 has to be done separately because it's ==, not >=
    count_n = {n: (counts >= n).sum() for n in thresholds}
    count_n = count_0 | count_n  # Combine into one dict

    # Normalise by total number of pixels under consideration (in %)
    count_n = {n: count / counts.count() * 100 for n, count in count_n.items()}

    # Combine into DataFrame
    count_n = {n: count.to_pandas() for n, count in count_n.items()}
    count_n = pd.DataFrame(count_n)

    # Nicer column labels
    column_names = {0: "n = 0"} | {n: f"n ≥ {n}" for n in thresholds}
    count_n = count_n.rename(columns=column_names)

    return count_n

# Helper function
def fraction_match_mismatch(data_reanalysis: xr.Dataset, data_ensemble: xr.Dataset, threshold: float, *,
                            category_label: str="in <drought category>",
                            ensemble_dimension: str=ENSEMBLE_DIMENSION, **kwargs) -> pd.DataFrame:
    """
    Given a reanalysis and ensemble dataset, count how many ensemble members meet the `threshold`
    (e.g. SPI is below the upper limit for "severe drought")
    for pixels where the reanalysis does / doesn't meet the same threshold.
    Applies to one category (threshold + label).
    """
    # Find pixels where the reanalysis meets the threshold (e.g. is at least moderate drought) and doesn't (e.g. is not moderate drought)
    reanalysis_meets_threshold       = (data_reanalysis <  threshold)
    reanalysis_doesnt_meet_threshold = (data_reanalysis >= threshold)  # Don't invert the mask because of NaNs

    # Count the number of ensemble members that meet the threshold (e.g. are at least moderate drought)
    ensemble_meets_threshold = (data_ensemble < threshold)
    ensemble_counts = ensemble_meets_threshold.sum(dim=ensemble_dimension)  # Number of members in category
    ensemble_counts = ensemble_counts.where(data_ensemble.sel({ensemble_dimension: 1}).notnull())  # Re-apply NaN mask

    # Match-up: Count the number of ensemble members that meet the threshold in the same pixel where the reanalysis meets/doesn't meet it
    # Basically match and mismatch, but with number of ensemble members instead of True/False
    ensemble_counts_where_reanalysis_meets_threshold       = ensemble_counts.where(reanalysis_meets_threshold)
    ensemble_counts_where_reanalysis_doesnt_meet_threshold = ensemble_counts.where(reanalysis_doesnt_meet_threshold)

    # Columns in table: How often do n==0, >=1, ..., >=10 members meet the threshold?
    meet_counts        = count_how_often_met(ensemble_counts_where_reanalysis_meets_threshold)
    doesnt_meet_counts = count_how_often_met(ensemble_counts_where_reanalysis_doesnt_meet_threshold)

    return meet_counts, doesnt_meet_counts

# Helper function
def fraction_match_mismatch_with_formatting(data_reanalysis: xr.Dataset, data_ensemble: xr.Dataset,
                                        indicator: str, categories: Iterable[Iterable], *,
                                        ensemble_dimension: str=ENSEMBLE_DIMENSION) -> tuple[pd.DataFrame]:
    # Apply fraction_match_mismatch to this indicator for each drought level
    match, mismatch = zip(*[fraction_match_mismatch(data_reanalysis[[indicator]], data_ensemble[[indicator]],
                                                    upper_lim, category_label=category_label,
                                                    ensemble_dimension=ensemble_dimension)
                            for (lower_lim, upper_lim, category_label, category_colour) in categories])

    # Apply labels
    match, mismatch = pd.concat(match), pd.concat(mismatch)
    match.index    = [f"Reanalysis is {category_label} or drier"
                      for (lower_lim, upper_lim, category_label, category_colour) in categories]
    mismatch.index = [f"Reanalysis is less dry than {category_label}"
                      for (lower_lim, upper_lim, category_label, category_colour) in categories]

    return match, mismatch

# Helper function
def display_fraction_match_mismatch_with_formatting(df: pd.DataFrame, caption: str) -> None:
    display_html(df.style.set_caption(caption).format(precision=1))


# Main function: Display the fraction of matching / mismatching match-ups between reanalysis and ensemble members in tables
def display_fraction_match_mismatch(data_reanalysis: xr.Dataset, data_ensemble: xr.Dataset, *,
                                    indicators: Iterable[str]=["SPI", "SPEI"],
                                    drought_categories_spi:  Iterable[Iterable]=CATEGORIES_SPI[5:],
                                    drought_categories_spei: Iterable[Iterable]=CATEGORIES_SPEI[5:],
                                    ensemble_dimension: str=ENSEMBLE_DIMENSION) -> None:
    """
    Given a reanalysis and ensemble dataset, count how many ensemble members fall _at least_ into each drought category
    (e.g. SPI is below the upper limit for "severe drought")
    for pixels where the reanalysis does / doesn't meet the same threshold.
    """
    # Separate SPI and SPEI because they have different categories and thresholds
    indicators_spi, indicators_spei = [sorted([var for var in data_reanalysis.data_vars if indicator_base in var])
                                       for indicator_base in indicators]  # e.g. [SPI1, SPI12], [SPEI1, SPEI12]

    nspi, nspei = len(indicators_spi), len(indicators_spei)
    indicators_combined = indicators_spi + indicators_spei
    categories_combined = [drought_categories_spi,] * nspi + [drought_categories_spei,] * nspei  # Repeat categories so they can be looped over

    # Get labels for all unique categories in SPI and/or SPEI
    category_labels_combined = [category[2] for category in drought_categories_spei]  # Get SPEI labels
    category_labels_combined = category_labels_combined + [category[2] for category in drought_categories_spi  # Get SPI labels
                                                          if category[2] not in category_labels_combined]       # if they weren't in SPEI too

    # Loop over rows / indicators: SPI then SPEI
    match, mismatch = zip(*[fraction_match_mismatch_with_formatting(data_reanalysis, data_ensemble, indicator, categories,
                                                                    ensemble_dimension=ensemble_dimension)
                            for indicator, categories in zip(indicators_combined, categories_combined)])

    # Re-organise into dict with SPI / SPEI labels, making sure to use the same order
    match, mismatch = [pd.concat({INDEXES_NAMED[indicator]: df for indicator, df in zip(indicators_combined, data)})
                       for data in [match, mismatch]]

    # Reorder
    match    =    match.unstack(level=0).loc[[f"Reanalysis is {category_label} or drier" for category_label in category_labels_combined]]
    mismatch = mismatch.unstack(level=0).loc[[f"Reanalysis is less dry than {category_label}" for category_label in category_labels_combined]]

    # Display subsets of data and draw conclusions
    caption = ("Percentage of match-ups (pixel + timestamp) where "
               "none of the ensemble members (n = 0) agree with the reanalysis in declaring a state of drought.<br>"
               "High values indicate disagreement between the two datasets, and thus high uncertainty.")
    display_fraction_match_mismatch_with_formatting(match[["n = 0"]], caption)
    display_html("<br><br>", raw=True)  # Spacing

    caption = ("Percentage of match-ups (pixel + timestamp) where "
               "at least half the ensemble members (n ≥ 5) agree with the reanalysis in declaring a state of drought.<br>"
               "High values indicate some confidence in the drought classification.")
    display_fraction_match_mismatch_with_formatting(match[["n ≥ 5"]], caption)
    display_html("<br><br>", raw=True)  # Spacing

    caption = ("Percentage of match-ups (pixel + timestamp) where "
               "(nearly) all the ensemble members (n ≥ 9) agree with the reanalysis in declaring a state of drought.<br>"
               "High values indicate high confidence / low uncertainty in the drought classification.")
    display_fraction_match_mismatch_with_formatting(match[["n ≥ 9"]], caption)

    display_html("<br><hr><hr><br>", raw=True)  # Spacing between sections
    
    caption = ("Percentage of match-ups (pixel + timestamp) where "
               "the reanalysis does not declare a state of drought, and none of the ensemble members (n = 0) declare one either.<br>"
               "High values indicate high confidence / low uncertainty in the not-drought classification.")
    display_fraction_match_mismatch_with_formatting(mismatch[["n = 0"]], caption)
    display_html("<br><br>", raw=True)  # Spacing

    caption = ("Percentage of match-ups (pixel + timestamp) where "
               "the reanalysis does not declare a state of drought, but at least one of the ensemble members (n ≥ 1) does.<br>"
               "High values indicate low confidence in the not-drought classification, and/or potential for early warnings.")
    display_fraction_match_mismatch_with_formatting(mismatch[["n ≥ 1"]], caption)
Visualisation#

The following cell defines earthkit-plots styles for the indicators. These styles define the colour maps and colour bar ranges for each quantity.

Hide code cell source

# Helper function: Translate tabulated categories to Style
def categories_to_earthkit_style(var_categories: Iterable[Iterable]) -> Style:  # e.g. CATEGORIES_SPI
    """ From a list of categories, create an earthkit.plots Style. """
    cmap = ListedColormap([category[3] for category in var_categories[::-1]])  # Colours in reverse (ascending) order
    norm =   BoundaryNorm([category[0] for category in var_categories[-2::-1]], cmap.N, extend="both")
    style = {"cmap": cmap, "norm": norm, "extend": "both"}
    return style

# Helper functions: Generate linear colour maps up to a specified colour
def categorical_colourmap(end: str, *, start: str="#000000", N: int=11) -> LinearSegmentedColormap:
    """ Generate a linear colour map from `start` (default: black) to `end` (e.g. red) in `N` steps (default: 11). """
    return LinearSegmentedColormap.from_list("", [start, end], N=N)

def is_in_category_style(end: str, *args, **kwargs) -> Style:
    """ Generate an earthkit-plots Style for 'how many members in this category'. """
    common = {"vmin": 0, "vmax": 1, "extend": "neither", "normalize": False}
    colour = {"cmap": categorical_colourmap(end, *args, **kwargs)}
    return Style(**common, **colour)

# Styles for SPI, SPEI based on Keune+25 colours
_style_categories_spi  = categories_to_earthkit_style(CATEGORIES_SPI)
_style_categories_spei = categories_to_earthkit_style(CATEGORIES_SPEI)
_style_spi = _style_spei = {"cmap": plt.cm.RdBu, "vmin": -3, "vmax": 3, "extend": "both"}
_style_spi_spread = _style_spei_spread = {"cmap": plt.cm.cividis, "vmin": 0, "vmax": 1, "extend": "max"}

styles = {
    "SPI": Style(**_style_spi), "SPI_spread": Style(**_style_spi_spread),
    "SPEI": Style(**_style_spei), "SPEI_spread": Style(**_style_spei_spread),
    "SPI_categories": Style(**_style_categories_spi),
    "SPEI_categories": Style(**_style_categories_spei),
}

# Apply general settings
for style in styles.values():
    style.normalize = False

The following cell contains some base helper functions (e.g. displaying in Jupyter Notebook or Jupyter Book style, adding textboxes with consistent formatting, etc.):

Hide code cell source

# Visualisation: Helper functions, general
def _glue_or_show(fig, glue_label=None):
    try:
        glue(glue_label, fig, display=False)
    except TypeError:
        plt.show()
    finally:
        plt.close()

def _add_textbox_to_subplots(text: str, *axs: plt.Axes | ekp.Subplot, right=False) -> None:
    """ Add a text box to each of the specified subplots. """
    # Get the plt.Axes for each ekp.Subplot
    axs = [subplot.ax if isinstance(subplot, ekp.Subplot) else subplot for subplot in axs]

    # Set up location
    x = 0.95 if right else 0.05
    horizontalalignment = "right" if right else "left"

    # Add the text
    for ax in axs:
        ax.text(x, 0.95, text, transform=ax.transAxes,
        horizontalalignment=horizontalalignment, verticalalignment="top",
        bbox={"facecolor": "white", "edgecolor": "black", "boxstyle": "round",
              "alpha": 1})

The following functions are also base helper functions, but specific to geospatial plots. Country boundaries are downloaded from the Geographic Information System of the Commission (GISCO) through earthkit-plots, and are © EuroGeographics.

Hide code cell source

# Visualisation: Helper functions for geospatial plots
def decorate_fig(fig: ekp.Figure, *, title: Optional[str]="") -> None:
    """ Decorate an earthkit figure with land, coastlines, etc. """
    fig.land()
    fig.borders(source="gisco")
    # fig.administrative_areas(transform_first=True)  # State borders; disabled by default because slow
    fig.coastlines(color="black")
    fig.gridlines(linestyle=plt.rcParams["grid.linestyle"])
    fig.title(title)

The following functions display geospatial comparisons of the deterministic reanalysis and ensemble member mean/spread:

Hide code cell source

def plot_reanalysis_vs_ensemble(data_reanalysis: xr.Dataset, data_ensemble: xr.Dataset, indicator: str, *,
                                dates: Optional[Iterable[pd.Timestamp | str]]=None,
                                domain: Optional[AnyDomain]=None,
                                time_dimension: str="time",
                                glue_label: Optional[str]=None) -> ekp.Figure:                   
    """
    Plot data from a reanalysis and ensemble (calculating the mean and spread) side-by-side geospatially.
    Loops over the specified dates; if none are specified, then loops over all dates.
    """
    # Find dates if none were provided
    if not dates:
        try:
            dates = data_reanalysis[time_dimension]
        except KeyError:
            raise ValueError(f"No dates provided and cannot find them on {time_dimension} dimension in data_reanalysis.")
    dates = pd.to_datetime(dates)  # Convert to Pandas timestamps for formatting

    # Determine if SPI or SPEI
    spi_or_spei = "SPEI" if "E" in indicator else "SPI"
    indicator_label = INDEXES_NAMED[indicator]

    # Calculate ensemble mean and spread
    mean_ensemble, spread_ensemble = ensemble_mean_and_spread(data_ensemble)
    PANEL_LABELS = ["reanalysis", "ensemble mean", "ensemble spread",]

    # Create figure
    nrows = len(dates)
    ncols = 3
    fig = ekp.Figure(rows=nrows, columns=ncols, size=(4*ncols, 4*nrows))

    # Loop over dates and plot data
    for date in dates:
        # Select data
        reanalysis_today, mean_today, spread_today = [data.sel({time_dimension: date})
                                                      for data in [data_reanalysis, mean_ensemble, spread_ensemble]]
        # Plot data
        subplots = [fig.add_map(domain=domain) for j in range(ncols)]
        subplots[0].grid_cells(reanalysis_today, z=indicator, style=styles[f"{spi_or_spei}"])
        subplots[1].grid_cells(mean_today,       z=indicator, style=styles[f"{spi_or_spei}"])
        subplots[2].grid_cells(spread_today,     z=indicator, style=styles[f"{spi_or_spei}_spread"])

        # Label dates
        _add_textbox_to_subplots(f"{date:%B %Y}", *subplots)

    # Colour bar on last row
    for subplot, label in zip(subplots, PANEL_LABELS):
        subplot.legend(location="bottom", label=f"{indicator_label}\n({label})")

    # Labels on top row
    for subplot, label in zip(fig.subplots[:ncols], PANEL_LABELS):
        subplot.title(label)

    # Decorate figures
    domain_name = f"over {domain.name}" if domain is not None else ""
    decorate_fig(fig, title=f"ERA5–Drought {indicator_label} {domain_name}")

    # Show result
    _glue_or_show(fig.fig, glue_label)

The following functions display geospatial comparisons of the deterministic reanalysis and ensemble members in terms of SPI / SPEI categories:

Hide code cell source

def plot_drought_category_comparison(data_reanalysis: xr.Dataset, data_ensemble: xr.Dataset,
                                     indicator: str, categories: Iterable[Iterable], *,
                                     dates: Optional[Iterable[pd.Timestamp | str]]=None,
                                     drought_starting_index: int=0,
                                     domain: Optional[AnyDomain]=None,
                                     time_dimension: str="time", ensemble_dimension: str=ENSEMBLE_DIMENSION,
                                     glue_label: Optional[str]=None) -> ekp.Figure:                   
    """
    Plot drought categories from a reanalysis and ensemble side-by-side geospatially.
    For the reanalysis, simply show the indicator values with a categorical colour map.
    For the ensemble, show the # of ensemble members in each category, with one panel per category.
        `drought_starting_index` can be used to specify at which category to start the ensemble plots
        (e.g. you only want the drought ones).
    Loops over the specified dates; if none are specified, then loops over all dates.
    """
    # Find dates if none were provided
    if not dates:
        try:
            dates = data_reanalysis[time_dimension]
        except KeyError:
            raise ValueError(f"No dates provided and cannot find them on {time_dimension} dimension in data_reanalysis.")
    dates = pd.to_datetime(dates)  # Convert to Pandas timestamps for formatting

    # Determine if SPI or SPEI
    spi_or_spei = "SPEI" if "E" in indicator else "SPI"
    indicator_label = INDEXES_NAMED[indicator]
    categories_drought = categories[drought_starting_index:]  # e.g. 5: for CATEGORIES_SPI

    # Create figure
    nrows = len(dates)
    ncols = 1 + len(categories_drought)
    fig = ekp.Figure(rows=nrows, columns=ncols, size=(4*ncols, 4*nrows))

    # Loop over dates and plot data
    for date in dates:
        # Select data
        reanalysis_today, ensemble_today = [data.sel({time_dimension: date})
                                            for data in [data_reanalysis, data_ensemble]]

        # Plot reanalysis
        subplots = [fig.add_map(domain=domain) for j in range(ncols)]
        subplots[0].grid_cells(reanalysis_today, z=indicator, style=styles[f"{spi_or_spei}_categories"])

        # Loop over ensemble categories and plot data
        for (lower_lim, upper_lim, category_label, category_colour), subplot in zip(categories_drought, subplots[1:]):
            # Find ensemble members > lower lim
            in_this_category = (ensemble_today[[indicator]] < upper_lim)  # Boolean mask
            in_this_category = in_this_category.sum(dim=ensemble_dimension)  # Count
            in_this_category = in_this_category / ensemble_today.sizes[ensemble_dimension]  # Relative count
            in_this_category = in_this_category.where(ensemble_today.sel({ensemble_dimension: 1}).notnull())  # Re-apply NaN mask

            # Plot data
            subplot.grid_cells(in_this_category, z=indicator, style=is_in_category_style(category_colour))

        # Label dates
        _add_textbox_to_subplots(f"{date:%B %Y}", *subplots)

    # Colour bar on last row
    colourbar_labels = [indicator_label,] + ["fraction of ensemble members" for category in categories_drought]
    for subplot, label in zip(subplots, colourbar_labels):
        subplot.legend(location="bottom", label=label)

    # Labels on top row
    panel_labels = ["reanalysis",] + [f"{category[2]} or drier" for category in categories_drought]
    for subplot, label in zip(fig.subplots[:ncols], panel_labels):
        subplot.title(label)

    # Decorate figures
    domain_name = f"over {domain.name}" if domain is not None else ""
    decorate_fig(fig, title=f"ERA5–Drought {indicator_label} {domain_name}")

    # Show result
    _glue_or_show(fig.fig, glue_label)

2. Download data#

General setup#

This notebook uses earthkit-data to download files from the CDS. If you intend to run this notebook multiple times, it is highly recommended that you enable caching to prevent having to download the same files multiple times. If you prefer not to use earthkit, the following requests can also be used with the cdsapi module. In either case (earthkit-data or cdsapi), it is required to set up a CDS account and API key as explained on the CDS website.

In this example, we will be looking at a single drought event in one country, namely Australia. This country is defined in the following cell and can be edited when running this notebook yourself:

Hide code cell source

# Define your preferred country
country = "Australia"

# Retrieve geographical information
domain = ekp.geo.domains.Domain.from_string(country)  # Convert to earthkit Domain
request_country = domain_to_request(domain)  # Bounding box for CDS downloads: Entire domain with some padding

Identify a drought event#

We use EM-DAT [Delforge+25], the international disaster database maintained by UC Louvain’s Centre for Research on the Epidemiology of Disasters (CRED), to identify historical drought events in our chosen country (here Australia) and pick one to analyse further. EM-DAT data can be downloaded from https://public.emdat.be/ This requires registration with an account and appropriate licence. For this notebook, we assume you have downloaded data for your chosen country already. The filename can be specified below:

Hide code cell source

# Define your filename for EM-DAT data
emdat_filename = "~/emdat_australia.xlsx"  # Example: Data downloaded for Australia and saved in the home folder

# Load EM-DAT data
drought_events = pd.read_excel(emdat_filename, index_col="DisNo.")  # This might prompt you to pip install openpyxl

# Display relevant columns in notebook
drought_events[["Start Year", "Start Month", "Location", "End Year", "End Month",]]
Start Year Start Month Location End Year End Month
DisNo.
1967-9012-AUS 1967 NaN South-East 1969 NaN
1974-9100-AUS 1974 12.0 Central New South Wales 1975 1.0
1976-9126-AUS 1976 NaN Victoria, South Australia, Southern Inland New... 1976 NaN
1976-9127-AUS 1976 5.0 Southwest of Western Australia 1976 8.0
1978-9200-AUS 1978 NaN Southern Queensland, New South Wales, Victoria... 1978 NaN
1981-9188-AUS 1981 NaN Queensland, New South Wales, Victoria States 1982 NaN
1991-9697-AUS 1991 NaN Western, South 1991 NaN
1992-9543-AUS 1992 12.0 Queensland 1995 12.0
2006-9554-AUS 2006 10.0 NO DATA: all country has been selected 2006 NaN
2002-9611-AUS 2002 4.0 Queensland, New South Wales provinces 2002 11.0
2018-9258-AUS 2018 1.0 New South Wales, Victoria , Queensland 2018 8.0

Based on the table above, we choose the 2018 drought event, which has well-defined start and end dates and spans a large part of the country. In the following cell, we manually define the dates for the analysis, which can be modified when running this notebook yourself:

Hide code cell source

# Define start, middle, and end dates for analysis
date_start  = "2018-01"
date_middle = "2018-05"
date_end    = "2018-08"

# Combine into one list
dates = [date_start, date_middle, date_end,]

# Convert to datetime objects for convenience when defining requests
dates = [pd.to_datetime(date) for date in dates]

Setup for CDS downloads#

We will be downloading different types of data (drought indicators and quality flags, deterministic reanalysis and ensemble) in this notebook. CDS data requests take the form of dictionaries in Python. When making multiple requests, it is convenient to define some constants and set up template requests with default parameters. In this section, we define our template containing those parameters that are constant between datasets: the domain in time and space. This way, these are guaranteed to be consistent between downloads and only need to be changed in one place if you wish to modify the notebook for your own use case.

We want to download SPI, SPEI, and the associated quality flags for the 1- and 12-month accumulation windows. The country and timespan were defined in the previous section; the variables and accumulation periods are defined in the following cell, and can be edited when running this notebook yourself:

Hide code cell source

# Define your preferred variables (SPI and/or SPEI; both by default)
indicators = ["SPI", "SPEI"]
indicators_long = ["standardised_precipitation_index", "standardised_precipitation_evapotranspiration_index"]  # CDS request format

# Define your preferred accumulation periods, e.g. for SPI-1 and SPI-12
accumulation_periods = [1, 12,]

The CDS request template is then defined using the values specified above. Additional information to download specific data variables will be mixed into this template in the following sections.

Hide code cell source

ID_ERA5_DROUGHT = "derived-drought-historical-monthly"

# Main template: Dataset + Domain
request_era5drought_template = {
    "version": "1_0",
    "dataset_type": "consolidated_dataset",
} | request_country

# Templates for reanalysis, ensemble
request_reanalysis = request_era5drought_template | {"product_type": ["reanalysis"], }
request_ensemble   = request_era5drought_template | {"product_type": ["ensemble_members"], }

# Template for indicators
request_indicators = {"variable": indicators_long, }

# Template for accumulation periods
request_accumulation = {"accumulation_period": [f"{p}" for p in accumulation_periods], }

Download SPI and SPEI#

We start by downloading the drought indicators (SPI and SPEI) for the specified dates:

Hide code cell source

# Define requests for indicators
request_indicators_reanalysis = request_reanalysis | request_indicators | request_accumulation
request_indicators_ensemble   = request_ensemble   | request_indicators | request_accumulation

# Define requests for start/middle/end dates, so only those are downloaded
request_dates = [{"year":  [f"{date.year}",], 
                  "month": [f"{date.month:02}",],
                 } for date in dates]

# Combine requests for indicators and dates
subrequests_indicators_reanalysis = [req | request_indicators_reanalysis for req in request_dates]
subrequests_indicators_ensemble   = [req | request_indicators_ensemble   for req in request_dates]

# Download data and load into xarray
data_indicators_reanalysis = ekd.from_source("cds", ID_ERA5_DROUGHT, *subrequests_indicators_reanalysis)  # Download as field list
data_indicators_reanalysis = data_indicators_reanalysis.to_xarray(compat="equals").load()  # Convert to xarray dataset, without dask

data_indicators_ensemble   = ekd.from_source("cds", ID_ERA5_DROUGHT, *subrequests_indicators_ensemble)  # Download as field list
data_indicators_ensemble   = data_indicators_ensemble.to_xarray(compat="equals").load()  # Convert to xarray dataset, without dask

# Display in notebook
data_indicators_reanalysis

Hide code cell output

<xarray.Dataset> Size: 4MB
Dimensions:  (time: 3, lat: 181, lon: 241)
Coordinates:
  * time     (time) datetime64[ns] 24B 2018-01-01T06:00:00 ... 2018-08-01T06:...
  * lat      (lat) float64 1kB -4.0 -4.25 -4.5 -4.75 ... -48.5 -48.75 -49.0
  * lon      (lon) float64 2kB 103.0 103.2 103.5 103.8 ... 162.5 162.8 163.0
Data variables:
    SPEI12   (time, lat, lon) float64 1MB 0.7501 0.5129 0.5198 ... nan nan nan
    SPEI1    (time, lat, lon) float64 1MB -1.232 -1.562 -1.852 ... nan nan nan
    SPI12    (time, lat, lon) float64 1MB 0.7648 0.6042 0.6023 ... nan nan nan
    SPI1     (time, lat, lon) float64 1MB -1.185 -1.703 -2.23 ... nan nan nan
Attributes: (12/19)
    title:                   SPEI12
    description:             Drought Index: Standardized Drought Index calcul...
    Conventions:             CF-1.8
    history:                 Created 29/10/2024 08:55:40 using DRYFALL.
    institution:             European Centre for Medium-Range Weather Forecasts
    source:                  DRYFALL v1.0
    ...                      ...
    frequency:               Monthly
    contact_person:          support@ecmwf.int
    ref_publication:         Keune, J., Di Giuseppe, F., Barnard, C., Damasio...
    cds_data_catalogue_url:  https://cds.climate.copernicus.eu/datasets/deriv...
    product_type:            Reanalysis
    dataset_type:            Consolidated dataset

The ensemble members in ERA5–Drought are represented by the realization dimension:

Hide code cell source

# Display in notebook
data_indicators_ensemble

Hide code cell output

<xarray.Dataset> Size: 42MB
Dimensions:      (time: 3, realization: 10, lat: 181, lon: 241)
Coordinates:
  * time         (time) datetime64[ns] 24B 2018-01-01T06:00:00 ... 2018-08-01...
  * realization  (realization) int32 40B 1 2 3 4 5 6 7 8 9 10
  * lat          (lat) float64 1kB -4.0 -4.25 -4.5 -4.75 ... -48.5 -48.75 -49.0
  * lon          (lon) float64 2kB 103.0 103.2 103.5 103.8 ... 162.5 162.8 163.0
Data variables:
    SPEI12       (time, realization, lat, lon) float64 10MB 0.3351 ... nan
    SPEI1        (time, realization, lat, lon) float64 10MB -1.38 -1.507 ... nan
    SPI12        (time, realization, lat, lon) float64 10MB 0.3047 ... nan
    SPI1         (time, realization, lat, lon) float64 10MB -1.567 ... nan
Attributes: (12/21)
    CDI:                     Climate Data Interface version 1.9.10 (https://m...
    Conventions:             CF-1.8
    source:                  DRYFALL v1.0
    institution:             European Centre for Medium-Range Weather Forecasts
    title:                   SPEI12
    description:             Drought Index: Standardized Drought Index calcul...
    ...                      ...
    contact_person:          support@ecmwf.int
    CDO:                     Climate Data Operators version 1.9.10 (https://m...
    ref_publication:         Keune, J., Di Giuseppe, F., Barnard, C., Damasio...
    cds_data_catalogue_url:  https://cds.climate.copernicus.eu/datasets/deriv...
    product_type:            Ensemble Members
    dataset_type:            Consolidated dataset

Download quality flags#

ERA5–Drought includes quality flags that can be used to mask low-quality data. The first is the probability of zero precipitation (p0), which counts the number of years in the reference period (1991–2020) in which there was 0 precipitation in a given calendar month (e.g. January). This flag is only applicable to SPI, where the authors recommend masking any data with p0 > 0.1 [Keune+25]. The second is the normality α, derived using the Shapiro–Wilk test [Shapiro+65]. This α quantifies how well the computed SPI or SPEI values in the reference period (1991–2020) are described by a standard normal distribution. The mask in ERA–Drought uses a threshold of 0.05, meaning points where α < 0.05 are rejected.

The three quality flags (p0 for SPI, α for SPI, and α for SPEI) are downloaded from the CDS for the reanalysis and ensemble. Note that ERA5–Drought provides the quality flags with timestamps for 2020 (2020-01-01, 2020-02-01, etc.) to fit into the datetime format, which requires a year and day. However, the flags are applicable to the corresponding calendar month (January, February, etc.) across the entire timespan.

Note that quality flags for different accumulation periods are provided under a common variable name (pzero or significance), with references to the specific drought index and accumulation period in the filename and metadata. When downloading quality flags for multiple accumulation periods at the same time, xarray and earthkit-data may automatically merge quality flags for different accumulation periods into a single array, or overwrite one with another. To mitigate this, quality flags must be downloaded separately for each accumulation period, renamed, and merged together, rather than downloaded for all simultaneously.

Hide code cell source

# Setup: Define requests for months
calendar_months = sorted({date.month for date in dates})  # Set-comp removes duplicates
request_calendar_months = {"month": [f"{month:02}" for month in calendar_months], }

# Setup: Define requests for individual accumulation periods
requests_per_accumulation_period = {period: {"accumulation_period": [str(period)],}
                                    for period in accumulation_periods}

Hide code cell source

# Define request for p0
request_p0 = {"variable": ["probability_of_zero_precipitation_spi"], } | request_calendar_months
request_p0_reanalysis = request_reanalysis | request_p0
request_p0_ensemble   = request_ensemble   | request_p0

# Define and download per accumulation period
subrequests_p0_reanalysis = {period: (req | request_p0_reanalysis)
                             for period, req in requests_per_accumulation_period.items()}
subrequests_p0_ensemble   = {period: (req | request_p0_ensemble)
                             for period, req in requests_per_accumulation_period.items()}
data_p0_reanalysis = {period: ekd.from_source("cds", ID_ERA5_DROUGHT, subreq)  # Download as field lists
                      for period, subreq in subrequests_p0_reanalysis.items()}
data_p0_ensemble   = {period: ekd.from_source("cds", ID_ERA5_DROUGHT, subreq)  # Download as field lists
                      for period, subreq in subrequests_p0_ensemble.items()}

# Pre-process: Convert to xarray, handle variable names, month dimension
data_p0_reanalysis = preprocess_era5drought_qualityflag(data_p0_reanalysis, "SPI")
data_p0_ensemble   = preprocess_era5drought_qualityflag(data_p0_ensemble, "SPI")

# Load into memory without dask
data_p0_reanalysis = data_p0_reanalysis.load()
data_p0_ensemble   = data_p0_ensemble.load()

# Generate mask: Allow data where p0 < 0.1
mask_p0_reanalysis = (data_p0_reanalysis < P0_THRESHOLD)
mask_p0_ensemble   = (data_p0_ensemble   < P0_THRESHOLD)

# Display in notebook
mask_p0_ensemble

Hide code cell output

<xarray.Dataset> Size: 3MB
Dimensions:      (realization: 10, lat: 181, lon: 241, time: 3)
Coordinates:
  * realization  (realization) int32 40B 1 2 3 4 5 6 7 8 9 10
  * lat          (lat) float32 724B -4.0 -4.25 -4.5 -4.75 ... -48.5 -48.75 -49.0
  * lon          (lon) float32 964B 103.0 103.2 103.5 ... 162.5 162.8 163.0
  * time         (time) datetime64[ns] 24B 2020-01-01 2020-05-01 2020-08-01
Data variables:
    SPI1         (time, realization, lat, lon) bool 1MB True True ... True True
    SPI12        (time, realization, lat, lon) bool 1MB True True ... True True
Attributes: (12/19)
    title:                   Quality criteria for the derived standardized dr...
    description:             Monthly quality criteria that define the reliabi...
    Conventions:             CF-1.8
    history:                 Created 13/08/2024 12:28:53 using DRYFALL.
    institution:             European Centre for Medium-Range Weather Forecasts
    source:                  DRYFALL v1.0
    ...                      ...
    climate_start_date:      1991-01-01
    climate_end_date:        2020-12-31
    frequency:               Monthly
    contact_person:          support@ecmwf.int
    product_type:            Consolidated Dataset
    dataset_type:            Ensemble Members

Hide code cell source

# Define request for normality
request_normality = {"variable": ["test_for_normality_spi"], } | request_calendar_months
request_normality_reanalysis = request_reanalysis | request_normality
request_normality_ensemble   = request_ensemble   | request_normality

# Define and download per accumulation period
subrequests_normality_reanalysis = {period: (req | request_normality_reanalysis)
                                    for period, req in requests_per_accumulation_period.items()}
subrequests_normality_ensemble   = {period: (req | request_normality_ensemble)
                                    for period, req in requests_per_accumulation_period.items()}
mask_normality_reanalysis_spi = {period: ekd.from_source("cds", ID_ERA5_DROUGHT, subreq)  # Download as field lists
                                 for period, subreq in subrequests_normality_reanalysis.items()}
mask_normality_ensemble_spi   = {period: ekd.from_source("cds", ID_ERA5_DROUGHT, subreq)  # Download as field lists
                                 for period, subreq in subrequests_normality_ensemble.items()}

# Pre-process: Convert to xarray, handle variable names, month dimension
mask_normality_reanalysis_spi = preprocess_era5drought_qualityflag(mask_normality_reanalysis_spi, "SPI")
mask_normality_ensemble_spi   = preprocess_era5drought_qualityflag(mask_normality_ensemble_spi, "SPI")

# Load into memory without dask
mask_normality_reanalysis_spi = mask_normality_reanalysis_spi.load()
mask_normality_ensemble_spi   = mask_normality_ensemble_spi.load()

# Convert to boolean mask
mask_normality_reanalysis_spi = mask_normality_reanalysis_spi.astype(bool)
mask_normality_ensemble_spi   = mask_normality_ensemble_spi.astype(bool)

# Display in notebook
mask_normality_ensemble_spi

Hide code cell output

<xarray.Dataset> Size: 3MB
Dimensions:      (time: 3, realization: 10, lat: 181, lon: 241)
Coordinates:
  * realization  (realization) int32 40B 1 2 3 4 5 6 7 8 9 10
  * lat          (lat) float32 724B -4.0 -4.25 -4.5 -4.75 ... -48.5 -48.75 -49.0
  * lon          (lon) float32 964B 103.0 103.2 103.5 ... 162.5 162.8 163.0
  * time         (time) datetime64[ns] 24B 2020-01-01 2020-05-01 2020-08-01
Data variables:
    SPI1         (time, realization, lat, lon) bool 1MB True True ... False
    SPI12        (time, realization, lat, lon) bool 1MB True True ... False
Attributes: (12/19)
    title:                   Quality criteria for the derived standardized dr...
    description:             Monthly quality criteria that define the reliabi...
    Conventions:             CF-1.8
    history:                 Created 13/08/2024 12:28:53 using DRYFALL.
    institution:             European Centre for Medium-Range Weather Forecasts
    source:                  DRYFALL v1.0
    ...                      ...
    climate_start_date:      1991-01-01
    climate_end_date:        2020-12-31
    frequency:               Monthly
    contact_person:          support@ecmwf.int
    product_type:            Consolidated Dataset
    dataset_type:            Ensemble Members

Hide code cell source

# Define request for normality
request_normality = {"variable": ["test_for_normality_spei"], } | request_calendar_months
request_normality_reanalysis = request_reanalysis | request_normality
request_normality_ensemble   = request_ensemble   | request_normality

# Define and download per accumulation period
subrequests_normality_reanalysis = {period: (req | request_normality_reanalysis)
                                    for period, req in requests_per_accumulation_period.items()}
subrequests_normality_ensemble   = {period: (req | request_normality_ensemble)
                                    for period, req in requests_per_accumulation_period.items()}
mask_normality_reanalysis_spei = {period: ekd.from_source("cds", ID_ERA5_DROUGHT, subreq)  # Download as field lists
                                  for period, subreq in subrequests_normality_reanalysis.items()}
mask_normality_ensemble_spei   = {period: ekd.from_source("cds", ID_ERA5_DROUGHT, subreq)  # Download as field lists
                                  for period, subreq in subrequests_normality_ensemble.items()}

# Pre-process: Convert to xarray, handle variable names, month dimension
mask_normality_reanalysis_spei = preprocess_era5drought_qualityflag(mask_normality_reanalysis_spei, "SPEI")
mask_normality_ensemble_spei   = preprocess_era5drought_qualityflag(mask_normality_ensemble_spei, "SPEI")

# Load into memory without dask
mask_normality_reanalysis_spei = mask_normality_reanalysis_spei.load()
mask_normality_ensemble_spei   = mask_normality_ensemble_spei.load()

# Convert to boolean mask
mask_normality_reanalysis_spei = mask_normality_reanalysis_spei.astype(bool)
mask_normality_ensemble_spei   = mask_normality_ensemble_spei.astype(bool)

# Display in notebook
mask_normality_ensemble_spei

Hide code cell output

<xarray.Dataset> Size: 3MB
Dimensions:      (time: 3, realization: 10, lat: 181, lon: 241)
Coordinates:
  * realization  (realization) int32 40B 1 2 3 4 5 6 7 8 9 10
  * lat          (lat) float32 724B -4.0 -4.25 -4.5 -4.75 ... -48.5 -48.75 -49.0
  * lon          (lon) float32 964B 103.0 103.2 103.5 ... 162.5 162.8 163.0
  * time         (time) datetime64[ns] 24B 2020-01-01 2020-05-01 2020-08-01
Data variables:
    SPEI1        (time, realization, lat, lon) bool 1MB True True ... False
    SPEI12       (time, realization, lat, lon) bool 1MB True True ... False
Attributes: (12/19)
    title:                   Quality criteria for the derived standardized dr...
    description:             Monthly quality criteria that define the reliabi...
    Conventions:             CF-1.8
    history:                 Created 13/08/2024 12:35:53 using DRYFALL.
    institution:             European Centre for Medium-Range Weather Forecasts
    source:                  DRYFALL v1.0
    ...                      ...
    climate_start_date:      1991-01-01
    climate_end_date:        2020-12-31
    frequency:               Monthly
    contact_person:          support@ecmwf.int
    product_type:            Consolidated Dataset
    dataset_type:            Ensemble Members

Monitoring drought across a wide area requires robust statistics. Therefore, we mask all ensemble members if any are flagged (for a particular location and time) to avoid biasing the statistics – for example, the p0 flag is more likely to be triggered by drier ensemble members, so only masking those could introduce a wet bias in the result.

Hide code cell source

# Combine masks for ensembles
mask_p0_ensemble             = combine_ensemble_mask(mask_p0_ensemble)
mask_normality_ensemble_spi  = combine_ensemble_mask(mask_normality_ensemble_spi)
mask_normality_ensemble_spei = combine_ensemble_mask(mask_normality_ensemble_spei)

# Display result
mask_p0_ensemble

Hide code cell output

<xarray.Dataset> Size: 263kB
Dimensions:  (lat: 181, lon: 241, time: 3)
Coordinates:
  * lat      (lat) float32 724B -4.0 -4.25 -4.5 -4.75 ... -48.5 -48.75 -49.0
  * lon      (lon) float32 964B 103.0 103.2 103.5 103.8 ... 162.5 162.8 163.0
  * time     (time) datetime64[ns] 24B 2020-01-01 2020-05-01 2020-08-01
Data variables:
    SPI1     (time, lat, lon) bool 131kB True True True True ... True True True
    SPI12    (time, lat, lon) bool 131kB True True True True ... True True True

3. Drought indicators#

We examine the distribution of SPI and SPEI to get an intuition for circumstances in this particular drought event and the spread of the ensemble. First, this is done without applying quality flags:

Hide code cell source

# Loop over indicators (SPI1, SPI12, etc.)
for indicator in data_indicators_reanalysis.data_vars:
    # Set up label for Jupyter-book; can be ignored when running your own analysis
    glue_label = f"indicator_derived-drought-historical-monthly_uncertainty_q02_fig-map-{indicator.lower()}-noflags"

    # Plot time series
    plot_reanalysis_vs_ensemble(data_indicators_reanalysis, data_indicators_ensemble, indicator,
                                domain=domain,
                                glue_label=glue_label)
../_images/67c1e87f38ce30ed6ee7fd725100775c4fc3e8f4f7a7d431866b85e0c7267f0f.png

Fig. 6.1.1 SPI-1 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags have not been applied.#

../_images/6367752a1200757597562ec75c358ca2a121a1789d164c128fc44f89c843dbe2.png

Fig. 6.1.2 SPI-12 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags have not been applied.#

../_images/48a8e8337573b88d96e50f197c7881111270c89b580e0a14245ff47a1c7ffeb0.png

Fig. 6.1.3 SPEI-1 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags have not been applied.#

../_images/127899a03736ce71bc820d2c493f19a2df788f93e31d0c58a6493e78b3ee7bea.png

Fig. 6.1.4 SPEI-12 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags have not been applied.#

The geospatial distributions for SPI (Figures 6.1.16.1.2) and SPEI (Figures 6.1.36.1.4) both show good agreement between the deterministic reanalysis and the ensemble mean. Both datasets generally show the same patterns with the same magnitudes, although SPI-1 and SPI-12 show differences in some areas, such as the central northern coastal area (Arnhem Land and surroundings). The most striking difference is that the ensemble mean appears much smoother, since ERA5’s ensemble members are run on a coarser grid than the deterministic reanalysis and subsequently interpolated down to the same grid in ERA5–Drought. The averaging over 10 members may cause additional smoothing of random noise.

The ensemble spread in SPI-1 and SPI-12 is very large (≥1) in some areas, particularly in January (local summer). Some of these areas correspond to noisy-looking areas in the reanalysis and ensemble mean. This pattern does not appear at all in SPEI-1 and SPEI-12, which have a spatially smooth and much smaller ensemble spread. The difference is likely due to the two indicators using different underpinning distributions (gamma vs. generalised logistic) and data (precipitation vs. precipitation – potential evaporation). This is explored further in a forthcoming notebook on the reproducibility of the ERA5–Drought dataset.

Applying the provided quality flags removes most of the noisy-appearing and high-spread areas. This includes most of the continent for SPI-1 (Figure 6.1.5) because SPI-1 is the most sensitive to dry months – which occur often in inland Australia. More data are retained for SPI-12 and SPEI (Figures 6.1.66.1.8), although significant gaps remain. This is particularly true for the ensemble mean and spread, which we have masked if any ensemble member is flagged at a particular pixel and timestamp to maintain statistical robustness.

In the retained areas, the ensemble spread is generally small, with the median over the entire domain and all three dates being ≤0.24 for SPI and ≤0.15 for SPEI. However, what matters most for drought monitoring is not the uncertainty in SPI or SPEI values themselves but in the corresponding drought levels. This will be explored in the next section.

Hide code cell source

# Setup: Find SPI / SPEI data variables
spi_variables  = [var for var in data_indicators_ensemble if "SPI"  in var]
spei_variables = [var for var in data_indicators_ensemble if "SPEI" in var]

# Apply quality flags to SPI and SPEI (updating original dataset)
data_indicators_reanalysis[spi_variables]  = apply_quality_flags(data_indicators_reanalysis,
                                                                 mask_p0_reanalysis, mask_normality_reanalysis_spi)
data_indicators_reanalysis[spei_variables] = apply_quality_flags(data_indicators_reanalysis,
                                                                 mask_normality_reanalysis_spei)

data_indicators_ensemble[spi_variables]  = apply_quality_flags(data_indicators_ensemble,
                                                               mask_p0_ensemble, mask_normality_ensemble_spi)
data_indicators_ensemble[spei_variables] = apply_quality_flags(data_indicators_ensemble,
                                                               mask_normality_ensemble_spei)

# Calculate ensemble spread in retained pixels
ensemble_mean, ensemble_spread = ensemble_mean_and_spread(data_indicators_ensemble)
print("Median spread over all pixels and timestamps:")
ensemble_spread.median().to_pandas()

Hide code cell output

Median spread over all pixels and timestamps:
SPEI12    0.148448
SPEI1     0.133076
SPI12     0.203023
SPI1      0.235265
dtype: float64

Hide code cell source

# Loop over indicators (SPI1, SPI12, etc.)
for indicator in data_indicators_reanalysis.data_vars:
    # Set up label for Jupyter-book; can be ignored when running your own analysis
    glue_label = f"indicator_derived-drought-historical-monthly_uncertainty_q02_fig-map-{indicator.lower()}"

    # Plot time series
    plot_reanalysis_vs_ensemble(data_indicators_reanalysis, data_indicators_ensemble, indicator,
                                domain=domain,
                                glue_label=glue_label)
../_images/92cbd2c989a2543b66b52a501f83585d94d9456850451088dc10de3fced5a4d9.png

Fig. 6.1.5 SPI-1 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags for p0 and normality have been applied.#

../_images/5c4c209ec2417ffe18f636caaea175aff53d3b36d5133e67acc9f9cd072e8920.png

Fig. 6.1.6 SPI-12 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags for p0 and normality have been applied.#

../_images/e53a5d17b02badd30bfb068ec9b7b7ef22070cd2ccb1c60ac23b0bdfb2d800da.png

Fig. 6.1.7 SPEI-1 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags for normality have been applied.#

../_images/8f365f353e1c90bbf29d27f87b2b955a38a1b977ba37adb9ef5345727c2ca58a.png

Fig. 6.1.8 SPEI-12 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) and the per-pixel mean (middle) and spread (right) of the ensemble members. Quality flags for normality have been applied.#

4. Drought classification#

Because SPI and SPEI are defined in probabilistic terms, there is a relatively simple mapping from their values to corresponding drought categories like “mild drought” or “extreme drought” [McKee+93]. Drought monitoring generally focuses on these categories. For example, governmental mechanisms for drought relief may be triggered when SPI or SPEI dips below the threshold for one of these pre-defined categories [Steinemann+06]. The thresholds used in this quality assessment follow [Keune+25].

Here, we assess the uncertainty in declaring drought levels based on how many of the ensemble members match the reanalysis. For example, if the reanalysis suggests a severe or worse drought (SPI < –1.5), and all 10 ensemble members say the same, one can be reasonably confident in declaring a severe drought. However, if only 2 ensemble members say the same, with the other 8 suggesting only moderate (–1.0 > SPI > –1.5) or even no drought (SPI > –1.0), then the uncertainty is too large to make a definitive statement. The opposite case is also interesting – if the reanalysis says there is no drought but several ensemble members say there is, this could serve as an early warning or as the basis for a more nuanced assessment of the situation on the ground.

Hide code cell source

# Compute and display number of matches and mismatches
display_fraction_match_mismatch(data_indicators_reanalysis, data_indicators_ensemble)

Hide code cell output

Percentage of match-ups (pixel + timestamp) where none of the ensemble members (n = 0) agree with the reanalysis in declaring a state of drought.
High values indicate disagreement between the two datasets, and thus high uncertainty.
  n = 0
  SPI-1 SPI-12 SPEI-1 SPEI-12
Reanalysis is Mildly dry or drier nan nan 1.6 1.5
Reanalysis is Moderately dry or drier 7.2 3.1 2.4 3.3
Reanalysis is Severely dry or drier 11.3 3.9 3.5 3.0
Reanalysis is Extremely dry or drier 15.9 3.8 6.0 10.3


Percentage of match-ups (pixel + timestamp) where at least half the ensemble members (n ≥ 5) agree with the reanalysis in declaring a state of drought.
High values indicate some confidence in the drought classification.
  n ≥ 5
  SPI-1 SPI-12 SPEI-1 SPEI-12
Reanalysis is Mildly dry or drier nan nan 94.3 91.7
Reanalysis is Moderately dry or drier 76.9 84.5 92.7 90.1
Reanalysis is Severely dry or drier 73.8 85.8 84.8 89.6
Reanalysis is Extremely dry or drier 73.2 72.9 90.0 44.8


Percentage of match-ups (pixel + timestamp) where (nearly) all the ensemble members (n ≥ 9) agree with the reanalysis in declaring a state of drought.
High values indicate high confidence / low uncertainty in the drought classification.
  n ≥ 9
  SPI-1 SPI-12 SPEI-1 SPEI-12
Reanalysis is Mildly dry or drier nan nan 86.4 79.5
Reanalysis is Moderately dry or drier 54.7 67.3 82.6 80.9
Reanalysis is Severely dry or drier 56.5 69.5 64.9 71.0
Reanalysis is Extremely dry or drier 51.2 36.2 66.7 9.2




Percentage of match-ups (pixel + timestamp) where the reanalysis does not declare a state of drought, and none of the ensemble members (n = 0) declare one either.
High values indicate high confidence / low uncertainty in the not-drought classification.
  n = 0
  SPI-1 SPI-12 SPEI-1 SPEI-12
Reanalysis is less dry than Mildly dry nan nan 87.2 82.6
Reanalysis is less dry than Moderately dry 90.7 85.4 90.5 89.8
Reanalysis is less dry than Severely dry 93.6 93.0 92.3 94.4
Reanalysis is less dry than Extremely dry 96.4 94.5 99.4 99.3


Percentage of match-ups (pixel + timestamp) where the reanalysis does not declare a state of drought, but at least one of the ensemble members (n ≥ 1) does.
High values indicate low confidence in the not-drought classification, and/or potential for early warnings.
  n ≥ 1
  SPI-1 SPI-12 SPEI-1 SPEI-12
Reanalysis is less dry than Mildly dry nan nan 12.8 17.4
Reanalysis is less dry than Moderately dry 9.3 14.6 9.5 10.2
Reanalysis is less dry than Severely dry 6.4 7.0 7.7 5.6
Reanalysis is less dry than Extremely dry 3.6 5.5 0.6 0.7

Most of the time (73–94% of match-ups), at least half of the ensemble members (n ≥ 5) agree with the reanalysis in declaring a drought level. This is particularly true for SPEI (≥85%), matching the smaller spread seen in the previous section. “Extremely dry or drier” SPEI-12 is an outlier, but this is likely a fluke caused by the small number of pixels falling into said category. It is rare (1–16% of match-ups) for none of the ensemble members (n = 0) to match the drought level in the reanalysis, and correspondingly it is common for none of the ensemble members to declare a drought when the reanalysis does not (e.g. out of the pixels and timestamps where SPI-1 is not moderately dry or drier, only 9% have one or more ensemble members that are moderately dry or drier). Full or near-full agreement (n ≥ 9 or 10) occurs in around half to three quarters of match-ups. In conclusion, the drought indicators in ERA5–Drought can be used with confidence in drought monitoring, and the included ensemble can be used to measure said confidence.

There are two main causes for mismatches between the ensemble and reanalysis, or equivalently for uncertainty in the declared drought levels. First, the ensemble propagates the uncertainty in the underpinning ERA5 data. Second, because the ensemble is generated at a coarser resolution than the deterministic reanalysis and then downscaled, pixels in the ensemble represent conditions in a wider spatial area. Both factors particularly affect the edges of drought areas, where the uncertainty in ERA5 straddles the threshold of a drought level and the ensemble combines drier and less dry areas. Indeed, this pattern can be seen in the geospatial distributions of matches and mismatches (Figures 6.1.96.1.12). Other methodological factors (discussed in the introduction) likely contribute further to the uncertainty.

The ensemble also provides new opportunities for more nuanced monitoring of drought, compared to a binary classification based on the deterministic reanalysis alone. As suggested above, additional triggers could be set based on the number of ensemble members that pass an SPI or SPEI threshold. The geospatial distributions (Figures 6.1.96.1.12) show that such an approach would capture a wider and smoother area of potential drought, which can then be targeted for follow-up measurements or early warnings.

Hide code cell source

# Loop over indicators (SPI1, SPI12, etc.)
for indicator in data_indicators_reanalysis.data_vars:
    # Set up label for Jupyter-book; can be ignored when running your own analysis
    glue_label = f"indicator_derived-drought-historical-monthly_uncertainty_q02_fig-map-{indicator.lower()}-categories"

    # Set up SPI/SPEI categories
    # Note that SPI and SPEI have different numbers of categories, the dry ones just happen to both start at 5
    categories = CATEGORIES_SPEI if "SPEI" in indicator else CATEGORIES_SPI
    drought_starting_index = 5 if "SPEI" in indicator else 5

    # Plot time series
    plot_drought_category_comparison(data_indicators_reanalysis, data_indicators_ensemble, indicator, categories,
                                     drought_starting_index=drought_starting_index,
                                     domain=domain,
                                     glue_label=glue_label)
../_images/d83d806d5d8b812b12c64db42eece3c2fd93cc82fbf8f321995d4b7225fb4ca6.png

Fig. 6.1.9 SPI-1 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) categorised following [Keune+25] and the fraction of ensemble members (out of 10) per pixel that fall into moderate-to-extreme drought categories. Quality flags for p0 and normality have been applied.#

../_images/5aa3bf881b0e4db62a690caeb0abc34d06f25c7b33e5bd02e031d682403885e7.png

Fig. 6.1.10 SPI-12 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) categorised following [Keune+25] and the fraction of ensemble members (out of 10) per pixel that fall into moderate-to-extreme drought categories. Quality flags for p0 and normality have been applied.#

../_images/9b191482a7b9d1dbf01f94b3644e05bf29468f6007b198ef9d8ddfe79c53b118.png

Fig. 6.1.11 SPEI-1 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) categorised following [Keune+25] and the fraction of ensemble members (out of 10) per pixel that fall into moderate-to-extreme drought categories. Quality flags for normality have been applied.#

../_images/b818220c3accd2604f9bcfd04cb7c88dcad01e392f4a5f418e09433d42edadd4.png

Fig. 6.1.12 SPEI-12 over Australia at the start (top), middle, and end (bottom) of the selected drought event, showing the deterministic reanalysis (left) categorised following [Keune+25] and the fraction of ensemble members (out of 10) per pixel that fall into moderate-to-extreme drought categories. Quality flags for normality have been applied.#

ℹ️ If you want to know more#

Key resources#

The CDS catalogue entries for the data used were:

Code libraries used:

More about drought indicators:

More about drought monitoring:

More about ERA5:

More about uncertainty in ensemble reanalyses:

References#

[Delforge+25] D. Delforge et al., ‘EM-DAT: the Emergency Events Database’, International Journal of Disaster Risk Reduction, vol. 124, p. 105509, Jun. 2025, doi: 10.1016/j.ijdrr.2025.105509.

[ECIU+25] Energy & Climate Intelligence Unit, ‘Estimated financial losses faced by UK farmers due dry weather impacts on key arable crops’, Energy & Climate Intelligence Unit, London, United Kingdom, Dec. 2025.

[Evenflow+24] Evenflow, ‘The value generated by ERA5’, Copernicus Climate Change Service (C3S), Bonn, Germany, Dec. 2024. [Online]. Available: https://climate.copernicus.eu/sites/default/files/2024-12/Value-generated-by-ERA5-full-report.pdf

[Franch-Pardo+25] I. Franch-Pardo, P. A. F. Puig, and A. Cerdà, ‘Geospatial Technologies in Crisis Response: Analyzing the 2024 Floods in Valencia, Spain’, European Journal of Geography, vol. 16, no. 2, pp. 286–297, Aug. 2025, doi: 10.48088/ejg.i.fra.16.2.286.297.

[Hersbach+20] H. Hersbach et al., ‘The ERA5 global reanalysis’, Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, May 2020, doi: 10.1002/qj.3803.

[IPCC+23] IPCC, ‘Climate Change 2023: Synthesis Report. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change’, Intergovernmental Panel on Climate Change (IPCC), Geneva, Switzerland, Jul. 2023. doi: 10.59327/IPCC/AR6-9789291691647.

[Isaksen+10] L. Isaksen et al., ‘Ensemble of data assimilations at ECMWF’, European Centre for Medium-Range Weather Forecasts, Reading, UK, 636, Dec. 2010. doi: 10.21957/obke4k60.

[JCGM+08] JCGM, ‘Evaluation of measurement data — Supplement 1 to the “Guide to the expression of uncertainty in measurement” — Propagation of distributions using a Monte Carlo method’, Joint Committee for Guides in Metrology, Sevres, Paris, France, 101:2008, 2008. doi: 10.59161/JCGM101-2008.

[Keune+25] J. Keune, F. Di Giuseppe, C. Barnard, E. Damasio da Costa, and F. Wetterhall, ‘ERA5–Drought: Global drought indices based on ECMWF reanalysis’, Scientific Data, vol. 12, p. 616, Apr. 2025, doi: 10.1038/s41597-025-04896-y.

[McKee+93] T. B. McKee, N. J. Doesken, and J. Kleist, ‘The relationship of drought frequency and duration to time scales’, in Eighth Conference on Applied Climatology, Anaheim, California, USA, Jan. 1993.

[Shapiro+65] S. S. Shapiro and M. B. Wilk, ‘An analysis of variance test for normality (complete samples)’, Biometrika, vol. 52, no. 3–4, pp. 591–611, Dec. 1965, doi: 10.1093/biomet/52.3-4.591.

[Soci+24] C. Soci et al., ‘The ERA5 global reanalysis from 1940 to 2022’, Quarterly Journal of the Royal Meteorological Society, vol. 150, no. 764, pp. 4014–4048, Jul. 2024, doi: 10.1002/qj.4803.

[Steinemann+06] A. C. Steinemann and L. F. N. Cavalcanti, ‘Developing Multiple Indicators and Triggers for Drought Plans’, Journal of Water Resources Planning and Management, vol. 132, no. 3, pp. 164–174, May 2006, doi: 10.1061/(ASCE)0733-9496(2006)132:3(164).

[UNICEF+24] UNICEF, ‘Latin America and Caribbean Region Flash Update No. 2 (Climate-related crisis in the Amazon Region)’, UNICEF, Nov. 2024.

[Vicente-Serrano+10] S. M. Vicente-Serrano, S. Beguería, and J. I. López-Moreno, ‘A Multiscalar Drought Index Sensitive to Global Warming: The Standardized Precipitation Evapotranspiration Index’, Journal of Climate, vol. 23, no. 7, pp. 1696–1718, Apr. 2010, doi: 10.1175/2009JCLI2909.1.