Reliability of seasonal forecasts

4.1. Reliability of seasonal forecasts

Production date: 07-06-2024

Produced by: Johannes Langvatn (METNorway)

🌍 Use case: Initial assessment of the reliability of forecast probabilities by a climate services provider

❓ Quality assessment question

How reliable are the probabilities of the seasonal forecasts for different regions and parameters of interest?

Seasonal forecasts are inherently probabilistic and should be interpreted and evaluated as such. Reliability, a key attribute of a forecast system, measures how closely the conditional mean of the observations, given a forecast probability, matches that probability. For a reliable system, events forecast with 70% probability occur about 70% of the time. This notebook provides estimates of forecast reliability. Note that reliability is only one of several attributes measuring the goodness of a forecast system (Murphy, 1993 [1]).

📢 Quality assessment statement

These are the key outcomes of this assessment:

The reliability of the 1-month, 2-meter air temperature forecasts for June, July, and August in the climatological upper tercile varies across different regions.
Reliability diagrams are useful tools for assessing the quality of probabilistic forecasts.
Reliability is just one of several metrics used to evaluate a forecast, and should be considered alongside other attributes such as sharpness, discrimination, and resolution.

📋 Methodology

This notebook follows the approach suggested by Weisheimer and Palmer (2014) [2], which was later applied and slightly modified by Manzanas et al. (2022) [3]. Reliability diagrams are constructed for forecasts of 2-meter air temperature falling in the climatological upper tercile, based on SEAS5 (ECMWF) single-level seasonal forecasts. These are then compared with ERA5 single-level data, which is taken as the reference or ‘truth’.

There are several methods for constructing reliability diagrams. Since the primary focus is on verifying probabilities for a specific region, it is important to assess the actual forecast for that region. This approach results in a relatively small data sample (e.g., 24 years of hindcasts × 3 months per season = 72 forecast-observation pairs), meaning the results may not be very robust. To address this, Weisheimer and Palmer (2014) [2] suggested fitting a linear regression to the reliability line, weighted by the sample size in each bin. The uncertainty of the slope of the reliability line is then estimated using bootstrapping.

Based on the characteristics of the reliability line and its associated uncertainty, the forecasts are assigned to one of five reliability categories: “perfect,” “still very useful for decision-making,” “marginally useful,” “not useful,” and “dangerously useless” (see below for more details, or Weisheimer and Palmer (2014) [2]).
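The weighted fit of the reliability line can be sketched as follows. This is a minimal, dependency-free illustration rather than the notebook's own implementation: the function name `weighted_linear_fit` and the bin values are invented for the example, and the weights are assumed to be simply the number of forecasts falling in each probability bin.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y ~ intercept + slope * x.

    x: forecast probability at each bin centre
    y: observed relative frequency in each bin
    w: weight of each bin (here: number of forecasts in the bin)
    """
    w_sum = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    s_xy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical bins: the observed frequencies lie exactly on a line
# with slope 0.75, i.e. flatter than the perfect-reliability diagonal.
bin_centres = [0.1, 0.3, 0.5, 0.7, 0.9]
observed_freq = [0.15, 0.30, 0.45, 0.60, 0.75]
bin_counts = [20, 15, 12, 15, 10]

slope, intercept = weighted_linear_fit(bin_centres, observed_freq, bin_counts)
print(f"slope={slope:.2f}, intercept={intercept:.3f}")
```

A slope of 1 with zero intercept corresponds to the diagonal of the reliability diagram. A slope below 1, as in this made-up example, indicates overconfident forecast probabilities.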

This notebook calculates, presents, and discusses the reliability diagrams for a specific region: northern Europe. For this region, an alternative method of constructing the reliability diagram, similar to the approach shown on the ECMWF web pages, is also presented and compared. Finally, the reliability categories for a range of regions worldwide are calculated and summarized in a table.

1. Choose data to use and set up code:

Choose a selection of forecast systems and versions, the hindcast period (normally 1993-2016), and the regions and parameter to compute reliability for.
Choose an example region for computing the reliability without computing the area average.

2. ERA5 data retrieval and area averages:

Retrieve reanalysis data for the parameters selected above from the data catalogue “ERA5 monthly averaged data on single levels from 1940 to present”.
Compute the 2 m temperature anomaly by computing the climatology and subtracting it from the reanalysis.
Compute the spatial mean of the 2 m temperature anomaly for each selected region.
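The anomaly and area-average steps can be illustrated with a small plain-Python sketch. It is not the notebook's code, which presumably works on gridded datasets: the helper names are hypothetical, the climatology is assumed to be the mean over the hindcast years for one calendar month, and the spatial mean is assumed to be weighted by the cosine of latitude, which the text above does not state.

```python
import math


def anomalies(values_by_year):
    """Subtract the multi-year mean (climatology) from each year's value.

    values_by_year: values for ONE calendar month at one location,
    one entry per hindcast year (assumption of this sketch).
    """
    clim = sum(values_by_year) / len(values_by_year)
    return [v - clim for v in values_by_year]


def area_mean(field, lats):
    """cos(latitude)-weighted mean of a 2-D field.

    field: list of rows, one row per latitude in `lats` (degrees)
    The cos(lat) weighting is an assumption, used because grid cells
    on a regular lat-lon grid shrink towards the poles.
    """
    num = 0.0
    den = 0.0
    for row, lat in zip(field, lats):
        w = math.cos(math.radians(lat))
        num += w * sum(row)
        den += w * len(row)
    return num / den


# Hypothetical June temperatures (deg C) for three years at one point.
print(anomalies([14.0, 15.0, 16.0]))

# Hypothetical 2x2 anomaly field on latitudes 0 and 60 degrees.
print(area_mean([[1.0, 1.0], [3.0, 3.0]], [0.0, 60.0]))
```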

3. Seasonal hindcast data retrieval and area average:

Retrieve hindcast data for all ensemble members for the parameters selected above from the data catalogue “Seasonal forecast monthly statistics on single levels”.
Compute the 2 m temperature anomaly by computing the climatology and subtracting it from the hindcast.
Compute the spatial mean of the 2 m temperature anomaly for each selected region.

4. Plot and describe results:

The forecast frequency of the 2 m temperature anomaly being warmer than climatology is calculated.
For each month, it is recorded whether the reanalysis was warmer than the reanalysis climatology (the observation).
The reliability is then computed by bootstrapping from the pool of forecast frequencies and their respective observations.
The result is plotted as (12.5%, 87.5%) confidence intervals in the reliability plots.
The reliability is also computed for the example region.
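The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's implementation: all function names are hypothetical, the forecast frequency is taken to be the fraction of ensemble members above the threshold, five equal-width probability bins are used, and the percentile is picked with a simple nearest-lower-rank rule.

```python
import random


def forecast_frequency(member_anomalies, threshold=0.0):
    """Fraction of ensemble members whose anomaly exceeds the threshold."""
    return sum(a > threshold for a in member_anomalies) / len(member_anomalies)


def observed_event(reanalysis_anomaly, threshold=0.0):
    """1 if the reanalysis anomaly exceeds the threshold, else 0."""
    return 1 if reanalysis_anomaly > threshold else 0


def observed_frequency_per_bin(pairs, n_bins=5):
    """Observed event frequency per forecast-probability bin (None if empty)."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for prob, obs in pairs:
        b = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in last bin
        hits[b] += obs
        counts[b] += 1
    return [h / c if c else None for h, c in zip(hits, counts)]


def bootstrap_interval(pairs, n_bins=5, n_boot=1000, lower=12.5, upper=87.5, seed=0):
    """Resample (forecast, observation) pairs with replacement and return
    the (lower, upper) percentile interval of observed frequency per bin."""
    rng = random.Random(seed)
    samples = [[] for _ in range(n_bins)]
    for _ in range(n_boot):
        resample = rng.choices(pairs, k=len(pairs))
        for b, freq in enumerate(observed_frequency_per_bin(resample, n_bins)):
            if freq is not None:
                samples[b].append(freq)
    intervals = []
    for s in samples:
        if not s:
            intervals.append(None)
            continue
        s.sort()
        lo = s[int(lower / 100 * (len(s) - 1))]
        hi = s[int(upper / 100 * (len(s) - 1))]
        intervals.append((lo, hi))
    return intervals


# Hypothetical (forecast probability, observed event) pairs.
pairs = [(0.1, 0), (0.1, 0), (0.1, 1), (0.3, 0), (0.3, 1), (0.5, 0),
         (0.5, 1), (0.7, 1), (0.7, 0), (0.9, 1), (0.9, 1), (0.9, 0)]
print(observed_frequency_per_bin(pairs))
print(bootstrap_interval(pairs, n_boot=200))
```

With only a handful of pairs per bin, as here, the bootstrap intervals are wide, which mirrors the small-sample caveat noted in the methodology.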

📈 Analysis and results

1. Choose the data to use and set up code

First we import the needed code packages and set the seaborn plot style for matplotlib to use for plotting.

In this section, the customisable options of the notebook are set. These variables consist of:

Model forecast system
- forecast centre
- system version
Time
- first year of the model hindcast
- last year of the model hindcast
- forecast month
- number of months of lead time for the hindcast
Regions of interest
- numbers of the SREX regions (see SREX in the regionmask module for choices)
Weather parameter
- name of the variable in the GRIB file
- name of the variable in the CADS API
Download parameters
- chunk size of the data
- number of concurrent requests for parallel download
Combinations of originating centre and model system (operational forecast model systems as of March 2024):
- centre = “ecmwf”, system = “51”
- centre = “ukmo”, system = “602”
- centre = “meteo_france”, system = “8”
- centre = “dwd”, system = “21”
- centre = “cmcc”, system = “35”
- centre = “ncep”, system = “2”
- centre = “jma”, system = “3”
- centre = “eccc”, system = “2”
- centre = “eccc”, system = “3”
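As an illustration of how such options might be collected and sanity-checked, the sketch below stores them in a dictionary and validates the centre/system pair against the list above. The key names, the `validate_config` helper, and all values marked as placeholders are assumptions made for this example and are not the notebook's actual variables.

```python
# Centre/system pairs from the list above (operational as of March 2024).
OPERATIONAL_SYSTEMS = {
    ("ecmwf", "51"),
    ("ukmo", "602"),
    ("meteo_france", "8"),
    ("dwd", "21"),
    ("cmcc", "35"),
    ("ncep", "2"),
    ("jma", "3"),
    ("eccc", "2"),
    ("eccc", "3"),
}

# Hypothetical configuration; key names and placeholder values are
# illustrative only.
config = {
    "centre": "ecmwf",
    "system": "51",
    "year_start": 1993,             # usual hindcast period, see above
    "year_end": 2016,
    "forecast_month": 6,            # placeholder
    "leadtime_months": 1,           # placeholder
    "srex_regions": [11],           # placeholder; check regionmask's SREX numbering
    "grib_name": "t2m",             # assumed GRIB short name for 2 m temperature
    "cads_name": "2m_temperature",  # assumed API variable name
    "chunk_size": 12,               # placeholder
    "n_jobs": 1,                    # placeholder: concurrent download requests
}


def validate_config(cfg):
    """Raise ValueError if the configuration is obviously inconsistent."""
    if (cfg["centre"], cfg["system"]) not in OPERATIONAL_SYSTEMS:
        raise ValueError(
            f"Unknown centre/system pair: {cfg['centre']!r}, {cfg['system']!r}"
        )
    if cfg["year_start"] > cfg["year_end"]:
        raise ValueError("year_start must not be later than year_end")
    if not 1 <= cfg["forecast_month"] <= 12:
        raise ValueError("forecast_month must be between 1 and 12")


validate_config(config)  # passes silently for a consistent configuration
```

Checking the pair up front avoids submitting a download request for a centre and system version that do not belong together.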

Define the functions needed later to find the correct bins for the observations, compute the reliability, and plot the results.