Assessing the impact of spatial scale and temporal trends on seasonal forecast quality

4.3. Assessing the impact of spatial scale and temporal trends on seasonal forecast quality#

Production date: 30.04.2025

Produced by: Johannes Langvatn (METNorway), Johanna Tjernström (METNorway)

🌍 Use case: Using seasonal forecasts for regional climate monitoring and prediction#

❓ Quality assessment question#

How does the quality of seasonal forecasts change across areas of different sizes? What are the associated limitations?
How do long-term temporal trends impact the quality of seasonal forecasts?

The effectiveness of seasonal forecasts in predicting the development of key climate variables, such as temperature, precipitation and sea surface temperature (SST), varies considerably by variable, region, season, lead time and the state of large-scale climate drivers like ENSO (El Niño–Southern Oscillation). This notebook investigates how a potential trend in the data (e.g. as discussed in Greuell et al. (2019) [1]), together with the selection of regions at various spatial scales (e.g. as discussed in Prodhomme et al. (2021) [2], Gubler et al. (2020) [3] and Quaglia et al. (2021) [4]), affects forecast quality. For the purpose of this assessment, seasonal forecast quality is measured by the temporal correlation of the ensemble mean with the ERA5 reanalysis.
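The quality measure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's own code: the function name `ensemble_mean_correlation`, the array shapes and the noise levels are assumptions made for the example.

```python
import numpy as np


def ensemble_mean_correlation(forecast, reference):
    """Pearson correlation between the ensemble mean and a reference series.

    forecast:  array-like of shape (n_members, n_years)
    reference: array-like of shape (n_years,)
    """
    forecast = np.asarray(forecast, dtype=float)
    reference = np.asarray(reference, dtype=float)
    ensemble_mean = forecast.mean(axis=0)  # average over members, keep time
    return float(np.corrcoef(ensemble_mean, reference)[0, 1])


# Synthetic data only: a shared signal plus independent noise (assumed values)
rng = np.random.default_rng(0)
n_members, n_years = 25, 32
signal = rng.normal(size=n_years)
reference = signal + 0.5 * rng.normal(size=n_years)        # stands in for ERA5
forecast = signal + rng.normal(size=(n_members, n_years))  # stands in for the ensemble

r = ensemble_mean_correlation(forecast, reference)
print(f"ensemble mean correlation: {r:.2f}")
```

Because the members are averaged before correlating, this measure says nothing about the ensemble spread, which is one of the limitations listed in the methodology.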

📢 Quality assessment statement#

These are the key outcomes of this assessment:

The forecast quality (ensemble mean correlation) of t2m varies depending on the selected spatial scale and the choice of the regions or locations.
- There is a clear drop in forecast quality when moving from the global scale to continental scale (Europe), to a national scale (Germany), and finally to a city scale (Bonn).
- This is not the case when comparing the global scale with the Pacific and NINO3.4 regions, where the forecast quality remains largely constant, or even improves, as the selected region gets smaller.
- The regional mean for NINO3.4 is better correlated to the reanalysis than the global mean.
- The nearest grid cell to Bonn (an example city in Germany) has low correlation and most of the quality can be attributed to the underlying trend in the dataset.
- In contrast, the nearest grid cell to Addis Ababa (an example city in Eastern Africa) has some correlation which is retained even after detrending.
For regions and locations which have a visible trend in t2m, the trend inflates the correlation of t2m with the reanalysis.
- For Europe, most of the visible correlation can be attributed to the underlying trend.
- This contrasts with the Greater Horn of Africa region, where less of the visible correlation can be attributed to the underlying trend.
Users are generally advised to avoid selecting small regions or single grid cells when using seasonal forecasts, as the forecasts are only expected to be skillful in predicting large-scale variations in monthly or seasonal deviations. Selecting larger regions based on a common climatology generally results in increased robustness and a higher signal-to-noise ratio.
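The effect of a shared trend on correlation can be illustrated with a toy example. This is a sketch under stated assumptions, not a reproduction of the notebook's results: the trend size (0.1 per year), the noise level and the helper `linear_detrend` are all chosen for illustration, and the two series share nothing but the trend.

```python
import numpy as np


def linear_detrend(series):
    """Remove a least-squares straight line fitted against the time index."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    slope, intercept = np.polyfit(t, series, deg=1)
    return series - (slope * t + intercept)


rng = np.random.default_rng(42)
n_years = 32
trend = 0.1 * np.arange(n_years)  # assumed common warming trend

# Forecast and reference share the trend but have independent noise,
# i.e. there is no year-to-year skill at all in this toy setup.
forecast = trend + 0.3 * rng.normal(size=n_years)
reference = trend + 0.3 * rng.normal(size=n_years)

r_raw = float(np.corrcoef(forecast, reference)[0, 1])
r_detrended = float(
    np.corrcoef(linear_detrend(forecast), linear_detrend(reference))[0, 1]
)
print(f"with trend: {r_raw:.2f}, detrended: {r_detrended:.2f}")
```

In this setup the raw correlation is high purely because both series warm over time. Once the linear trend is removed, the remaining correlation reflects only the (here absent) agreement in interannual variability.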

📋 Methodology#

This notebook provides an assessment of the quality of seasonal monthly forecast temperature anomalies derived from the monthly means (https://cds.climate.copernicus.eu/datasets/seasonal-monthly-single-levels) through comparison with the ERA5 reanalysis (https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means). The hindcast anomalies are calculated from the monthly means as the anomalies catalogue entry only provides the real time forecasts. Data were accessed from the CDS.

Using these data, the anomaly of the two-metre temperature (t2m) was calculated relative to the period 1993-2024 and then detrended assuming a linear trend. Note that this period contains both hindcasts and forecasts, which are not entirely equivalent with respect to initialization and which contain 25 and 51 ensemble members respectively. A dataset was then generated containing both the original and the detrended anomaly. The analysis was carried out globally and for three regions chosen based on the effect of the trend on the correlation.
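Written out, the anomaly and detrending steps amount to the following, based on a reading of the `compute_anomaly` and `detrend` functions defined in the analysis section. The notation is introduced here for illustration and is not the notebook's: $x_{e,m,y}$ is t2m for ensemble member $e$, calendar month $m$ and year $y$; $N_y$ is the number of years; $N_{e,y}$ is the number of members available in year $y$ (1 for ERA5); and $t_y$ is the time coordinate.

```latex
\begin{aligned}
\bar{x}_{m} &= \frac{1}{N_y} \sum_{y} \frac{1}{N_{e,y}} \sum_{e} x_{e,m,y}
  && \text{(climatology for calendar month } m\text{)} \\
a_{e,m,y} &= x_{e,m,y} - \bar{x}_{m}
  && \text{(anomaly)} \\
\tilde{a}_{e,m,y} &= a_{e,m,y} - \left( \hat{\alpha}_{e,m} + \hat{\beta}_{e,m}\, t_{y} \right)
  && \text{(detrended anomaly)}
\end{aligned}
```

Here $\hat{\alpha}_{e,m}$ and $\hat{\beta}_{e,m}$ are the least-squares intercept and slope of a straight line fitted to $a_{e,m,y}$ along the time dimension, separately for each calendar month.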

First, the global case was considered both including and excluding trend, plotting anomaly for the ensemble members, ensemble mean, and ERA5. The correlation was also calculated.

To examine the correlation further, maps were generated to show more clearly the spatial variation of the anomaly correlation between the seasonal forecast and ERA5 (as can be seen on the C3S verification page). Based on the magnitude of the correlation in these maps, a set of regions was selected for further study. Plots were then generated for nested domains at three spatial scales: a larger continental scale, a smaller country scale and a much smaller city scale (a single grid box in the 1x1 degree grid). Data were selected for each domain based on a lat/lon box.
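Selecting a lat/lon box and a single nearest grid cell on a 1x1 degree grid can be sketched as follows. The box bounds for Germany are rough illustrative values, not the ones used in the notebook, and the cosine-latitude weighting of the box mean is an assumption about how an area mean would typically be formed. The helper names `box_mean` and `nearest_cell` are hypothetical.

```python
import numpy as np


def box_mean(field, lats, lons, lat_bounds, lon_bounds):
    """Cosine-latitude weighted mean of a (lat, lon) field inside a lat/lon box.

    Longitude wrap-around (boxes crossing the dateline) is not handled.
    """
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    sub = field[np.ix_(lat_mask, lon_mask)]
    # Grid cells shrink towards the poles, so weight each row by cos(latitude)
    weights = np.cos(np.deg2rad(lats[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return float(np.average(sub, weights=weights))


def nearest_cell(field, lats, lons, lat, lon):
    """Value and coordinates of the grid cell closest to (lat, lon)."""
    i = int(np.abs(lats - lat).argmin())
    j = int(np.abs(lons - lon).argmin())
    return float(field[i, j]), (float(lats[i]), float(lons[j]))


# 1x1 degree grid; the toy field simply stores each cell's latitude
lats = np.arange(-90.0, 91.0, 1.0)
lons = np.arange(-180.0, 180.0, 1.0)
field = np.broadcast_to(lats[:, None], (lats.size, lons.size))

germany_mean = box_mean(field, lats, lons, (47, 55), (6, 15))  # illustrative box
bonn_value, bonn_cell = nearest_cell(field, lats, lons, 50.73, 7.10)
print(germany_mean, bonn_cell)
```

The city-scale case reduces to one grid cell, so no spatial averaging is available to damp local noise.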

The resulting plots, displaying the ensemble members, ensemble mean, and ERA5, aim to visualise the variation in quality based on the selected regions of interest. This was also done for the detrended anomaly, and correlation was also calculated for each of the domains.

1. Choose data to use and setup code

Choose a selection of forecast systems and model versions, hindcast period (normally 1993-2016 to align with the C3S common hindcast period), forecast and leadtime months
- Ensure that the ERA5 and Seasonal forecast data are regridded to the same grid by using the grid-keyword to the cdsapi.
Compute the anomaly of the forecast and reanalysis data
The data are detrended assuming a linear trend from the start year to the end year
Both the original and the detrended data are saved for the following analysis
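The point about requesting both datasets on the same grid can be sketched by building the two request dictionaries side by side. This is a hedged illustration: the request keys and values follow the CDS download forms as understood at the time of writing and should be checked against the dataset pages; the `"1/1"` grid value, the helper name `build_requests`, and the convention that `leadtime_month` 1 is the start month itself are all assumptions.

```python
def build_requests(years, start_month, leadtime_months, grid="1/1"):
    """Return (seasonal_request, era5_request) dicts sharing one target grid.

    Assumes all valid months fall in the same calendar year as the start month.
    """
    year_list = [str(y) for y in years]
    seasonal = {
        "originating_centre": "ecmwf",
        "system": "51",
        "variable": ["2m_temperature"],
        "product_type": ["monthly_mean"],
        "year": year_list,
        "month": [f"{start_month:02d}"],
        "leadtime_month": [str(m) for m in leadtime_months],
        "grid": grid,  # same grid keyword in both requests
    }
    # Assumed convention: leadtime_month 1 is the start month itself
    valid_months = sorted((start_month + lead - 2) % 12 + 1 for lead in leadtime_months)
    era5 = {
        "product_type": ["monthly_averaged_reanalysis"],
        "variable": ["2m_temperature"],
        "year": year_list,
        "month": [f"{m:02d}" for m in valid_months],
        "time": ["00:00"],
        "grid": grid,
    }
    return seasonal, era5


# May start, targeting June to August under the assumed lead-time convention
seasonal_request, era5_request = build_requests(range(1993, 2025), 5, [2, 3, 4])
print(era5_request["month"])

# The requests would then be passed to the CDS API client, for example:
# import cdsapi
# client = cdsapi.Client()
# client.retrieve("seasonal-monthly-single-levels", seasonal_request, "seasonal.grib")
# client.retrieve("reanalysis-era5-single-levels-monthly-means", era5_request, "era5.grib")
```

Keeping the `grid` value in one place avoids the two downloads silently ending up on different grids, which would make a grid-cell-by-grid-cell correlation impossible without a further regridding step.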

2. Comparing global correlation

Compute the ensemble mean of the forecasted anomaly
Plot the correlation of forecast and reanalysis data, before and after performing the detrending

3. Map of correlation for detrended data

Correlation is plotted for each grid point on a map
Based on this, regions are selected to investigate further; Addis Ababa, Bonn, NINO3.4
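The per-grid-point correlation behind such a map can be sketched with plain NumPy. This is an illustration on random data, not the notebook's implementation; the function name `gridpoint_correlation` and the `(time, lat, lon)` array layout are assumptions.

```python
import numpy as np


def gridpoint_correlation(forecast_mean, reference):
    """Pearson correlation along the time axis for every grid cell.

    Both inputs have shape (n_years, n_lat, n_lon); the result has shape
    (n_lat, n_lon). Cells with zero variance in time would produce NaN.
    """
    f = forecast_mean - forecast_mean.mean(axis=0)
    o = reference - reference.mean(axis=0)
    covariance = (f * o).sum(axis=0)
    norm = np.sqrt((f**2).sum(axis=0) * (o**2).sum(axis=0))
    return covariance / norm


# Random stand-ins for the ensemble-mean anomaly and the reanalysis anomaly
rng = np.random.default_rng(3)
reference = rng.normal(size=(32, 4, 5))
forecast_mean = 0.5 * reference + rng.normal(size=(32, 4, 5))

corr_map = gridpoint_correlation(forecast_mean, reference)
print(corr_map.shape)
```

Each cell of `corr_map` is the same quantity as the single-series correlation used elsewhere in the assessment, only evaluated independently at every grid point.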

4. Plot and describe results

Plots are generated for the regions chosen in the previous section

Key limitations:

This assessment primarily considers the correlation between the ensemble mean and ERA5. While the full ensemble spread is visualized in some of the plots, the correlation maps use only the ensemble mean.
The method used to detrend the data is very simple (a linear fit), which may affect the quality of the detrended data.
The use of ERA5 as ground truth, while convenient, may be considered a limitation, as it is a reanalysis and not an observational dataset.
Some of the region plots use data from only one or a few grid cells. This is generally not a large enough domain to expect skillful prediction of large-scale variations in monthly or seasonal deviations.
The assessment in this notebook makes use of only one forecasting system and one lead time. While the code should work for other models and lead times, this has not been shown explicitly in this assessment. The results outlined here are therefore specific to this forecasting system, period and lead time; further study would be needed to see whether the conclusions are general.

📈 Analysis and results#

1. Choose data to use and setup code#

This section contains the setup and data processing needed for performing the analysis.

Import external code libraries needed.

Define functions to be used throughout the notebook.


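# Imports assumed to match the notebook's import cell (not shown on this page)
import cartopy.crs as ccrs
import matplotlib as mpl
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import xarray as xr
from c3s_eqc_automatic_quality_control import diagnostics

def compute_anomaly(obj):
    # Climatology: mean over ensemble members (if present), then unweighted mean over time
    climatology = obj.mean({"realization"} & set(obj.dims))
    climatology = diagnostics.time_weighted_mean(climatology, weights=False)
    return obj - climatology

def detrend(obj):
    # Subtract a least-squares linear fit along the time dimension
    trend = xr.polyval(obj["time"], obj.polyfit("time", deg=1).polyfit_coefficients)
    return obj - trend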

def compute_monthly_anomaly(ds):
    (da,) = ds.data_vars.values()
    with xr.set_options(keep_attrs=True):
        da = da.groupby("time.month").map(compute_anomaly)
        da_detrend = da.groupby("time.month").map(detrend)
    da = xr.concat(
        [da.expand_dims(detrend=[False]), da_detrend.expand_dims(detrend=[True])],
        "detrend",
    )
    da.encoding["chunksizes"] = tuple(
        1 if dim in ("realization", "detrend") else size
        for dim, size in da.sizes.items()
    )
    return da.to_dataset()

def plot_func(dict_with_leadtime_keys, list_of_region_keys, fignums, plot_scale=3, time_range = None):
    if time_range is None:
        time_range = dict_with_leadtime_keys.keys()
    for detrend in [False, True]:
        for leadtime in time_range:
            region_dict = {key: dict_with_leadtime_keys[leadtime][key] for key in list_of_region_keys}
            
            num_plots = len(list_of_region_keys)
            num_cols = 4
            num_rows = num_plots // num_cols 
            if num_plots % num_cols != 0:
                num_rows += 1
            
            fig,axs = plt.subplots(num_rows,num_cols, figsize=(17,4))
            dtrnd = "(Detrended)" if detrend else ""
            # month_string is not defined in this function; it is expected to exist at notebook level
            fig.suptitle(f"Figure {fignums[0]}: Different regions - t2m "+ dtrnd +f"\n{month_string}")
            fignums.pop(0)
            count_row = 0
            count_col = 0
            for region_name, dictionary in region_dict.items(): 
                data = dictionary["data_sf"]
                lat = dictionary["lat"]
                lon = dictionary["lon"]
                da_reanalysis_mean = dictionary["data_era5"]
                mean_ens = dictionary["mean_ens"]
                if num_cols > 1 and num_rows > 1:
                    ax = axs[count_row,count_col]
                elif num_plots == 1:
                    ax = axs
                else:
                    ax = axs[count_col]
                
                corr = float(xr.corr(mean_ens.sel(detrend=detrend), da_reanalysis_mean.sel(detrend=detrend)))
                data.sel(detrend=detrend).plot.scatter(x="time",hue_style="bo", ax = ax, label ="Realization")
                mean_ens.sel(detrend=detrend).plot.scatter(x="time", color = "r", marker="o", ax = ax, label = "Realization Mean")
                da_reanalysis_mean.sel(detrend=detrend).plot.scatter(x="time", color = "r", marker="x", ax = ax, label = "ERA5")
                ax.set_title(region_name + f" Corr = {corr:.3}")
                with mpl.rc_context({"legend.fontsize": "small", "legend.framealpha" : 0.3}):
                    ax.legend()
                count_col += 1
                if count_col >= num_cols:
                    count_row += 1
                    count_col = 0
            plt.tight_layout()
            plt.show()

def plot_regions(region_dict,regions):
    fig = plt.figure(figsize=(10, 5))
    central = (region_dict[regions[0]]["lon"].start + region_dict[regions[0]]["lon"].stop)/2
    ax = fig.add_subplot(projection=ccrs.PlateCarree(central_longitude=central))
    
    # restrict the map to the extent of the outermost region (regions[0])
    ax.set_extent([region_dict[regions[0]]["lon"].start,region_dict[regions[0]]["lon"].stop,
    region_dict[regions[0]]["lat"].start,region_dict[regions[0]]["lat"].stop])
    
    ax.coastlines()
    width = region_dict[regions[1]]["lon"].stop - region_dict[regions[1]]["lon"].start
    height = region_dict[regions[1]]["lat"].stop - region_dict[regions[1]]["lat"].start
    
    ax.add_patch(mpatches.Rectangle(xy=[region_dict[regions[1]]["lon"].start, region_dict[regions[1]]["lat"].start],
                                    width=width, height=height,
                                    facecolor='none', edgecolor='k',lw = 2,
                                    transform=ccrs.PlateCarree()))
    
    ax.plot(region_dict[regions[2]]["lon"].start,region_dict[regions[2]]["lat"].start,marker='o',transform=ccrs.PlateCarree())
    ax.set_title(f"Shows {regions[1]} [box] and {regions[2]} [blue circle] in the {regions[0]} region")
    plt.show()

This notebook uses the t2m monthly means from one forecasting system, in this case from ECMWF (system code 51), for a forecast produced in May for leadtime months June, July and August. This was requested for both forecast and hindcast periods, thus there is a difference in the number of ensemble members before and after 2016.
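One consequence of the change in ensemble size is that the ensemble mean is less noisy when more members are averaged. The sketch below illustrates this with a toy model: it assumes independent, unit-variance noise in each member, which real ensembles only approximate, and it is not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 20000  # number of synthetic "forecasts" per ensemble size

spread_of_mean = {}
for n_members in (25, 51):
    # Pure noise members: any signal is left out to isolate the sampling effect
    members = rng.normal(size=(n_samples, n_members))
    spread_of_mean[n_members] = float(members.mean(axis=1).std())

for n_members, spread in spread_of_mean.items():
    # For independent members the expected value is 1 / sqrt(n_members)
    print(n_members, round(spread, 3), round(1 / np.sqrt(n_members), 3))
```

Under these assumptions, the noise in the ensemble mean shrinks with the square root of the ensemble size, which is one reason the hindcast and forecast years are not strictly comparable.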

2. Comparing global correlation#

Compute the ensemble mean of the seasonal forecast anomaly data.

Plot the original anomaly data for each ensemble member, the ensemble mean, and the reanalysis anomaly. Please note that the ensemble member points are coloured by number to highlight the switch from the hindcast configuration (25 members) to the forecasts (51 members).

../_images/ce6f4a61794d7711067fc75d1e06c237980a04284029cfdc207152bdbcd4402f.png

Plot after detrending#

Plot the detrended anomaly data for each ensemble member, the ensemble mean, and the reanalysis anomaly.

../_images/a51e833df29aae2d5529c95d005df26e42e706ec95c220c6791ab2cd5930561d.png

3. Map of correlation for detrended data#

The correlation of the detrended data is plotted compared to reanalysis data for each grid cell on a map.

../_images/d204da3812802cffe0c93f6f4100732ca63f70eee428c7a0f6de3f33d05fccfd.png

../_images/f67345846a4cf4ca308904bf0f95ac9642abc7bacbcab59890a9f41cc6002790.png

../_images/9abf98b9c112adf7664606196448618710ae3b1cc51d48bcf41aefb742da9986.png

The hatched area is where the correlation exceeds the critical value.
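The text does not state how the critical value was obtained. As one common possibility, the sketch below derives a critical correlation from a two-sided Student-t test at the 5% level, assuming 32 independent years (1993-2024) and no adjustment for autocorrelation or for the degrees of freedom used up by detrending. The function name and these choices are assumptions, and the t quantile is passed in as a tabulated value to keep the example free of extra dependencies.

```python
import math


def critical_correlation(n_samples, t_quantile):
    """Smallest |r| that is significant for n_samples independent pairs.

    t_quantile is the Student-t quantile for n_samples - 2 degrees of freedom
    at the chosen significance level.
    """
    dof = n_samples - 2
    return t_quantile / math.sqrt(dof + t_quantile**2)


# 2.042 is the tabulated two-sided 5% t quantile for 30 degrees of freedom
r_crit = critical_correlation(32, 2.042)
print(round(r_crit, 3))
```

With fewer effective samples, for example if consecutive years are correlated, the critical value would be higher than this estimate.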

It can be seen in the above plots that there is a high correlation over the North American Great Lakes. This is likely because the lakes are large enough for their heat retention to affect correlation skill at a lead time of one month.

Selecting regions#

Based on the correlation maps, regions are selected to investigate further; Addis Ababa, Bonn, NINO3.4.

../_images/6f820ce14bbdabec4014ebd1f5c1eee11e1ec773687a39c436661e6927d6e6b0.png

4. Plot and describe results#

In this section the regional means are computed and plots generated to illustrate scale variations for the selected regions.

../_images/2a49ac9b4ef48654e8f55fef1849ffbe767aeec7e4bf7255d2bce713c0b365fb.png

../_images/f87b4f0be2f7e7b7376e5aea24eb60da9dd5c0564e1cc3855c08406e4a13d1c0.png

../_images/94feec1571b3794767b86b3ded7f8c53bce5806f49fbb8432df4d89dae6d5369.png

../_images/b1003247cc6aa96c41e1528357bad14be490d9287ef3b519d6b31cf27ecff286.png

../_images/e26bef2fe6e9cc9dede7895c6ef8f92352a77c3be9bbac7b78cceac7abb9afd5.png

../_images/af6b80b5a398575b7bd83b7d7b4c230a883218729dc2474e2b2446c599cbeace.png

../_images/d45005f0bbfde0bb333eefad9eb779b576f11c05fdafe0918dcfd0e9227939b8.png

../_images/5242e3fab81557aa41d7076130ea49e1d5e985ba66090b066ca4721a1f5333d3.png

../_images/40fb2f71286949dbc8454684b68cf2e54f47985f127811f9195e423a87c41bd6.png

Discussion#

Looking at the comparison of different regions at different spatial scales a few conclusions can be drawn.

For regions with a visible trend, that trend inflates the correlation of t2m with the reanalysis. This can be seen for Europe, where the correlation is greater for the data that retain the trend than for the detrended data. Hence, for Europe, most of the visible correlation can be attributed to the underlying trend. In the Greater Horn of Africa region, by contrast, comparing the two cases indicates that less of the visible correlation can be attributed to the underlying trend.

For the selected regions, the dependence of forecast quality on scale varies. In the Europe, Germany, Bonn case, there is a drop in forecast quality as the scale decreases. This is not observed for NINO3.4, where the quality remains more or less constant despite the decrease in region size. In the Greater Horn of Africa, a drop in quality with scale can be seen to some extent in the case of Addis Ababa, but it is smaller than in the Europe case.
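Why averaging over a larger area can raise the correlation is illustrated by the toy model below. It assumes a single signal common to all grid cells plus independent local noise in each cell. Real neighbouring cells have correlated noise, so the gain in practice is smaller, and the model does not attempt to explain why NINO3.4 behaves differently from Europe. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n_years = 32
signal = rng.normal(size=n_years)  # large-scale, potentially predictable part


def area_mean_correlation(n_cells, noise_std=3.0):
    """Correlation between the common signal and the mean over n_cells noisy cells."""
    noise = noise_std * rng.normal(size=(n_cells, n_years))
    area_mean = (signal + noise).mean(axis=0)
    return float(np.corrcoef(area_mean, signal)[0, 1])


corr_by_size = {n_cells: area_mean_correlation(n_cells) for n_cells in (1, 10, 100)}
for n_cells, corr in corr_by_size.items():
    print(f"{n_cells:>3} cells: r = {corr:.2f}")
```

In this setup the local noise averages out as more cells are included, so the area mean tracks the common signal more closely, which is the signal-to-noise argument behind the advice against single grid cells.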

Considering this, and the anomalies for the different regions at different scales, it can be seen that the NINO3.4 region has a better correlation with the reanalysis than the global mean, while Bonn has a low correlation, most of which can be attributed to the underlying trend. Addis Ababa has some correlation, which is retained even after detrending. The high correlations for the NINO3.4 region are partially dependent on the start date and forecast length, which can be seen in the SST indices plots on the C3S verification page, though it should be noted that these plots use sea surface temperature and not t2m. Forecast quality, and seasonal forecast biases, generally vary with start date and lead time, as seen in some of the other assessments.

ℹ️ If you want to know more#

Key resources#

Seasonal forecast monthly statistics on single levels: 10.24381/cds.68dd14c3
ERA5 monthly averaged data on single levels from 1940 to present: 10.24381/cds.f17050d7

Code libraries used:#

C3S EQC custom functions, c3s_eqc_automatic_quality_control, prepared by B-Open
xarray
numpy
matplotlib
cartopy
linear_model from sklearn

References#

[1] Greuell, W., Franssen, W. H. P., and Hutjes, R. W. A.: Seasonal streamflow forecasts for Europe – Part 2: Sources of skill, Hydrol. Earth Syst. Sci., 23, 371–391, https://doi.org/10.5194/hess-23-371-2019, 2019.

[2] Prodhomme, C., Materia, S., Ardilouze, C., et al.: Seasonal prediction of European summer heatwaves, Clim. Dyn., 58, 2149–2166, https://doi.org/10.1007/s00382-021-05828-3, 2022.

[3] Gubler, S., and Coauthors: Assessment of ECMWF SEAS5 Seasonal Forecast Performance over South America, Wea. Forecasting, 35, 561–584, https://doi.org/10.1175/WAF-D-19-0106.1, 2020.

[4] Calì Quaglia, F., Terzago, S., and von Hardenberg, J.: Temperature and precipitation seasonal forecasts over the Mediterranean region: added value compared to simple forecasting methods, Clim. Dyn., 58, 2167–2191, https://doi.org/10.1007/s00382-021-05895-6, 2022.