Reproducing the single-system graphical products for additional variables#

Introduction#

This Jupyter Notebook shows how the C3S seasonal graphical products can be calculated from data in the Climate Data Store (CDS) and plotted. The C3S seasonal graphical and data products are described in this documentation page.

This example code can be used as the basis for creating graphical products which are not part of the C3S suite. In this example we look at monthly mean daily minimum temperature forecasts (for one system, ECMWF System 51), a variable that is available in the CDS dataset but not in the graphical products. A ‘tercile summary’ is computed from tercile probabilities. An ensemble mean anomaly is also calculated and plotted with significance testing applied. The results will be used in a further example to create multi-system combinations.

Configuration
CDS API requests
Load forecast and hindcast data
Produce tercile summary
Plot tercile summary
Produce ensemble mean anomaly
Plot ensemble mean anomaly

Configuration#

Here we set which variable(s) will be downloaded and for which C3S seasonal system. We also set the forecast start date and the hindcast period, which is used to calculate the terciles and anomalies. Note that URL and KEY need to be filled in with the details from your CDS account, and the cdsapi package needs to be installed.

Note: To create multi-system combinations in the related Notebook, this code needs to be run for each of the systems to be included. Simply edit the `prov = 'ecmf.s51'` line to retrieve and process other systems, or add a loop which runs over multiple systems.

Import required modules and configure CDS API key and client.

# import required modules
import cdsapi
import os
import xarray as xr
import numpy as np
import pandas as pd

# Date and time related libraries
from dateutil.relativedelta import relativedelta
from calendar import monthrange
import datetime

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

# config to avoid issues saving to netcdf if needed
os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

# KEY is intentionally left blank: enter the URL and key from your own CDS account here
URL = 'https://cds.climate.copernicus.eu/api'
KEY = '' # INSERT CDS KEY HERE
c = cdsapi.Client(url=URL, key=KEY)

Define a library of provider details and variables of interest (only tmin is used here, but this will be compared to tmean and tmax in another Notebook).

# library of C3S systems and useful parameters
# max_hc and max_fc represent the number of members to be considered (the number index starts from 0, the most recent)
providers = {
    'ecmf.s51': {'cds_name': 'ecmwf', 'plot_name': 'ECMWF', 'plot_system': 'SEAS5', 'cds_system': '51',
                 'lagged': False},
    'lfpw.s8': {'cds_name': 'meteo_france', 'plot_name': 'Météo-France', 'plot_system': 'System 8', 'cds_system': '8',
                'lagged': False},
    'egrr.s602': {'cds_name': 'ukmo', 'plot_name': 'Met Office', 'plot_system': 'GloSea6', 'cds_system': '602',
                  'lagged': True, 'max_hc': 50, 'max_fc': 'none'},
    'edzw.s21': {'cds_name': 'dwd', 'plot_name': 'DWD', 'plot_system': 'GCFS2.1', 'cds_system': '21',
                 'lagged': False},
    'cmcc.s35': {'cds_name': 'cmcc', 'plot_name': 'CMCC', 'plot_system': 'SPS3.5', 'cds_system': '35',
                 'lagged': False},
    'kwbc.s2': {'cds_name': 'ncep', 'plot_name': 'NCEP', 'plot_system': 'CFSv2', 'cds_system': '2',
                'lagged': True, 'max_hc': 20, 'max_fc': 52},
    'rjtd.s3': {'cds_name': 'jma', 'plot_name': 'JMA', 'plot_system': 'CPS3', 'cds_system': '3',
                'lagged': True, 'max_hc': 'none', 'max_fc': 55},
    'cwao.s2': {'cds_name': 'eccc', 'plot_name': 'ECCC', 'plot_system': 'CanCM4i', 'cds_system': '2',
                'lagged': False},
    'cwao.s3': {'cds_name': 'eccc', 'plot_name': 'ECCC', 'plot_system': 'GEM5-NEMO', 'cds_system': '3',
                'lagged': False},
}

# some options for variables
vars = {
    'tmean': {'plot_name': 'daily mean T2m', 'cds_name': '2m_temperature'}, 
    'tmax' : {'plot_name': 'daily max T2m', 'cds_name': 'maximum_2m_temperature_in_the_last_24_hours'},
    'tmin' : {'plot_name': 'daily min T2m', 'cds_name': 'minimum_2m_temperature_in_the_last_24_hours'}
}

Set request details to be used by the API requests below. Also define a directory and structure where the data will be saved.

# select the provider from the dictionary above, 
# and the associated fields needed for the CDS API request and loading the data
prov = 'ecmf.s51'  
centre = providers[prov]['cds_name']
version = providers[prov]['cds_system']
lagged = providers[prov]['lagged']

# define some other parameters for the data request
fc_yr = '2024'
st_mon = '02'

var = 'tmin'  # select a variable from the dictionary above
var_str = vars[var]['plot_name']   # for plotting
cds_var_name = vars[var]['cds_name']  # for the CDS request

lt_mons = [1, 2, 3, 4, 5, 6]  # cover all lead months when plotting month by month
hc_years = '1993_2016'  # string to print the hindcast period, and label the file

# data path in cwd, organised by originating centre and system
data_path = os.path.join('data', centre, version)
# create the directory; exist_ok avoids an error if it already exists
os.makedirs(data_path, exist_ok=True)
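To run several systems, as the note above suggests, the per-system settings can be wrapped in a small function and looped over. The following is a minimal sketch only: `system_settings` is a hypothetical helper, not part of the original notebook, and the dictionary is a cut-down copy of `providers`.

```python
import os

# Cut-down copy of the `providers` dictionary, for illustration only
providers = {
    'ecmf.s51': {'cds_name': 'ecmwf', 'cds_system': '51', 'lagged': False},
    'lfpw.s8': {'cds_name': 'meteo_france', 'cds_system': '8', 'lagged': False},
}


def system_settings(prov, providers):
    """Hypothetical helper: collect the values the notebook derives from `prov`."""
    entry = providers[prov]
    return {
        'centre': entry['cds_name'],
        'version': entry['cds_system'],
        'lagged': entry['lagged'],
        'data_path': os.path.join('data', entry['cds_name'], entry['cds_system']),
    }


# Loop over the systems to be combined; the download and processing
# steps of the notebook would go inside this loop
all_settings = [system_settings(p, providers) for p in ['ecmf.s51', 'lfpw.s8']]
for s in all_settings:
    print(s['centre'], s['version'], s['data_path'])
```

Keeping the settings in one dictionary per system makes it harder to mix, say, the centre of one system with the version of another.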

CDS API requests#

Here we request the desired hindcast and forecast data in GRIB format using the CDS API, and save it within a folder ‘data’ in the current working directory, organised by originating centre and forecast system. For this example, the CDS API keywords used are:

Format: GRIB
Variable: minimum_2m_temperature_in_the_last_24_hours (set via ‘cds_var_name’)
Originating centre: ECMWF (set via ‘centre’)
System: 51 (this refers to SEAS5 system 51, set via ‘version’)
Product type: Monthly mean (all ensemble members will be retrieved)
Year: 1993 to 2016 for the hindcast (listed explicitly in the request; ‘hc_years’ is only used to label the file) and 2024 for the forecast (set via ‘fc_yr’)
Month: 02, i.e. February (set via ‘st_mon’)
Leadtime month: 1 to 6 (all lead months available, February to July in this case, set via ‘lt_mons’)

#HINDCAST REQUEST - 1993-2016
c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'format': 'grib',
        'variable': cds_var_name,
        'originating_centre': centre,
        'system': version,
        'product_type': 'monthly_mean',
        'year': [
            '1993', '1994', '1995',
            '1996', '1997', '1998',
            '1999', '2000', '2001',
            '2002', '2003', '2004',
            '2005', '2006', '2007',
            '2008', '2009', '2010',
            '2011', '2012', '2013',
            '2014', '2015', '2016',
        ],
        'month': [st_mon],
        'leadtime_month': lt_mons
    },
    data_path + '/{}_mm_{}_{}_{}_{}.grib'.format(var, centre, version, hc_years, st_mon))

#FORECAST REQUEST
c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'format': 'grib',
        'variable': [cds_var_name],
        'originating_centre': centre,
        'system': version,
        'product_type': 'monthly_mean',
        'year': [fc_yr],
        'month': [st_mon],
        'leadtime_month': lt_mons
    },
    data_path + '/{}_mm_{}_{}_{}_{}.grib'.format(var, centre, version, fc_yr, st_mon))
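The two requests above differ only in the list of years, so the request dictionary could be assembled by a small function. This is a sketch under that assumption: `build_request` is a hypothetical helper, and the keywords are simply those used in the cells above. It makes no call to the CDS itself.

```python
def build_request(variable, centre, system, years, start_month, lead_months):
    """Hypothetical helper: assemble a request dictionary like those used above."""
    return {
        'format': 'grib',
        'variable': variable,
        'originating_centre': centre,
        'system': system,
        'product_type': 'monthly_mean',
        'year': [str(y) for y in years],
        'month': [start_month],
        'leadtime_month': list(lead_months),
    }


tmin_name = 'minimum_2m_temperature_in_the_last_24_hours'

# Hindcast: 1993 to 2016 inclusive; forecast: a single year
hc_request = build_request(tmin_name, 'ecmwf', '51', range(1993, 2017), '02', range(1, 7))
fc_request = build_request(tmin_name, 'ecmwf', '51', [2024], '02', range(1, 7))

print(len(hc_request['year']), hc_request['year'][0], hc_request['year'][-1])
print(fc_request['year'])
```

Each dictionary would then be passed to `c.retrieve` together with the dataset name and the target file name, exactly as in the cells above.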

Produce tercile summary#

Calculate tercile summary from hindcasts#

The first step of calculating probabilities for the tercile summary is to calculate the terciles from the hindcast period.
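As an illustration of this step, the sketch below uses synthetic NumPy arrays in place of the hindcast and forecast data. The array shapes, the pooling of start years and members into one sample, and the definition of the summary (the signed probability of the more likely outer tercile) are all assumptions made for this example, not necessarily the method used for the C3S products. With xarray data, `DataArray.quantile` could play the role of `np.quantile`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: hindcast (start year, member, lat, lon), forecast (member, lat, lon)
hcst = rng.normal(280.0, 2.0, size=(24, 25, 3, 4))
fcst = rng.normal(281.0, 2.0, size=(51, 3, 4))

# Pool start years and members into one sample axis, then take the
# 1/3 and 2/3 quantiles at each grid point
samples = hcst.reshape(-1, *hcst.shape[-2:])
lower, upper = np.quantile(samples, [1 / 3, 2 / 3], axis=0)

# Forecast probabilities (%) are the fractions of members in each category
p_below = 100.0 * (fcst < lower).mean(axis=0)
p_above = 100.0 * (fcst > upper).mean(axis=0)
p_normal = 100.0 - p_below - p_above

# Assumed summary: positive for the upper tercile, negative for the lower tercile
p_summary = np.where(p_above >= p_below, p_above, -p_below)
print(p_summary.round(1))
```

Under this definition, values between -40 and 40 would fall in the white band of the plots below, i.e. neither outer tercile is clearly favoured.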

Plot tercile summary#

Plot setup#

Here we set up a directory for saving the plots, and set up some plot parameters.

# levels to use when shading the plot, and corresponding colours
contour_levels = [-100., -70., -60., -50., -40., 40., 50., 60., 70., 100.]
contour_colours = ["navy", "blue", "deepskyblue", "cyan", "white", "yellow", "orange", "orangered", "tab:red"]
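The ten levels define nine intervals, one per colour. The sketch below shows roughly how a value maps to a colour, assuming half-open intervals [lower, upper); `colour_for` is a hypothetical helper written for this illustration, and the plotting library's own handling of interval edges may differ in detail. It also shows why the data are clipped to ±99 before plotting: values of exactly ±100 then sit inside the outermost intervals.

```python
import numpy as np

contour_levels = [-100., -70., -60., -50., -40., 40., 50., 60., 70., 100.]
contour_colours = ["navy", "blue", "deepskyblue", "cyan", "white",
                   "yellow", "orange", "orangered", "tab:red"]


def colour_for(value):
    """Hypothetical helper: colour of the interval that contains `value`."""
    clipped = np.clip(value, -99, 99)  # same clipping as applied before plotting
    idx = np.digitize(clipped, contour_levels) - 1
    return contour_colours[idx]


for v in [-100, -45, 0, 39.9, 40, 65, 100]:
    print(v, colour_for(v))
```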

Make a plot for each lead month#

Note: the C3S graphical products are contour plots rather than the grid mesh plots below.

for ltm in lt_mons:
    lt_str = str(ltm).zfill(2)
    plot_data = P_summary.sel(forecastMonth=ltm)  # extract the specific forecast month

    # clip the data to sit within levels range
    plot_data = plot_data.clip(min=-99, max=99)
    
    valid_time = pd.to_datetime(plot_data.valid_time.values)
    vm_str = valid_time.strftime('%b')  # valid month string to label the plot
    title_txt1 = '{} system={}'.format(centre, version) + ', Probability most likely category ({})'.format(var_str)
    title_txt2 = 'start date = {}/{}, valid month: {} (leadtime_month = {})'.format(fc_yr, st_mon, vm_str, lt_str)

    fig = plt.figure(figsize=(16, 8))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.add_feature(cfeature.BORDERS, edgecolor='black', linewidth=0.5)
    ax.add_feature(cfeature.COASTLINE, edgecolor='black', linewidth=2.)
    plot_data.plot(levels=contour_levels, colors=contour_colours,
                                               cbar_kwargs={'fraction': 0.033, 'extend': 'neither'})
    plt.title(title_txt1 + '\n' + title_txt2)
    plt.tight_layout()

Plot ensemble mean anomaly#

Plot setup#

Note, the level values will need to be updated for variables other than Tmin/Tmax/Tmean.

contour_levels = [-2., -1.5, -1., -0.5, -0.2, 0.2, 0.5, 1.0, 1.5, 2.0]
contour_colours = ["navy", "blue", "deepskyblue", "cyan", "white", "yellow", "orange", "orangered", "tab:red"]
# clip the data to just inside the outermost levels (-2 and 2) so that extreme values are still shaded
fcst_anom_mean = fcst_anom_mean.clip(min=-1.99, max=1.99)
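For reference, one common way to obtain an ensemble mean anomaly such as `fcst_anom_mean` is to subtract the hindcast climatology from the forecast and average over members. The sketch below assumes exactly that, on synthetic NumPy arrays with assumed shapes; it is not necessarily the calculation used for the C3S products.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: hindcast (start year, member, lat, lon), forecast (member, lat, lon)
hcst = rng.normal(280.0, 2.0, size=(24, 25, 3, 4))
fcst = rng.normal(281.5, 2.0, size=(51, 3, 4))

# Hindcast climatology: mean over start years and members at each grid point
hcst_clim = hcst.mean(axis=(0, 1))

# Anomaly of each forecast member, then the ensemble mean
fcst_anom = fcst - hcst_clim
fcst_anom_mean = fcst_anom.mean(axis=0)

# Clip to just inside the outermost plotting levels, as in the cell above
fcst_anom_mean = np.clip(fcst_anom_mean, -1.99, 1.99)
print(fcst_anom_mean.round(2))
```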

Make a basic plot for one lead month#

We plot the ensemble mean anomaly for the first forecast month. No significance testing is applied here; it is added in the next step, so this plot serves only to show the difference that the testing makes.

ltm = 1 
lt_str = str(ltm).zfill(2)
plot_data = fcst_anom_mean.sel(forecastMonth=ltm)  # extract the specific forecast month

valid_time = pd.to_datetime(plot_data.valid_time.values)
vm_str = valid_time.strftime('%b')  # valid month string to label the plot
title_txt1 = '{} system={}'.format(centre, version) + ', Ensemble mean anomaly ({})'.format(var_str)
title_txt2 = 'Start date = {}/{}, valid month: {} (leadtime_month {})'.format(fc_yr, st_mon, vm_str, lt_str)

fig = plt.figure(figsize=(16, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.add_feature(cfeature.BORDERS, edgecolor='black', linewidth=0.5)
ax.add_feature(cfeature.COASTLINE, edgecolor='black', linewidth=2.)
plot_data.plot(levels=contour_levels, colors=contour_colours,
                                           cbar_kwargs={'fraction': 0.033, 'extend': 'neither'})
plt.title(title_txt1 + '\n' + title_txt2)
plt.tight_layout()

Add significance contours and masking#

The single-system ensemble mean plots in the C3S seasonal graphical products also include significance testing, following the approach taken for SEAS5 products (see https://www.ecmwf.int/sites/default/files/medialibrary/2017-10/System5_guide.pdf). Anomalies that are not significant at the 10% level are masked out, and significance contours are drawn for 1% and 10%. Because of the significance testing, this cell takes longer to run than the previous one. The colorbar also requires customisation to represent the masking in a similar manner to the graphical products.

The first plot may be compared to the plot above, to see the difference.

# Function for non-paired rank-sum test
from scipy.stats import mannwhitneyu

# Significance thresholds
pval_thresh_low = 0.1
pval_thresh_high = 0.01

for ltm in lt_mons:
    
    data1 = hcst.sel(forecastMonth=ltm)
    data2 = fcst.sel(forecastMonth=ltm)
 
    # need to flatten number and start date to simply 'samples'
    # proceed in numpy for simplicity
    data1 = data1.data  # includes member and start date
    data2 = data2.data  # includes member

    # flatten sample dimensions
    data1 = data1.reshape(-1, *data1.shape[-2:])  # lat lon are the last dimensions
    print('Flattened hindcasts: ', data1.shape)

    # non-paired approach, need to use Mann–Whitney U test
    # (applied along the first axis, i.e. over the samples, at each grid point)
    pvals2 = mannwhitneyu(data1, data2)
    
    # mask out anomalies that are not significant at the 10% level
    masked_anom_mean = fcst_anom_mean.sel(forecastMonth=ltm).where(pvals2.pvalue <= pval_thresh_low)

    # values for Tmin/Tmax/Tmean, adjusted to add a dummy white section to label as 'no signal'
    contour_levels = [-2., -1.5, -1., -0.5, -1e-15, 1e-15, 0.5, 1.0, 1.5, 2.0]  # dummy values to plot central white portion in colorbar
    cbar_colours = ["navy", "blue", "deepskyblue", "cyan", "white", "yellow", "orange", "orangered", "tab:red"]
    cbar_labels = ["<-1.5", "-1.5..-1.0", "-1.0..-0.5", "-0.5..0", "no signal", "0..0.5", "0.5..1.0", "1.0..1.5", ">1.5"]
    
    lt_str = str(ltm).zfill(2)
    # take the valid time from the data for this lead month (not from the earlier single-month plot)
    valid_time = pd.to_datetime(masked_anom_mean.valid_time.values)
    vm_str = valid_time.strftime('%b')  # valid month string to label the plot
    title_txt1 = '{} system={}'.format(centre, version) + ', Ensemble mean anomaly ({})'.format(var_str)
    title_txt2 = 'Start date = {}/{}, valid month: {} (leadtime_month {})'.format(fc_yr, st_mon, vm_str, lt_str)
    print(title_txt1)
    print(title_txt2)

    fig = plt.figure(figsize=(16, 8))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.add_feature(cfeature.BORDERS, edgecolor='black', linewidth=0.5)
    ax.add_feature(cfeature.COASTLINE, edgecolor='black', linewidth=2.)
    im = masked_anom_mean.plot(levels=contour_levels, colors=cbar_colours, add_colorbar=False)
    cbar = plt.colorbar(im, fraction=0.023, extend='neither', ticks=[-1.75, -1.25, -0.75, -0.25, 0, 0.25, 0.75, 1.25, 1.75])
    cbar.ax.set_yticklabels(cbar_labels)
    # green contour marks the 1% significance level
    plt.contour(masked_anom_mean.lon.values, masked_anom_mean.lat.values, pvals2.pvalue, levels=[pval_thresh_high], colors='green')
    plt.title(title_txt1 + '\n' + title_txt2)
    plt.tight_layout()

Flattened hindcasts:  (600, 180, 360)
ecmwf system=51, Ensemble mean anomaly (daily min T2m)
Start date = 2024/02, valid month: Feb (leadtime_month 01)
Flattened hindcasts:  (600, 180, 360)
ecmwf system=51, Ensemble mean anomaly (daily min T2m)
Start date = 2024/02, valid month: Mar (leadtime_month 02)
Flattened hindcasts:  (600, 180, 360)
ecmwf system=51, Ensemble mean anomaly (daily min T2m)
Start date = 2024/02, valid month: Apr (leadtime_month 03)
Flattened hindcasts:  (600, 180, 360)
ecmwf system=51, Ensemble mean anomaly (daily min T2m)
Start date = 2024/02, valid month: May (leadtime_month 04)
Flattened hindcasts:  (600, 180, 360)
ecmwf system=51, Ensemble mean anomaly (daily min T2m)
Start date = 2024/02, valid month: Jun (leadtime_month 05)
Flattened hindcasts:  (600, 180, 360)
ecmwf system=51, Ensemble mean anomaly (daily min T2m)
Start date = 2024/02, valid month: Jul (leadtime_month 06)
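The behaviour of the grid-point-wise test and of the masking can be checked on a small synthetic case. In the sketch below the array sizes and the imposed shift are arbitrary choices for illustration. `mannwhitneyu` is applied along the sample axis (`axis=0`), so it returns one p-value per grid point.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic samples on a tiny 2 x 3 grid
hc = rng.normal(0.0, 1.0, size=(600, 2, 3))  # flattened hindcast samples
fc = rng.normal(0.0, 1.0, size=(51, 2, 3))   # forecast members
fc[:, 0, 0] += 1.5                           # impose a clear shift at one grid point

# Non-paired rank-sum test along the sample axis: one p-value per grid point
pvals = mannwhitneyu(hc, fc, axis=0).pvalue

# Keep the anomaly only where it is significant at the 10% level
anom = fc.mean(axis=0) - hc.mean(axis=0)
masked = np.where(pvals <= 0.1, anom, np.nan)

print(pvals.shape)
print(pvals[0, 0] < 0.01)
```

The grid point with the imposed shift should pass both the 10% and the 1% thresholds, while the remaining points are masked unless they pass by chance.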

These plots are similar to the monthly C3S graphical products. Note that here we have plotted the first month of the forecast (February, or leadtime_month = 1), while in the graphical products we show the monthly forecasts starting with the month following the release month (which would be March in this case). The three-month aggregations shown in the graphical products could be created in a similar manner, by combining the corresponding monthly means, and then proceeding as above.
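A three-month aggregation could be sketched as follows. Weighting each monthly mean by the number of days in the month is an assumption made here (a simple unweighted mean is another option), and `seasonal_mean` is a hypothetical helper, not part of the original notebook.

```python
from calendar import monthrange

import numpy as np


def seasonal_mean(monthly_means, year_months):
    """Hypothetical helper: day-weighted mean of monthly mean fields.

    `monthly_means` is a list of (lat, lon) arrays and `year_months`
    the matching list of (year, month) pairs.
    """
    days = np.array([monthrange(y, m)[1] for y, m in year_months], dtype=float)
    stacked = np.stack(monthly_means)  # (month, lat, lon)
    return np.tensordot(days, stacked, axes=(0, 0)) / days.sum()


# March, April and May 2024, using constant fields so the result is easy to check
fields = [np.full((2, 3), v) for v in (1.0, 2.0, 3.0)]
mam = seasonal_mean(fields, [(2024, 3), (2024, 4), (2024, 5)])
print(mam)
```

The same aggregation would need to be applied to every hindcast and forecast member before the terciles, probabilities and anomalies are computed as above.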
