Consistency between the C3S Atlas dataset and its origins: Multiple origin datasets

logo

Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch

7.3. Consistency between the C3S Atlas dataset and its origins: Multiple origin datasets#

Production date: 2025-10-14.

Dataset version: 2.0.

Produced by: Olivier Burggraaff, Nicole Reynolds (National Physical Laboratory).

🌍 Use case: Retrieving climate indicators from the Copernicus Interactive Climate Atlas#

❓ Quality assessment question#

Are the climate indicators in the dataset underpinning the Copernicus Interactive Climate Atlas consistent with their origin datasets?
Can the dataset underpinning the Copernicus Interactive Climate Atlas be reproduced from its origin datasets?

The Copernicus Interactive Climate Atlas, or C3S Atlas for short, is a C3S web application providing an easy-to-access tool for exploring climate projections, reanalyses, and observational data [Gutiérrez+24]. Version 2.0 of the application allows the user to interact with 12 datasets:

Type	Dataset
Climate Projection	CMIP6
Climate Projection	CMIP5
Climate Projection	CORDEX-CORE
Climate Projection	CORDEX-EUR-11
Reanalysis	ERA5
Reanalysis	ERA5-Land
Reanalysis	ORAS5
Reanalysis	CERRA
Observations	E-OBS
Observations	BERKEARTH
Observations	CPC
Observations	SST-CCI

These datasets are provided through an intermediary dataset, the Gridded dataset underpinning the Copernicus Interactive Climate Atlas or C3S Atlas dataset for short [C3S Atlas dataset]. Compared to their origins, the versions of the climate datasets within the C3S Atlas dataset have been processed following the workflow in Figure 7.3.1.

../_images/c3s_atlas_dataset_workflow.png — Fig. 7.3.1 Schematic representation of the workflow for the production of the C3S Atlas dataset from its origin datasets, from the User-tools for the C3S Atlas.#

Because a wide range of users interact with climate data through the C3S Atlas application, it is crucial that the underpinning dataset represent its origins correctly. In other words, the C3S Atlas dataset must be consistent with and reproducible from its origins. Here, we assess this consistency and reproducibility by comparing climate indicators retrieved from the C3S Atlas dataset with their equivalents calculated from the origin dataset, mirroring the workflow from Figure 7.3.1. While a full analysis and reproduction of every record within the C3S Atlas dataset is outside the scope of quality assessment (and would require high-performance computing infrastructure), a case study with a narrower scope probes these quality attributes of the dataset and can be a jumping-off point for further analysis by the reader.

This notebook is part of a series:

Notebook	Contents
Consistency between the C3S Atlas dataset and its origins: Case study	Comparison between C3S Atlas dataset and one origin dataset (CMIP6) for one indicator (`tx35`), including detailed setup.
Consistency between the C3S Atlas dataset and its origins: Multiple indicators	Comparison between C3S Atlas dataset and one origin dataset (CMIP6) for multiple indicators.
Consistency between the C3S Atlas dataset and its origins: Multiple origin datasets	Comparison between C3S Atlas dataset and multiple origin datasets for one indicator.

📢 Quality assessment statement#

These are the key outcomes of this assessment

Climate indicators (here 3 monthly indicators) provided by the C3S Atlas dataset are highly consistent with values calculated from its origin datasets (here 10 climate projection, observation, and reanalysis datasets).
There are some differences in coverage between the C3S Atlas dataset and its origins due to differences in the version used, e.g. E-OBS temperature indicators at the edges of Europe, but these do not affect the vast majority use cases for the C3S Atlas.
Differences between the C3S Atlas dataset and a manual reproduction are rare and generally negligible (median absolute difference of 0 for tx35 and r01, ≤0.0003 °C for SST). Where differences occur, they can be explained by dataset versions or details of the workflow implementation.
The C3S Atlas is traceable and reproducible, and can be confidently used to view, analyse, and download climate data.
For specific use cases, like scientific papers, it is recommended to manually process the origin dataset instead of using the C3S Atlas. This is not necessary for other use cases, such as climate risk assessments and climate reports, in which case the C3S Atlas can be used as is.

📋 Methodology#

This quality assessment tests the consistency between climate indicators retrieved from the Gridded dataset underpinning the Copernicus Interactive Climate Atlas [C3S Atlas dataset] and their equivalents calculated from the origin datasets, as well as the reproducibility of said dataset.

This notebook probes the consistency between the C3S Atlas dataset and multiple origin datasets at the same time. Due to differences in scope (e.g. atmosphere / land / sea), not every indicator is available in every origin dataset or its C3S Atlas derivative. Furthermore, some origin datasets are historical while others are future projections. For this reason, we will examine the following indicators in the following origin datasets:

Monthly count of days with maximum near-surface (2-metre) air temperature above 35 °C (tx35)

Type	Dataset
Climate Projection	CMIP6
Climate Projection	CMIP5
Climate Projection	CORDEX-EUR-11
Reanalysis	ERA5
Reanalysis	ERA5-Land
Observations	E-OBS
Observations	BERKEARTH

Note that CORDEX-CORE has been left out of this assessment because its mosaicking workflow is out of scope. CERRA has been left out because the C3S User-tools package is currently not fully compatible with this dataset.

Monthly mean temperature of sea water near the surface (sst)

Type	Dataset
Reanalysis	ORAS5
Observations	SST-CCI

Monthly count of days with daily accumulated precipitation of liquid water equivalent from all phases above 1 mm (r01)

Type	Dataset
Observations	CPC

The analysis and results are organised in the following steps, which are detailed in the sections below:

1. Code setup

Install User-tools for the C3S Atlas.
Import all required libraries.
Define indicators.
Define helper functions.

2. Calculate and retrieve indicators

Download data from the origin datasets.
Homogenise data.
Calculate indicators.
Regrid the origin data to the C3S Atlas grid.
Download corresponding data from the C3S Atlas dataset.

3. Results

Consistency: Compare the C3S Atlas and reproduced datasets on native grids.
Reproducibility: Compare the C3S Atlas and reproduced datasets on the C3S Atlas grid.

📈 Analysis and results#

1. Code setup#

Note

This notebook uses earthkit for downloading (earthkit-data) and visualising (earthkit-plots) data. Because earthkit is in active development, some functionality may change after this notebook is published. If any part of the code stops functioning, please raise an issue on our GitHub repository so it can be fixed.

Install the User-tools for the C3S Atlas#

This notebook uses the User-tools for the C3S Atlas, which can be installed from GitHub using pip. For convenience, the following cell can do this from within the notebook. Further details and alternative options for installing this library are available in its documentation.

Import required libraries#

In this section, we import all the relevant packages needed for running the notebook.

Define indicators#

This section defines functions and variables for calculating and using the climate indicators.

The following cell includes a helper function (in the form of a decorator) to correctly propagate NaNs, which is important for datasets using land/sea masks:

The following cell contains functions for calculating the indicators described in the introduction:

The following cell defines earthkit-plots styles for the indicators. These styles define the colour maps and colour bar ranges for each quantity. Earthkit-plots styles are explained further in the corresponding documentation.

Helper functions#

This section defines some functions and variables used in the following analysis, allowing code cells in later sections to be shorter and ensuring consistency.

Data downloading & (pre-)processing#

The following functions help with downloading data from datasets with limits, by generating multiple CDS requests with similar parameters:

The following functions aid in sub-selecting data, e.g. selecting one model from the ensemble included in the C3S Atlas dataset or selecting data for a specific time frame:

The following functions handle data chunking in dask for computational efficiency:

The following functions handle the homogenisation of origin data to a consistent format using the User-tools for the C3S Atlas:

The following functions handle regridding data based on ESMF as implemented in the User-tools for the C3S Atlas. This step is explained in more detail in the relevant section below.

Statistics#

The following functions calculate the difference (absolute / relative) between datasets, handling metadata etc.:

Show code cell source

Hide code cell source

# Constants
NONZERO_THRESHOLD = 1e-5
NONZERO_THRESHOLD_PCT = 0.1

# Difference between datasets
def difference_between_datasets(data1: xr.Dataset, data2: xr.Dataset, diff_variables: Iterable[str]) -> xr.Dataset:
    """ Calculate the difference between two datasets, preserving CRS and metadata. """
    # Align
    data1, data2 = xr.align(data1, data2, join="inner", copy=False)

    # Subtract
    data1, data2 = [d.drop_vars(["lat_bnds", "lon_bnds", "time_bnds", "height"], errors="ignore") for d in (data1, data2)]
    difference = xr.ufuncs.subtract(data1[diff_variables], data2[diff_variables])
    return difference

def relative_difference_between_datasets(data1: xr.Dataset, data2: xr.Dataset, reldiff_variables: Iterable[str]) -> xr.Dataset:
    """
    Calculate the relative [%] difference between two datasets, preserving CRS and updating metadata.
    Relative difference is calculated relative to the first dataset.
    Where data1 == 0 and data2 == 0, the relative difference is set to 0 too.
    """
    # Select and calculate
    data1, data2 = [dataset.drop_vars([var for var in dataset.data_vars if var not in [*reldiff_variables, "crs"]]) \
                        for dataset in (data1, data2)]

    relative_difference = (data1 - data2) / data1 * 100.

    # Replace 0/0 with 0
    data1_zero = (data1 <= NONZERO_THRESHOLD)  # Threshold slightly > 0 because of floating-point errors
    relative_difference = relative_difference.where(~data1_zero, 0.)

    # Add name
    relative_difference = relative_difference.assign_attrs({"name": "C3S Atlas – Reproduced [%]"})

    return relative_difference

def fraction_over_threshold(data: pd.DataFrame, threshold: float=NONZERO_THRESHOLD) -> pd.DataFrame:
    """ Calculate the % of non-NaN cells in a dataframe greater than a given threshold. """
    data_over = (data >= threshold)
    frac_over = data_over.sum() / data_over.count() * 100.
    return frac_over

fraction_over_threshold_relative = partial(fraction_over_threshold, threshold=NONZERO_THRESHOLD_PCT)

The following functions calculate and display metrics for the difference between two datasets, e.g. mean and median deviation:

Show code cell source

Hide code cell source

def _comparison_statistics_single(dataset1: xr.Dataset, dataset2: xr.Dataset, indicator: str, *,
                                  do_relative=False) -> pd.DataFrame:
    """
    Given two datasets, calculate a number of statistics for one variable and return the result in a table.
    This version is hardcoded for one indicator in one year.
    Run this multiple times in a loop to do multiple origins.
    """
    # Calculate differences
    differences = difference_between_datasets(dataset1, dataset2, diff_variables=[indicator,])
    differences_abs = xr.ufuncs.abs(differences)

    # Calculate relative difference if desired
    if do_relative:
        differences_rel = relative_difference_between_datasets(dataset1, dataset2, reldiff_variables=[indicator,])
        differences_rel_abs = xr.ufuncs.abs(differences_rel)

    # Calculate aggregate statistics
    # (n.b. in dask, this only queues them up, doesn't actually compute)
    stats = {r"Mean Δ": differences[indicator].mean(),
             r"Median Δ": nanmedian(ravel(differences[indicator]), axis=0),
             r"Median |Δ|": nanmedian(ravel(differences_abs[indicator]), axis=0),
             r"% where |Δ| ≥ ε": fraction_over_threshold(differences_abs[indicator]),
            }

    if do_relative:
        stats_rel = {r"Median |Δ| [%]": nanmedian(ravel(differences_rel_abs[indicator]), axis=0),
                     rf"% where |Δ| ≥ {NONZERO_THRESHOLD_PCT}%": fraction_over_threshold_relative(differences_rel_abs[indicator]),
                    }
        stats = stats | stats_rel

    # Calculate correlation coefficients
    corrs = {r"Pearson r": xr.corr(dataset1[indicator], dataset2[indicator])}
    stats = stats | corrs

    # Perform the queued-up calculations
    stats = {key: float(stat) for key, stat in stats.items()}

    # Combine statistics into one dataframe
    stats = pd.DataFrame.from_dict({indicator: stats}, orient="index")

    return stats

def comparison_statistics(dataset1: xr.Dataset, dataset2: xr.Dataset, indicator: str, *,
                          do_relative=False) -> pd.DataFrame:
    """
    Given two dicts of datasets, calculate a number of statistics for one variable and return the result in a table.
    This version is hardcoded for one indicator, one year, multiple datasets.
    """
    # Setup: origins
    common_keys = [k for k in dataset1 if k in dataset2]
    loop_origins = tqdm(common_keys, desc="Calculating for datasets", leave=False)

    # Calculate statistics per dataset, combine result, return
    stats = [_comparison_statistics_single(dataset1[key], dataset2[key], indicator, do_relative=do_relative).rename(index={indicator: key})
             for key in loop_origins]
    stats = pd.concat(stats, axis=0)

    return stats

def display_difference_stats(dataset1: dict, dataset2: dict, *args,
                             glue_label: Optional[str]=None, **kwargs) -> str:
    """ Given two dicts of datasets, calculate a number of statistics for each pair and display the result in a table. """
    comparison_stats = comparison_statistics(dataset1, dataset2, *args, **kwargs)
    formatted = comparison_stats.style \
                                .format(precision=5)  \
                                .set_caption("C3S Atlas – Reproduced")
    return formatted

Visualisation#

The following cells contain functions for plotting results, starting with some base helper functions (e.g. displaying in Jupyter Notebook or Jupyter Book style, adding textboxes with consistent formatting, etc.):

The following functions are also base helper functions, but specific to geospatial plots:

The following cell contains functions for geospatial comparisons between datasets on their native grids or on a common grid (the latter also showing the per-pixel difference):

Show code cell source

Hide code cell source

def geospatial_comparison_multiple_origins(data1: dict[str, xr.Dataset], data2: dict[str, xr.Dataset], indicator: str, date: str, *,
                                           label1: str="C3S Atlas dataset", label2: str="Reproduced from origin",
                                           domain: Optional[AnyDomain]=None,
                                           glue_label: Optional[str]=None) -> None:

    """
    Plot one indicator in multiple dicts of datasets.
    A specific date (e.g. year+month as "2010-06") has to be specified.
    """
    # Setup: origins
    common_keys = [k for k in data1 if k in data2]
    n_datasets = len(common_keys)
    loop_origins = tqdm(common_keys, desc="Plotting datasets", leave=False)

    # Create figure
    fig = ekp.Figure(rows=n_datasets, columns=2, size=(7.5, max(5, 3*n_datasets)))

    for key in loop_origins:
        data1_here, data2_here = [d[key].sel(time=date, method="nearest") for d in (data1, data2)]

        # Plot individual datasets
        subplots_data = _spatial_plot_append_subplots(fig, data1_here, data2_here, domain=domain,
                                                      z=indicator, style=styles[indicator])

        # Decorate: Text
        _add_textbox_to_subplots(f"{key} ({date})", *subplots_data)

    # Colour bar at the bottom
    for subplot in fig.subplots[-2:]:
        subplot.legend(label=indicator, location="bottom")

    # Titles on top
    titles = [label1, label2]
    for title, subplot in zip(titles, fig.subplots):
        subplot.ax.set_title(title)

    # Decorate figure
    fig.land()
    fig.coastlines()
    fig.gridlines(linestyle=plt.rcParams["grid.linestyle"])

    # Show result
    _glue_or_show(fig.fig, glue_label)


# Visualisation: Plot indicators geospatially
def geospatial_comparison_multiple_origin_with_difference(data1: dict, data2: dict, indicator: str, date: str, *,
                                                          label1: str="C3S Atlas dataset", label2: str="Reproduced from origin",
                                                          domain: Optional[AnyDomain]=None,
                                                          glue_label: Optional[str]=None) -> None:
    """
    Plot one indicator in multiple dicts of datasets.
    A specific date (e.g. year+month as "2010-06") has to be specified.
    """
    # Setup: origins
    common_keys = [k for k in data1 if k in data2]
    n_datasets = len(common_keys)
    loop_origins = tqdm(common_keys, desc="Plotting datasets", leave=False)

    # Create figure
    fig = ekp.Figure(rows=n_datasets, columns=3, size=(7.5, max(5, 2*n_datasets)))

    for key in loop_origins:
        data1_here, data2_here = [d[key].sel(time=date, method="nearest") for d in (data1, data2)]
        difference = difference_between_datasets(data1_here, data2_here, diff_variables=[indicator,])

        # Plot individual datasets
        subplots_data = _spatial_plot_append_subplots(fig, data1_here, data2_here, domain=domain,
                                                      z=indicator, style=styles[indicator])

        # Plot difference
        subplot_diff = fig.add_map(domain=domain)
        subplot_diff.grid_cells(difference, z=indicator, style=styles[f"{indicator}_diff"])

        # Decorate: Text
        _add_textbox_to_subplots(f"{key} ({date})", *subplots_data, subplot_diff)

    # Colour bar at the bottom
    for subplot in subplots_data:
        subplot.legend(label=indicator, location="bottom")
    subplot_diff.legend(label="Difference", location="bottom")


    # Titles on top
    titles = [label1, label2, "Difference"]
    for title, subplot in zip(titles, fig.subplots):
        subplot.ax.set_title(title)

    # Decorate figure
    fig.land()
    fig.coastlines()
    fig.gridlines(linestyle=plt.rcParams["grid.linestyle"])

    # Show result
    _glue_or_show(fig.fig, glue_label)

The following cell contains functions for histogram comparisons between datasets on their native grids or on a common grid (the latter also showing the per-pixel difference):

Show code cell source

Hide code cell source

# Visualisation: Plot data in histograms
def histogram_comparison_by_origin(data1: dict, data2: dict, indicator: str, *,
                                   label1: str="C3S Atlas dataset", label2: str="Reproduced from origin",
                                   year: Optional[str]="", log=False,
                                   glue_label: Optional[str]=None) -> None:
    """
    Plot a histogram for one indicator in multiple dicts of datasets.
    Flattens all data in the datasets, including spatial and temporal dimensions.
    """
    # Setup: origins
    common_keys = [k for k in data1 if k in data2]
    n_datasets = len(common_keys)
    loop_origins = tqdm(common_keys, desc="Plotting datasets", leave=False)

    # Create figure
    fig, axs = plt.subplots(nrows=n_datasets, ncols=2, sharex="row", sharey="row",
                            figsize=(5, 2*n_datasets), layout="constrained", squeeze=False)

    # Plot histograms of data
    # Loop over rows / origins
    for ax_row, key in zip(axs, loop_origins):
        data1_here, data2_here = data1[key], data2[key]

        for ax, data in zip(ax_row, (data1_here, data2_here)):
            # Flatten data
            d = data[indicator].values.ravel()

            # Create histogram
            ax.hist(d, bins=31, density=True, log=log, color="black")

            # Labels
            ax.grid(True, axis="both")
            ax.set_xlabel(indicator)

        ax_row[0].set_ylabel("Frequency")

        # Identify panel
        date_label = f" ({year})" if year else ""
        _add_textbox_to_subplots(f"{key}{date_label}", *ax_row, right=True)

    # Titles on top
    titles = [label1, label2]
    for title, ax in zip(titles, axs[0]):
        ax.set_title(title)

    # Show result
    _glue_or_show(fig, glue_label)


# Visualisation: Plot data + difference in histograms
def histogram_comparison_by_origin_with_difference(data1: dict, data2: dict, indicator: str, *,
                                                   label1: str="C3S Atlas dataset", label2: str="Reproduced from origin",
                                                   year: Optional[str]="", log=False,
                                                   glue_label: Optional[str]=None) -> None:
    """
    Plot a histogram for one indicator in multiple dicts of datasets, including the point-by-point difference.
    Flattens all data in the datasets, including spatial and temporal dimensions.
    """
    # Setup: origins
    common_keys = [k for k in data1 if k in data2]
    n_datasets = len(common_keys)
    loop_origins = tqdm(common_keys, desc="Plotting datasets", leave=False)

    # Create figure
    fig, axs = plt.subplots(nrows=n_datasets, ncols=3,
                            figsize=(8, 2*n_datasets), layout="constrained", squeeze=False)

    # Setup x/y share -- cannot be done in plt.subplots because of difference panel not sharing these
    _sharexy(axs[:, :-1])
    # _sharexy(axs[:, -1])  # Leave out for now -- ALlow differences between origin datasets

    for ax in axs[:-1, :-1].ravel():
        ax.tick_params(axis="x", labelbottom=False)
        ax.xaxis.label.set_visible(False)

    for ax in axs[:, 1:-1].ravel():
        ax.tick_params(axis="y", labelleft=False)

    for ax in axs[:, -1].ravel():
        ax.yaxis.set_label_position("right")
        ax.tick_params(axis="y", labelleft=False, labelright=True)

    # Plot histograms of data
    # Loop over rows / origins
    for ax_row, key in zip(axs, loop_origins):
        data1_here, data2_here = data1[key], data2[key]
        difference = difference_between_datasets(data1_here, data2_here, diff_variables=[indicator,])

        # Loop over columns / data
        for ax, data in zip(ax_row, (data1_here, data2_here)):
            # Flatten data
            d = data[indicator].values.ravel()

            # Create histogram
            ax.hist(d, bins=31, density=True, log=log, color="black")

            # Labels
            ax.grid(True, axis="both")
            ax.set_xlabel(indicator)

            # Plot difference
            difference_here = difference[indicator].values.ravel()
            ax_row[-1].hist(difference_here, bins=31, density=True, log=log, color="black")

        ax_row[0].set_ylabel("Frequency")

        # Identify panel
        date_label = f" ({year})" if year else ""
        _add_textbox_to_subplots(f"{key}{date_label}", *ax_row, right=True)

    for ax in axs[:, -1]:  # Symmetric xlims for difference
        _symmetric_xlim(ax)

    # Titles on top
    titles = [label1, label2, "Difference"]
    for title, ax in zip(titles, axs[0]):
        ax.set_title(title)

    # Show result
    _glue_or_show(fig, glue_label)

2. Calculate and retrieve indicators#

In the previous two notebooks in this assessment, the origin data were downloaded, pre-processed, used to calculate the relevant indicator(s), and regridded; after which the Atlas dataset was downloaded. This notebook follows the same structure but for each origin in turn, for clarity and to preserve memory when loading multiple datasets at the same time. As such, the individual steps are described in less detail, because this information is available in the previous notebooks.

If you are only interested in specific origin datasets, or want to limit your bandwidth or memory usage, you can choose to only run specific subsections below.

This assessment examines the dataset in one year (2080), for one ensemble member and for one climate scenario – which is not how climate projection data are normally used. Good practice in climate science is to look at multi-year statistics and trends, across multiple ensemble members. However, since the purpose of this assessment is to assess the consistency and reproducibility of the post-processing performed to produce the C3S Atlas dataset, it is valid to use a subset of the data here. The specific subset used can be easily tweaked by changing the EXPERIMENT, MODEL, and YEARS variables in the relevant sections.

This notebook uses earthkit-data to download files from the CDS. If you intend to run this notebook multiple times, it is highly recommended that you enable caching to prevent having to download the same files multiple times. If you prefer not to use earthkit, the following requests can also be used with the cdsapi module. In either case (earthkit-data or cdsapi), it is required to set up a CDS account and API key as explained on the CDS website.

Note

This notebook uses xESMF for regridding data. xESMF is most easily installed using mamba/conda as explained in its documentation. Users who cannot or do not wish to use mamba/conda can manually compile and install ESMF on their machines. In future, this notebook will use earthkit-regrid instead, once it reaches suitable maturity.

Note that the C3S Atlas workflow calculates indicators first, then regrids. For operations that involve averaging, like smoothing and regridding, the order of operations can affect the result, especially in areas with steep gradients [Burggraaff+20]. Examples of such areas for a temperature index are coastlines and mountain ranges. In the case of C3S Atlas, this order of operations was a conscious choice to preserve the “raw” signals, e.g. preventing extreme temperatures from being smoothed out. However, it can affect the indicator values and therefore must be considered when using the C3S Atlas application or dataset.

General setup#

Throughout this section, we combine the downloaded datasets into dictionaries for easy access. They cannot be combined into a single xarray object because of differing grids. Each dataset is added in its own subsection, meaning any datasets not downloaded will automatically be skipped in the analysis.

CMIP6#

Show code cell output

Hide code cell output

Unknown file type, no reader available. path=/home/ob2/cds_data/cds-67885792c67c919c6adef8f9452ca11327458aa2fc79e661bd6835af4bbb1791.d/provenance.png magic=b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\n\xd3\x00\x00\x03"\x08\x02\x00\x00\x00\x99\xec9+\x00\x00\x00\x06bKGD\x00\xff\x00\xff\x00\xff\xa0\xbd\xa7\x93\x00\x00 \x00IDATx\x9c\xec\xddw' content_type=None
2025-10-13 16:44:59,142 — Homogenization-fixers — INFO — Dataset has already the correct names for its coordinates
2025-10-13 16:44:59,155 — Homogenization-fixers — INFO — Fixing calendar for <xarray.Dataset> Size: 81MB
Dimensions:    (time: 365, bnds: 2, lat: 192, lon: 288)
Coordinates:
  * time       (time) object 3kB 2080-01-01 12:00:00 ... 2080-12-31 12:00:00
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
    height     float64 8B ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object 6kB dask.array<chunksize=(1, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 5kB dask.array<chunksize=(288, 2), meta=np.ndarray>
    tasmax     (time, lat, lon) float32 81MB dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
Attributes: (12/48)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            ScenarioMIP
    branch_method:          standard
    branch_time_in_child:   60225.0
    branch_time_in_parent:  60225.0
    comment:                none
    ...                     ...
    title:                  CMCC-ESM2 output prepared for CMIP6
    variable_id:            tasmax
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by CMCC is licensed und...
    cmor_version:           3.6.0
    tracking_id:            hdl:21.14100/ba2e335b-8bac-45ec-abbe-f1f16299d2d4
2025-10-13 16:44:59,332 — UNITS_TRANSFORM — INFO — The dataset tasmax units are not in the correct magnitude. A conversion from K to Celsius will be performed.
2025-10-13 16:44:59,405 — Homogenization-fixers — INFO — The dataset is in daily or monthly resolution, we don't need to resample it from hourly frequency

CMIP5#

Show code cell output

Hide code cell output

2025-10-13 16:45:09,478 — Homogenization-fixers — INFO — Dataset has already the correct names for its coordinates
2025-10-13 16:45:09,481 — Homogenization-fixers — INFO — Fixing calendar for <xarray.Dataset> Size: 41MB
Dimensions:    (time: 366, bnds: 2, lat: 145, lon: 192)
Coordinates:
  * time       (time) datetime64[ns] 3kB 2080-01-01T12:00:00 ... 2080-12-31T1...
  * lat        (lat) float64 1kB -90.0 -88.75 -87.5 -86.25 ... 87.5 88.75 90.0
  * lon        (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
    height     float64 8B ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 6kB dask.array<chunksize=(366, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 2kB dask.array<chunksize=(145, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
    tasmax     (time, lat, lon) float32 41MB dask.array<chunksize=(366, 145, 192), meta=np.ndarray>
Attributes: (12/28)
    institution:            CSIRO (Commonwealth Scientific and Industrial Res...
    institute_id:           CSIRO-BOM
    experiment_id:          rcp85
    source:                 ACCESS1-0 2011. Atmosphere: AGCM v1.0 (N96 grid-p...
    model_id:               ACCESS1-0
    forcing:                GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4...
    ...                     ...
    table_id:               Table day (01 February 2012) b6353e9919862612c81d...
    title:                  ACCESS1-0 model output prepared for CMIP5 RCP8.5
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.8.0
2025-10-13 16:45:09,492 — UNITS_TRANSFORM — INFO — The dataset tasmax units are not in the correct magnitude. A conversion from K to Celsius will be performed.
2025-10-13 16:45:09,543 — Homogenization-fixers — INFO — The dataset is in daily or monthly resolution, we don't need to resample it from hourly frequency

CORDEX-EUR-11#

Show code cell output

Hide code cell output

2025-10-13 16:45:13,404 — Homogenization-fixers — INFO — Fixing coordinates names: {'rlon': 'x', 'rlat': 'y'}
2025-10-13 16:45:13,407 — Homogenization-fixers — INFO — Fixing calendar for <xarray.Dataset> Size: 263MB
Dimensions:                     (time: 366, bnds: 2, y: 412, x: 424, vertices: 4)
Coordinates:
  * time                        (time) datetime64[ns] 3kB 2080-01-01T12:00:00...
  * y                           (y) float64 3kB -23.38 -23.27 ... 21.72 21.84
  * x                           (x) float64 3kB -28.38 -28.27 ... 18.04 18.16
    lat                         (y, x) float32 699kB dask.array<chunksize=(412, 424), meta=np.ndarray>
    lon                         (y, x) float32 699kB dask.array<chunksize=(412, 424), meta=np.ndarray>
    height                      float64 8B ...
Dimensions without coordinates: bnds, vertices
Data variables:
    time_bnds                   (time, bnds) datetime64[ns] 6kB dask.array<chunksize=(1, 2), meta=np.ndarray>
    rotated_latitude_longitude  int32 4B ...
    lat_vertices                (y, x, vertices) float32 3MB dask.array<chunksize=(412, 424, 4), meta=np.ndarray>
    lon_vertices                (y, x, vertices) float32 3MB dask.array<chunksize=(412, 424, 4), meta=np.ndarray>
    tasmax                      (time, y, x) float32 256MB dask.array<chunksize=(1, 412, 424), meta=np.ndarray>
Attributes: (12/35)
    institution:                    Helmholtz-Zentrum Geesthacht, Climate Ser...
    institute_id:                   GERICS
    experiment_id:                  rcp85
    source:                         GERICS-REMO2015
    model_id:                       GERICS-REMO2015
    forcing:                        N/A
    ...                             ...
    parent_experiment:              N/A
    modeling_realm:                 atmos
    realization:                    1
    cmor_version:                   2.9.1
    tracking_id:                    hdl:21.14103/c3a4e035-9dcd-429f-a86d-8d53...
    c3s_disclaimer:                 This data has been produced in the contex...
2025-10-13 16:45:13,425 — UNITS_TRANSFORM — INFO — The dataset tasmax units are not in the correct magnitude. A conversion from K to Celsius will be performed.
2025-10-13 16:45:13,436 — Homogenization-fixers — INFO — The dataset is in daily or monthly resolution, we don't need to resample it from hourly frequency

ERA5#

Note that ERA5 data for a year are >30 GB in size. These data may take up to several hours to download and require sufficient storage to download and cache.

ERA5-Land#

Note that ERA5-Land data for a year are >200 GB in size. These data may take up to several hours to download and require sufficient storage to download and cache.

E-OBS#

Note that E-OBS data for the full period are >30 GB in size. These data may take up to several hours to download and require sufficient storage to download and cache.

BERKEARTH#

ORAS5#

SST-CCI#

Note that SST-CCI data for a year are >150 GB in size. These data may take up to several hours to download and require sufficient storage to download and cache.

Show code cell output

Hide code cell output

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36/36 [00:00<00:00, 165.43it/s]
2025-11-03 16:20:40,236 — Homogenization-fixers — INFO — Dataset has already the correct names for its coordinates
2025-11-03 16:20:40,240 — Homogenization-fixers — INFO — Fixing calendar for <xarray.Dataset> Size: 151GB
Dimensions:                   (time: 365, lat: 3600, lon: 7200, bnds: 2)
Coordinates:
  * lat                       (lat) float32 14kB -89.97 -89.93 ... 89.93 89.97
  * lon                       (lon) float32 29kB -180.0 -179.9 ... 179.9 180.0
  * time                      (time) datetime64[ns] 3kB 2010-01-01T12:00:00 ....
Dimensions without coordinates: bnds
Data variables:
    sst                       (time, lat, lon) float32 38GB dask.array<chunksize=(1, 1200, 2400), meta=np.ndarray>
    mask                      (time, lat, lon) float32 38GB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray>
    sea_ice_fraction          (time, lat, lon) float32 38GB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray>
    analysed_sst_uncertainty  (time, lat, lon) float32 38GB dask.array<chunksize=(1, 1200, 2400), meta=np.ndarray>
    time_bnds                 (time, bnds) datetime64[ns] 6kB dask.array<chunksize=(1, 2), meta=np.ndarray>
    lat_bnds                  (time, lat, bnds) float32 11MB dask.array<chunksize=(1, 3600, 2), meta=np.ndarray>
    lon_bnds                  (time, lon, bnds) float32 21MB dask.array<chunksize=(1, 7200, 2), meta=np.ndarray>
Attributes: (12/58)
    Conventions:                     CF-1.5, Unidata Observation Dataset v1.0
    title:                           ESA SST CCI OSTIA L4 product
    summary:                         OSTIA L4 product from the ESA SST CCI pr...
    references:                      http://www.esa-sst-cci.org
    institution:                     ESACCI
    history:                         Created using OSTIA reanalysis system v3.0
    ...                              ...
    source:                          ATSR<1,2>-ESACCI-L3U-v2.0, AATSR-ESACCI-...
    platform:                        ERS-<1,2>, Envisat, NOAA-<07,09,11,12,14...
    creator_name:                    ESA SST CCI
    product_specification_version:   SST_CCI-PSD-UKMO-201-Issue-2
    id:                              OSTIA-ESACCI-L4-GLOB-v2.1
    product_version:                 2.1
2025-11-03 16:20:40,294 — UNITS_TRANSFORM — INFO — The dataset sst units are not in the correct magnitude. A conversion from Kelvin to Celsius will be performed.
2025-11-03 16:20:41,695 — Homogenization-fixers — INFO — The dataset is in daily or monthly resolution, we don't need to resample it from hourly frequency

CPC#

Cleanup#

Lastly, we manually clear out some memory-intensive objects that are no longer necessary.

We also re-chunk the datasets to be more computationally efficient:

3. Results#

This section contains the comparison between the indicator values retrieved from the C3S Atlas dataset vs those reproduced from the origin datasets.

The datasets are first compared on their native grids. This means a point-by-point comparison is not possible (because the points are not equivalent), but the distributions can be compared geospatially and overall. This qualitative comparison probes the consistency quality attribute: Are the climate indicators in the dataset underpinning the Copernicus Interactive Climate Atlas consistent with their origin datasets?

Second, the C3S Atlas dataset is compared to the indicators derived from the origin dataset and regridded to the C3S Atlas grid. This makes a quantitative point-by-point comparison possible. This second comparison probes how well the dataset underpinning the Copernicus Interactive Climate Atlas can be reproduced from its origin datasets, based on the workflow (Figure 7.3.1).

For the geospatial comparison, we display the values of the indicators for one month, across one region and globally. As an example, we display the results across Europe in June, which should provide significant spatial variation. This region can easily be modified in the following code cell using the domains provided by earthkit-plots. Some examples are provided in the cell (commented out using #).

Consistency: Comparison on native grids#

As in the previous notebooks, it is clear from the geospatial comparisons (Figures 7.3.2–7.3.5) that the C3S Atlas dataset closely resembles a manual reproduction from its origin datasets. The general distribution of indicator values is the same for all comparisons. The E-OBS tx35 (Figure 7.3.3) comparison shows clear differences in terms of coverage, which is explained by differences in versions used (26.0e in the production of the C3S Atlas dataset, 30.0e here). Additionally, the CMIP5 comparison (Figure 7.3.2) shows clear differences here due to the fact that the C3S Atlas version of CMIP5 is regridded to a coarser resolution to ensure consistency between the different model members of the ensemble.

This pattern is also visible in the overall distributions (Figures 7.3.6–7.3.9), which are again very similar for almost all comparisons.

The overall conclusion from this comparison is the same as in the previous notebooks. The C3S Atlas dataset and its origins are highly consistent, but small differences exist due to the difference in grid and differences in the origin dataset version used and workflow. Large differences were observed only in the availability or masking of data in specific areas, such as on the edges of the E-OBS dataset. Users of the C3S Atlas dataset – and thus users of the C3S Atlas application – should be aware that the indicator values retrieved for a specific location may differ slightly from a manual analysis of the origin dataset.

Projected tx35

../_images/73e2682c44403358ead14e9dd8e6f4360f7da506318e13efb1079b22cdaba741.png — Fig. 7.3.2 Comparison between C3S Atlas dataset and reproduction for projected `tx35` in one month, across Europe, on the native grid of each dataset.#

Historical tx35

../_images/39856847b76aa41c8b1a4891fb70c892d7dfa652ed0e3da120569ce78e263b07.png — Fig. 7.3.3 Comparison between C3S Atlas dataset and reproduction for historical `tx35` in one month, across Europe, on the native grid of each dataset.#

sst

../_images/1ac5341f4ad2543f104790368dc1e175e8d552b79b7f2a00f4289aa214ba8d61.png — Fig. 7.3.4 Comparison between C3S Atlas dataset and reproduction for historical `sst` in one month, across Europe, on the native grid of each dataset.#

r01

../_images/ac09ad983a92b1101cc47c84d9cbef32f55d1fb7b431f9a1e46de8f8038062e3.png — Fig. 7.3.5 Comparison between C3S Atlas dataset and reproduction for historical `r01` in one month, across Europe, on the native grid of each dataset.#

Projected tx35

../_images/4e36157275795d346231c255ae11018e41ee498c0681220fee709cdb6e6694d9.png — Fig. 7.3.6 Comparison between overall distributions of projected `tx35` values in the C3S Atlas dataset and its reproduction, across all spatial and temporal dimensions, on the native grid of each dataset.#

Historical tx35

../_images/400307558fb10441813a70206f84e524ccdc525f0503ab90d2c6dc5659cf5c52.png — Fig. 7.3.7 Comparison between overall distributions of historical `tx35` values in the C3S Atlas dataset and its reproduction, across all spatial and temporal dimensions, on the native grid of each dataset.#

sst

../_images/1db22d16f5bbdb17fc88a9c6c3cf2fae8f0aed875f5a59877247993f4e996f49.png — Fig. 7.3.8 Comparison between overall distributions of historical `sst` values in the C3S Atlas dataset and its reproduction, across all spatial and temporal dimensions, on the native grid of each dataset.#

r01

../_images/c41aff8f1bd2e6dca89892ad1789ecf708087141ebaa096daeee281f616bcaaa.png — Fig. 7.3.9 Comparison between overall distributions of historical `r01` values in the C3S Atlas dataset and its reproduction, across all spatial and temporal dimensions, on the native grid of each dataset.#

Reproducibility: Comparison on C3S Atlas grid#

After regridding to the C3S Atlas grid, the indicator values reproduced from the origin dataset can be compared point-by-point to the values retrieved from the C3S Atlas dataset. We first examine some metrics that describe the difference Δ between corresponding pixels:

C3S Atlas – Reproduced
	Mean Δ	% where \|Δ\| ≥ ε	Pearson r
CMIP6	-0.00095	0.09388	0.99998
CMIP5	0.00000	10.20628	1.00000
CORDEX-EUR-11	-0.00065	4.20466	1.00000

C3S Atlas – Reproduced
	Mean Δ	% where \|Δ\| ≥ ε	Pearson r
ERA5	0.00003	0.00311	1.00000
ERA5-Land	-0.00002	0.00597	1.00000
E-OBS	-0.11389	3.63519	0.91907
BERKEARTH	0.00000	0.00000	1.00000

C3S Atlas – Reproduced
	Mean Δ	Median Δ	Median \|Δ\|	% where \|Δ\| ≥ ε	Pearson r
ORAS5	0.00000	0.00000	0.00024	65.47403	1.00000
SST-CCI	0.00003	0.00009	0.00020	65.15363	1.00000

C3S Atlas – Reproduced
	Mean Δ	Median Δ	Median \|Δ\|	% where \|Δ\| ≥ ε	Pearson r
CPC	0.00000	0.00000	0.00000	0.00000	1.00000

As in the previous notebooks, it is clear that the C3S Atlas dataset and its manual reproduction are very similar. The median difference, median absolute difference, and median absolute percentage difference are all close to 0 and the vast majority of pixels show a near-zero difference (defined here as |Δ| ≥ ε with ε = 10^–5 to avoid floating-point errors) in all comparisons. E-OBS shows the most significant differences, likely explained by differences in the version used as in the previous section.

These observations are confirmed by the overall distributions (Figures 7.3.10–7.3.13) and the geospatial distributions (Figures 7.3.14–7.3.17). Notably, while the difference between tx35 in the C3S Atlas dataset and E-OBS is typically small, with a median of 0 and with ≤4% of pixels showing non-zero differences, it has a long tail of large differences (Figure 7.3.11). These pixels are concentrated at the periphery of the domain (Figures 7.3.15). This is likely the result of a difference in the underlying dataset version (as before).

We can extend the conclusion from the previous notebooks, namely that the C3S Atlas dataset can be considered practically reproducible. Some indicators in the C3S Atlas dataset are completely identical to their origin datasets, e.g. r01 in the CPC comparison; others show non-zero differences, e.g. some of the pixels in the CMIP6 and E-OBS tx35 comparisons. The causes of these differences are unclear, but they are generally small and rare enough to be negligible for most users using the C3S Atlas dataset, especially since they will typically use it through the application. For further analysis, it is generally best to manually process the origin dataset, if only to remove the amount of steps that may affect the result.

Projected tx35

../_images/3243bc603152607be2d654e0e77b4b72535cba8fc2370d203480ef29f57bd7b4.png — Fig. 7.3.10 Comparison between overall distributions of projected `tx35` values in the C3S Atlas dataset and its reproduction on the C3S Atlas grid, across all spatial and temporal dimensions, including the per-pixel difference.#

Historical tx35

../_images/f7749ca374f3531ba4cf3e791891eab5ce63c4eedb432f1f5a089bbe3a51a49f.png — Fig. 7.3.11 Comparison between overall distributions of historical `tx35` values in the C3S Atlas dataset and its reproduction on the C3S Atlas grid, across all spatial and temporal dimensions, including the per-pixel difference.#

sst

../_images/2bf3bbfc79c4e3bef360d0ec84935d4a695046c3a7c1afa6715b0f1cc2340cdd.png — Fig. 7.3.12 Comparison between overall distributions of historical `sst` values in the C3S Atlas dataset and its reproduction on the C3S Atlas grid, across all spatial and temporal dimensions, including the per-pixel difference.#

r01

../_images/7ee5244c2af4db7b20a4410350189fba48aee08c2534da1dd0d9e9c60e212984.png — Fig. 7.3.13 Comparison between overall distributions of historical `r01` values in the C3S Atlas dataset and its reproduction on the C3S Atlas grid, across all spatial and temporal dimensions, including the per-pixel difference.#

Projected tx35

../_images/edc9df803afc1bd69a25dc06d345f31f939a7cd5392e5b68ff715c08604d7b46.png — Fig. 7.3.14 Comparison between C3S Atlas dataset and reproduction for projected `tx35` in one month, across Europe, on the C3S Atlas dataset grid, including the per-pixel difference.#

Historical tx35

../_images/9d96192e46d9635ad57876b18f5335cc63a451b40f092dde2c0fd7b00bf4b0f5.png — Fig. 7.3.15 Comparison between C3S Atlas dataset and reproduction for historical `tx35` in one month, across Europe, on the C3S Atlas dataset grid, including the per-pixel difference.#

sst

../_images/462c80cf58b6e459539f4f9ddaf5f620d33a600ec1d1ca5a80195c543bc929e1.png — Fig. 7.3.16 Comparison between C3S Atlas dataset and reproduction for historical `sst` in one month, across Europe, on the C3S Atlas dataset grid, including the per-pixel difference.#

r01

../_images/6537d9c4ac9ffee5cc6ef0b7ae8d2ebf894b1a5def563b4f469137023c8af15a.png — Fig. 7.3.17 Comparison between C3S Atlas dataset and reproduction for historical `r01` in one month, across Europe, on the C3S Atlas dataset grid, including the per-pixel difference.#

ℹ️ If you want to know more#

Key resources#

The CDS catalogue entries for the data used were:

Gridded dataset underpinning the Copernicus Interactive Climate Atlas: multi-origin-c3s-atlas
- Consistency between the C3S Atlas dataset and its origins: Case study
- Consistency between the C3S Atlas dataset and its origins: Multiple indicators
- Consistency between the C3S Atlas dataset and its origins: Multiple origin datasets
CMIP6 climate projections: projections-cmip6
- Quality assessments for CMIP6
CMIP5 daily data on single levels: projections-cmip5-daily-single-levels
CORDEX regional climate model data on single levels: projections-cordex-domains-single-levels
- Quality assessments for CORDEX
ERA5 hourly data on single levels from 1940 to present: reanalysis-era5-single-levels
- Quality assessments for Reanalyses
ERA5-Land hourly data from 1950 to present: reanalysis-era5-land
- Quality assessments for Reanalyses
E-OBS daily gridded meteorological data for Europe from 1950 to present derived from in-situ observations: insitu-gridded-observations-europe
- Quality assessments for Insitu Observations
Temperature and precipitation gridded data for global and regional domains derived from in-situ and satellite observations (BERKEARTH, CPC) insitu-gridded-observations-global-and-regional
- Quality assessments for Insitu Observations
ORAS5 global ocean reanalysis monthly data from 1958 to present: reanalysis-oras5
- Quality assessments for Reanalyses
Sea surface temperature daily data from 1981 to present derived from satellite observations (SST-CCI): satellite-sea-surface-temperature
- Quality assessments for Satellite Observations: Ocean

Code libraries used:

earthkit
- earthkit-data
- earthkit-plots
User-tools for the C3S Atlas
xclim climate indicator tools

More about the Copernicus Interactive Climate Atlas and its IPCC predecessor:

References#

[Gutiérrez+24] J. M. Gutiérrez et al., ‘The Copernicus Interactive Climate Atlas: a tool to explore regional climate change’, ECMWF Newsletter, vol. 181, pp. 38–45, Oct. 2024, doi: 10.21957/ah52ufc369.

[C3S Atlas dataset] Copernicus Climate Change Service, ‘Gridded dataset underpinning the Copernicus Interactive Climate Atlas’. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), Jun. 17, 2024. doi: 10.24381/cds.h35hb680.

[CMIP6 dataset] Copernicus Climate Change Service, ‘CMIP6 climate projections’. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), Mar. 23, 2021. doi: 10.24381/cds.c866074c.

[Burggraaff+20] O. Burggraaff, ‘Biases from incorrect reflectance convolution’, Optics Express, vol. 28, no. 9, pp. 13801–13816, Apr. 2020, doi: 10.1364/OE.391470.

Consistency between the C3S Atlas dataset and its origins: Multiple origin datasets

Contents

7.3. Consistency between the C3S Atlas dataset and its origins: Multiple origin datasets#

🌍 Use case: Retrieving climate indicators from the Copernicus Interactive Climate Atlas#

❓ Quality assessment question#

📢 Quality assessment statement#

📋 Methodology#

📈 Analysis and results#

1. Code setup#

Install the User-tools for the C3S Atlas#

Import required libraries#

Define indicators#

Helper functions#

Data downloading & (pre-)processing#

Statistics#

Visualisation#

2. Calculate and retrieve indicators#

General setup#

CMIP6#

CMIP5#

CORDEX-EUR-11#

ERA5#

ERA5-Land#

E-OBS#

BERKEARTH#

ORAS5#

SST-CCI#

CPC#

Cleanup#

3. Results#

Consistency: Comparison on native grids#

Reproducibility: Comparison on C3S Atlas grid#

ℹ️ If you want to know more#

Key resources#

References#