
C3S EQC quality assessments
This Jupyter Book contains a collection of quality assessments of C3S data and applications produced by external evaluators under the C3S2_520 and C3S2_521 contracts, which provide an evaluation and quality control (EQC) function for selected datasets on the climate data store (CDS) and C3S applications.
C3S has established an EQC framework for all its products and services to ensure that users are served well and that this will continue to be the case as their needs evolve. The main goal of the EQC for CDS datasets function is to develop precise statements about data quality that pertain to well-identified use cases. Those statements, in combination with other documented information about the datasets, constitute a knowledge base that can help users to assess fitness for purpose, given their needs and requirements.
Layers of evaluation and quality control
The revised EQC framework makes a distinction between quality assurance, quality assessment and fitness for purpose:
Quality assurance: data quality checks against key requirements
Quality assessment: scientific assessments which answer user questions
Fitness for purpose: deciding whether data is suitable for a particular use
In more detail, quality assurance informs users that data, metadata and documentation comply with a well-defined set of verifiable technical requirements, and provides evidence that this compliance has been checked independently of the producers. Quality assurance for each CDS dataset is implemented by verifying the technical requirements associated with that dataset. Specific assessment criteria are developed for each data stream (e.g. reanalysis, projections) to account for their different nature and for the different types of checks that may be needed to verify each requirement. The role of EQC evaluators is to check these criteria and document the outcome for users.
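As a rough illustration of quality assurance as the verification of a fixed set of technical requirements with a recorded outcome, the sketch below models each requirement as a named check over a dataset's metadata. The `Requirement` class, the `run_quality_assurance` function, the requirement identifiers and the metadata keys are all hypothetical; they are not part of the EQC framework or of any C3S software, and real criteria are defined per data stream.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    """One verifiable technical requirement (hypothetical example)."""

    identifier: str
    description: str
    check: Callable[[dict], bool]  # returns True if the metadata complies


def run_quality_assurance(metadata: dict, requirements: list[Requirement]) -> dict[str, bool]:
    """Evaluate every requirement and return a pass/fail record keyed by identifier."""
    return {req.identifier: req.check(metadata) for req in requirements}


# Hypothetical requirements applied to a hypothetical metadata dictionary.
REQUIREMENTS = [
    Requirement("QA-01", "A licence is specified", lambda m: bool(m.get("licence"))),
    Requirement("QA-02", "Documentation is linked", lambda m: bool(m.get("documentation_url"))),
    Requirement(
        "QA-03",
        "Every variable declares its units",
        lambda m: bool(m.get("variables"))
        and all("units" in v for v in m["variables"].values()),
    ),
]

metadata = {
    "licence": "CC-BY-4.0",
    "documentation_url": "",  # missing link, so QA-02 fails
    "variables": {"t2m": {"units": "K"}},
}
report = run_quality_assurance(metadata, REQUIREMENTS)
# report == {"QA-01": True, "QA-02": False, "QA-03": True}
```

Keeping each check separate from the reporting function means the outcome of every requirement is documented individually, which mirrors the idea of reporting compliance to users requirement by requirement.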
The purpose of quality assessments, on the other hand, is to provide science-based information about accuracy, uncertainties, sources of uncertainty, temporal consistency, strengths, and weaknesses of a dataset in the context of realistic use cases.
EQC evaluators are tasked with developing quality assessments that are designed to generate useful statements about fitness for purpose of CDS datasets. The assessments address concrete questions about data quality associated with real use cases. Many of the assessments are implemented in Jupyter notebooks that can be shared, re-used, and modified by users. Assessments can involve multiple CDS datasets and use other sources of reference data. Assessments must build on relevant scientific literature, including published documents developed by producers and users of the datasets.
Taken together, the outcomes of these activities provide the key information needed to determine fitness for purpose, that is, whether the data is suitable for a user's specific application.
Quality assessments
Each quality assessment comprises:
Use Case: Description of a user's application, from the user's perspective.
User Question: Question from a user regarding quality attributes of a dataset in the context of their application.
Quality Assessment: Scientific evaluation of quality-related user questions including guidance on suitability for a specific use (“How, and how well can I use the data for my purpose?”, “Is the dataset adequate to address my problem?”, “What is the most suitable C3S dataset to tackle my problem?”)
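The three components of a quality assessment can be thought of as a simple record. The following sketch uses a Python dataclass; the class name, the field names, the example texts and the completeness rule are illustrative assumptions, not the structure used by the EQC notebooks.

```python
from dataclasses import dataclass


@dataclass
class QualityAssessment:
    """Illustrative record of the three components of a quality assessment."""

    use_case: str  # the user's application, described from their perspective
    user_question: str  # the quality question asked about the dataset
    assessment: str = ""  # scientific evaluation and guidance on suitability

    def is_complete(self) -> bool:
        """Treat the assessment as complete once all three components are non-empty."""
        return all(text.strip() for text in (self.use_case, self.user_question, self.assessment))


# Hypothetical example: the assessment text has not been written yet.
qa = QualityAssessment(
    use_case="Screening regional summer temperature extremes for impact studies",
    user_question="Is the dataset adequate to represent observed summer maximum temperatures?",
)
print(qa.is_complete())  # False
```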
These assessments are organised by the type of data they address. Note that while each assessment focusses on addressing the quality of a particular dataset, it may rely on other types of data as well (e.g. reanalysis for bias assessment).
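A bias assessment against a reference such as a reanalysis often comes down to comparing co-located values. The minimal standard-library sketch below assumes two already-aligned series of the same variable (for example one grid point or area mean over the same time steps) and computes the mean bias and the root-mean-square error. The function name, the choice of these two metrics and the input values are assumptions for illustration; the notebooks themselves handle gridded data with dedicated software, and individual assessments may use different metrics.

```python
from __future__ import annotations

from math import sqrt
from statistics import fmean


def bias_and_rmse(dataset: list[float], reference: list[float]) -> tuple[float, float]:
    """Mean bias and root-mean-square error of `dataset` relative to `reference`.

    Both series are assumed to be co-located in space and time.
    """
    if not dataset or len(dataset) != len(reference):
        raise ValueError("series must be non-empty and of equal length")
    differences = [d - r for d, r in zip(dataset, reference)]
    bias = fmean(differences)  # positive: dataset is higher than the reference on average
    rmse = sqrt(fmean(x * x for x in differences))
    return bias, rmse


# Made-up temperature anomalies (K) at one location, for illustration only.
bias, rmse = bias_and_rmse([1.0, 2.0, 3.0], [0.5, 2.5, 2.0])
# bias ≈ 0.333, rmse ≈ 0.707
```

Reporting both numbers is useful because a small mean bias can hide large errors of opposite sign, which the RMSE still captures.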
Running the Notebooks
Most of these quality assessments include Python code to produce data and figures which help answer the user question. This code is included for transparency and traceability, as the software was primarily designed to support evaluators running the assessments. However, many notebooks (or sections thereof) are directly reproducible and can be adapted as a jumping-off point for your own use case. The notebooks (.ipynb files) can be downloaded and run on freely available cloud platforms or on your own computing resources.
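A downloaded notebook is a JSON document, so its code cells can be extracted with the standard library if you want to reuse them in your own scripts. This sketch assumes the notebook follows the nbformat version 4 layout (a top-level `cells` list whose entries have `cell_type` and `source`); the function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def code_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell in an nbformat v4 notebook dict."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores the source either as one string or as a list of lines.
        sources.append(source if isinstance(source, str) else "".join(source))
    return sources


def notebook_to_script(path: str | Path) -> str:
    """Read a downloaded .ipynb file and join its code cells into one script."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    return "\n\n".join(code_cells(notebook))
```

Jupyter's own `jupyter nbconvert --to script notebook.ipynb` performs a similar conversion and is the more robust choice. Cells containing IPython magics (lines starting with `%` or `!`) will not run as plain Python after extraction with this sketch.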
Most code in the notebooks is self-contained, although some quality assessments may rely on the outcomes and code of previous assessments or on offline computations. Many notebooks make use of custom software prepared by B-Open for EQC evaluators.
The notebooks increasingly incorporate earthkit, ECMWF’s new open-source tools for weather and climate science workflows. Earthkit includes components for data access, processing, analysis, visualisation and much more, built on top of well-established open-source Python libraries like numpy, pandas and matplotlib.