European Air Quality Index Calculation#

About#

This notebook provides a practical introduction to the calculation and interpretation of the Air Quality Index (AQI) in Europe. Air pollution is the single largest environmental health risk in Europe, causing cardiovascular and respiratory diseases that, in the most serious cases, lead to premature death. The European Environment Agency’s European Air Quality Index allows users to understand more about air quality where they live. Even though air quality in Europe has improved over recent decades, the levels of air pollutants still exceed EU standards and the most stringent World Health Organization (WHO) guidelines.

How is the European Air Quality index defined?#

The European Air Quality index is computed for five main pollutants regulated in the European legislation:

Ozone (O3)
Nitrogen Dioxide (NO2)
Sulphur Dioxide (SO2)
Fine particulate matter with a diameter smaller than 2.5 micrometers (PM2.5)
Fine particulate matter with a diameter smaller than 10 micrometers (PM10)

The index ranges from 1 (good) to 6 (extremely poor). For each pollutant, the index is calculated separately from its concentration: the higher the concentration, the higher the index (see the table below for index levels). The overall hourly European Air Quality Index is defined as the highest of the five individual pollutant indices computed for the same hour. For instance, if the indices for O₃, NO₂, SO₂, PM2.5 and PM10 are 1, 3, 1, 2 and 2 respectively, the overall index is 3. The overall daily European Air Quality Index is the highest value of the overall hourly index on the corresponding day.

[Figure: table of European Air Quality Index levels and pollutant thresholds]
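The definition of the overall hourly and daily index can be illustrated with a minimal sketch. The index levels below are hypothetical values chosen for illustration, not real measurements; the first hour reproduces the example from the text.

```python
# Hypothetical hourly index levels (1 to 6) per pollutant for four hours.
hourly_levels = {
    "o3":    [1, 1, 2, 2],
    "no2":   [3, 2, 2, 1],
    "so2":   [1, 1, 1, 1],
    "pm2p5": [2, 2, 3, 2],
    "pm10":  [2, 2, 2, 2],
}

# Overall hourly index: the highest pollutant level in each hour.
overall_hourly = [max(levels) for levels in zip(*hourly_levels.values())]

# Overall daily index: the highest overall hourly index of the day.
overall_daily = max(overall_hourly)

print(overall_hourly)  # [3, 2, 3, 2]
print(overall_daily)   # 3
```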

The notebook has the following outline:

1 - Compute the European Air Quality Index for Europe for one day in July 2021
- 1.1 - Request data from the ADS programmatically with the CDS API
- 1.2 - Load and prepare the European air quality forecast data
- 1.3 - Classify daily maximum pollutant values into European Air Quality Index levels
- 1.4 - Visualize a map of Air Quality Index levels in Europe for 15 July 2021
2 - Calculate the daily European Air Quality Index for London in December 2020
- 2.1 - Request data from the ADS programmatically with the CDS API
- 2.2 - Load and prepare the European air quality forecast data
- 2.3 - Select time-series information for London, UK and convert to a pandas dataframe
- 2.4 - Classify daily maximum values of key pollutants into European Air Quality Index levels
- 2.5 - Visualize daily European Air Quality Index for London in December 2020 as heatmap

Data#

This notebook introduces you to the CAMS European air quality forecasts and analysis dataset. The data has the following specifications:

Data: CAMS European air quality forecasts
Temporal coverage: three-year rolling archive
Temporal resolution: hourly
Spatial coverage: Europe
Spatial resolution: 0.1° x 0.1°
Format: NetCDF

Install CDSAPI via pip#

!pip install cdsapi

Load libraries#

# CDS API
import cdsapi
import os

# Libraries for working with multi-dimensional arrays
import numpy as np
import xarray as xr
import pandas as pd

# Libraries for plotting and visualising data
import matplotlib.path as mpath
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import HTML
from matplotlib.colors import ListedColormap

from datetime import datetime

import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy.feature as cfeature
import seaborn as sns

2. Calculate the daily European Air Quality Index for London in December 2020#

Next, we want to calculate the daily European Air Quality Index for London, UK in December 2020.

2.1 Request data from the ADS programmatically with the CDS API#

We request another dataset with the help of the CDS API. Below, we request analysis data from the CAMS European air quality forecasts dataset. We request hourly data for December 2020 for the five main pollutants for which the European Air Quality Index is calculated: nitrogen_dioxide, ozone, particulate_matter_10um, particulate_matter_2.5um and sulphur_dioxide. In the data request below, we additionally set the area key, as we want to tailor the data request to only a small geographical area around London, United Kingdom.

Let us store the dataset under the name 202012_eaqi_london.nc.

dataset = "cams-europe-air-quality-forecasts"
request = {
    'variable': ['nitrogen_dioxide', 'ozone', 
                 'particulate_matter_2.5um', 
                 'particulate_matter_10um', 'sulphur_dioxide'],
    'model': ['ensemble'],
    'level': ['0'],
    'date': ['2020-12-01/2020-12-31'],
    'type': ['analysis'],
    'time': ['00:00', '01:00', '02:00', '03:00', '04:00', 
             '05:00', '06:00', '07:00', '08:00', '09:00', 
             '10:00', '11:00', '12:00', '13:00', '14:00', 
             '15:00', '16:00', '17:00', '18:00', '19:00', 
             '20:00', '21:00', '22:00', '23:00'],
    'leadtime_hour': ['0'],
    'data_format': 'netcdf',
    'area': [51.6, -0.3, 51.3, 0]
}

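# URL and KEY hold the ADS API endpoint and your personal access token;
# they are assumed to have been defined earlier in the notebook.
client = cdsapi.Client(url=URL, key=KEY)
client.retrieve(dataset, request).download(
    './202012_eaqi_london.nc')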

2024-09-12 22:36:05,020 INFO Request ID is dd76ceb8-f4c9-4930-9540-af933b6f0140
2024-09-12 22:36:05,092 INFO status has been updated to accepted
2024-09-12 22:36:06,677 INFO status has been updated to running
2024-09-12 22:42:23,523 INFO status has been updated to successful

'./202012_eaqi_london.nc'

2.3 Select time-series information for London, UK and convert to a pandas dataframe#

The xarray.Dataset above is now in a format that allows us to select the values of the pollutants for one single grid point. Let us first define variables for the latitude and longitude coordinates for London, United Kingdom. These two variables can then be used to select the values for one single grid point. You can use xarray’s function sel() to make a coordinate-based selection. Let us set the keyword argument method='nearest', which selects the information that belongs to the closest grid point to the provided latitude and longitude inputs.

The resulting dataset retains time as its only dimension.
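The nearest-neighbour selection described above can be sketched as follows. This is a minimal illustration on a synthetic dataset, not the notebook's original cell: the grid, the random values, the variable names ds and ds_lnd, and the London coordinates are all assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the hourly data: random values on a 3 x 3 grid of 0.1° cells.
time = pd.date_range("2020-12-01", periods=48, freq="h")
latitude = np.array([51.35, 51.45, 51.55])
longitude = np.array([-0.25, -0.15, -0.05])
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"no2_conc": (("time", "latitude", "longitude"), rng.uniform(0, 80, (48, 3, 3)))},
    coords={"time": time, "latitude": latitude, "longitude": longitude},
)

# Assumed coordinates for central London (illustrative values).
lnd_lat, lnd_lon = 51.52, -0.08

# Select the grid point closest to the requested coordinates.
ds_lnd = ds.sel(latitude=lnd_lat, longitude=lnd_lon, method="nearest")

# Only the time dimension is left after the selection.
print(ds_lnd["no2_conc"].dims)
print(float(ds_lnd.latitude), float(ds_lnd.longitude))
```

Without method='nearest', sel() only matches coordinate values that exist exactly on the grid, so a lookup with arbitrary city coordinates would typically fail.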

Our aim is to compute the daily European Air Quality Index for London for every day in December 2020. Hence, as a next step, we resample the hourly data to a daily resolution. You can use the function resample() to aggregate the hourly information, combined with max() as the aggregation function, which selects the maximum value per day for every key pollutant. As a result, the number of entries along the time dimension decreases from 744 to 31, one entry for every day in December 2020.
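A minimal sketch of this aggregation step, assuming a synthetic hourly series in place of the real London data (the names hourly and daily_max are hypothetical):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series for December 2020 (31 days x 24 hours = 744 steps).
time = pd.date_range("2020-12-01", periods=744, freq="h")
hourly = xr.Dataset(
    {"o3_conc": ("time", np.arange(744, dtype=float))},
    coords={"time": time},
)

# Aggregate to daily resolution, keeping the maximum value of each day.
daily_max = hourly.resample(time="1D").max()

print(hourly["o3_conc"].size, "->", daily_max["o3_conc"].size)
```

Because the synthetic values simply count the hours, the daily maximum of the first day is 23, the value of its last hour.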

Above, the xarray.Dataset holds the daily time-series information for London for the five main pollutants. The Python library pandas is very effective in handling time-series data and offers an interface to xarray. Let us convert the xarray.Dataset above to a pandas.DataFrame with the function to_dataframe(). The result is a dataframe with 31 row entries and seven columns, including longitude and latitude information as well as the maximum daily values for each main pollutant.

eaqi_ts_daily_df = eaqi_lnd_ts_daily.to_dataframe()
eaqi_ts_daily_df

	longitude	latitude	no2_conc	o3_conc	pm10_conc	pm2p5_conc	so2_conc
time
2020-12-01	-0.049988	51.549999	40.360981	51.209736	22.570604	13.902804	6.562004
2020-12-02	-0.049988	51.549999	56.892639	15.424085	45.809910	31.491850	21.718649
2020-12-03	-0.049988	51.549999	68.645248	46.312866	26.582747	17.763401	13.441649
2020-12-04	-0.049988	51.549999	54.392933	45.893066	15.623908	12.668585	6.704823
2020-12-05	-0.049988	51.549999	45.801853	49.884430	17.493584	13.535539	6.376126
2020-12-06	-0.049988	51.549999	41.127209	35.249939	38.035538	28.093002	5.623093
2020-12-07	-0.049988	51.549999	54.871750	18.775467	36.344421	36.085228	10.280019
2020-12-08	-0.049988	51.549999	56.589455	9.765294	43.291382	34.626175	18.309492
2020-12-09	-0.049988	51.549999	60.274017	31.233099	30.912859	15.796569	8.295956
2020-12-10	-0.049988	51.549999	34.791924	37.745247	19.204161	13.562931	4.783942
2020-12-11	-0.049988	51.549999	71.287163	48.790863	38.415810	23.538435	9.859519
2020-12-12	-0.049988	51.549999	51.162815	39.226406	31.483122	22.733740	7.092950
2020-12-13	-0.049988	51.549999	40.099022	55.796829	19.655128	13.813503	5.350842
2020-12-14	-0.049988	51.549999	34.202461	72.778191	22.917818	7.765835	4.940461
2020-12-15	-0.049988	51.549999	41.867252	57.196274	17.836281	9.269776	4.227675
2020-12-16	-0.049988	51.549999	49.033611	55.022697	14.541663	9.695584	5.604219
2020-12-17	-0.049988	51.549999	39.638039	52.591488	19.744169	9.260588	5.351467
2020-12-18	-0.049988	51.549999	16.793612	55.203526	13.695438	7.943083	3.367660
2020-12-19	-0.049988	51.549999	17.968653	66.484818	17.481939	7.549853	2.891042
2020-12-20	-0.049988	51.549999	30.277277	63.118961	18.281832	8.503333	5.054744
2020-12-21	-0.049988	51.549999	21.749844	52.183567	15.184654	7.387372	3.196917
2020-12-22	-0.049988	51.549999	53.465370	60.969646	22.707979	16.764242	8.583075
2020-12-23	-0.049988	51.549999	41.588886	54.508396	16.114367	11.392229	7.723714
2020-12-24	-0.049988	51.549999	18.644598	67.836159	12.245522	5.823782	3.355804
2020-12-25	-0.049988	51.549999	44.523937	64.962776	20.359892	15.102349	8.454208
2020-12-26	-0.049988	51.549999	30.182362	70.119171	17.390364	11.580196	4.097984
2020-12-27	-0.049988	51.549999	53.823380	78.637825	25.374117	13.778440	13.618376
2020-12-28	-0.049988	51.549999	57.219772	36.210609	34.993145	25.498072	17.036882
2020-12-29	-0.049988	51.549999	55.005043	26.502134	40.520016	30.452618	10.972260
2020-12-30	-0.049988	51.549999	64.272896	32.193192	35.142998	24.304281	23.899206
2020-12-31	-0.049988	51.549999	44.753349	29.064219	33.245438	24.219048	11.004436

2.4 Classify daily maximum values of key pollutants into European Air Quality Index levels#

The next step is to classify the daily maximum values into the respective European Air Quality Index levels. The EAQI has six index levels and different thresholds for each of the five key pollutants. First, we define the limits for each pollutant and, additionally, a list with the index class labels 1 to 6.

ozone_limits = [0, 50, 100, 130, 240, 380, 800]
no2_limits = [0, 40, 90, 120, 230, 340, 1000]
so2_limits = [0, 100, 200, 350, 500, 750, 1250]
# PM2.5 has the stricter thresholds of the two particulate matter classes
pm25_limits = [0, 10, 20, 25, 50, 75, 800]
pm10_limits = [0, 20, 40, 50, 100, 150, 1200]

index_levels = [1, 2, 3, 4, 5, 6]

Based on the thresholds above, we now classify the daily maximum values of each pollutant into the respective European Air Quality Index level. The pandas function cut() allows us to categorize the daily maximum values into index levels. The function takes the following arguments:

pandas.Series for each pollutant, which shall be categorized
list of thresholds for each pollutant
list with index labels

As a result, for each pollutant, the daily maximum value has been classified into one of the six index level classes, based on the pollutant thresholds.

ozone = pd.cut(eaqi_ts_daily_df['o3_conc'], ozone_limits, labels=index_levels)
no2 = pd.cut(eaqi_ts_daily_df['no2_conc'], no2_limits, labels=index_levels)
so2 = pd.cut(eaqi_ts_daily_df['so2_conc'], so2_limits, labels=index_levels)
pm25 = pd.cut(eaqi_ts_daily_df['pm2p5_conc'], pm25_limits, labels=index_levels)
pm10 = pd.cut(eaqi_ts_daily_df['pm10_conc'], pm10_limits, labels=index_levels)

As a next step, we want to bring together the categorized pandas.Series objects above into one pandas.DataFrame. For this, we can use the constructor pd.DataFrame() and combine the five pandas.Series objects into one DataFrame.

df = pd.DataFrame(dict(o3=ozone, no2=no2, so2=so2, pm25=pm25, pm10=pm10))

The last step before we can visualize the European Air Quality Index levels is to compute the overall index level based on the five pollutants. The overall index level for each day is defined by the pollutant with the highest level. Below, with the help of the function max(), we select the maximum index level for each day and define a new column with the name level.

df['level'] = df.max(axis=1).astype(int)
df

	o3	no2	so2	pm25	pm10	level
time
2020-12-01	2	2	1	2	2	2
2020-12-02	1	2	1	4	3	4
2020-12-03	1	2	1	2	2	2
2020-12-04	1	2	1	2	1	2
2020-12-05	1	2	1	2	1	2
2020-12-06	1	2	1	4	2	4
2020-12-07	1	2	1	4	2	4
2020-12-08	1	2	1	4	3	4
2020-12-09	1	2	1	2	2	2
2020-12-10	1	1	1	2	1	2
2020-12-11	1	2	1	3	2	3
2020-12-12	1	2	1	3	2	3
2020-12-13	2	2	1	2	1	2
2020-12-14	2	1	1	1	2	2
2020-12-15	2	2	1	1	1	2
2020-12-16	2	2	1	1	1	2
2020-12-17	2	1	1	1	1	2
2020-12-18	2	1	1	1	1	2
2020-12-19	2	1	1	1	1	2
2020-12-20	2	1	1	1	1	2
2020-12-21	2	1	1	1	1	2
2020-12-22	2	2	1	2	2	2
2020-12-23	2	2	1	2	1	2
2020-12-24	2	1	1	1	1	2
2020-12-25	2	2	1	2	2	2
2020-12-26	2	1	1	2	1	2
2020-12-27	2	2	1	2	2	2
2020-12-28	1	2	1	4	2	4
2020-12-29	1	2	1	4	3	4
2020-12-30	1	2	1	3	2	3
2020-12-31	1	2	1	3	2	3

2.5 Visualize daily European Air Quality Index for London in December 2020 as heatmap#

The last step is to visualize the daily European Air Quality Index for London in December 2020 as a heatmap. We can use the function heatmap() from the seaborn library to create a heatmap with days on the horizontal axis and the pollutants as well as the overall index level on the vertical axis. This representation allows for a quick interpretation of the overall index level for every day. It further allows you to identify the determining pollutant for every day.

For the heatmap() function, we need to transpose the dataframe, so that the key pollutants and the overall index level are given as row entries and every day in December 2020 as a column entry. Additionally, with iloc[::-1], we reverse the order of the rows, as we want to visualize the overall index level on top.

df = (df.T).iloc[::-1]
df

time	2020-12-01	2020-12-02	2020-12-03	2020-12-04	2020-12-05	2020-12-06	2020-12-07	2020-12-08	2020-12-09	2020-12-10	...	2020-12-22	2020-12-23	2020-12-24	2020-12-25	2020-12-26	2020-12-27	2020-12-28	2020-12-29	2020-12-30	2020-12-31
level	2	4	2	2	2	4	4	4	2	2	...	2	2	2	2	2	2	4	4	3	3
pm10	2	3	2	1	1	2	2	3	2	1	...	2	1	1	2	1	2	2	3	2	2
pm25	2	4	2	2	2	4	4	4	2	2	...	2	2	1	2	2	2	4	4	3	3
so2	1	1	1	1	1	1	1	1	1	1	...	1	1	1	1	1	1	1	1	1	1
no2	2	2	2	2	2	2	2	2	2	1	...	2	2	1	2	1	2	2	2	2	2
o3	2	1	1	1	1	1	1	1	1	1	...	2	2	2	2	2	2	1	1	1	1

6 rows × 31 columns

The transpose function converts NaN values to a large negative number. For this reason, we have to set negative entries back to NaN (not a number).

df[df<0] = np.nan
df

time	2020-12-01	2020-12-02	2020-12-03	2020-12-04	2020-12-05	2020-12-06	2020-12-07	2020-12-08	2020-12-09	2020-12-10	...	2020-12-22	2020-12-23	2020-12-24	2020-12-25	2020-12-26	2020-12-27	2020-12-28	2020-12-29	2020-12-30	2020-12-31
level	2	4	2	2	2	4	4	4	2	2	...	2	2	2	2	2	2	4	4	3	3
pm10	2	3	2	1	1	2	2	3	2	1	...	2	1	1	2	1	2	2	3	2	2
pm25	2	4	2	2	2	4	4	4	2	2	...	2	2	1	2	2	2	4	4	3	3
so2	1	1	1	1	1	1	1	1	1	1	...	1	1	1	1	1	1	1	1	1	1
no2	2	2	2	2	2	2	2	2	2	1	...	2	2	1	2	1	2	2	2	2	2
o3	2	1	1	1	1	1	1	1	1	1	...	2	2	2	2	2	2	1	1	1	1

6 rows × 31 columns
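NaN entries of the kind handled above can originate in the classification step: pd.cut() uses right-closed intervals by default, so a concentration above the last threshold, or exactly equal to the lowest edge, is left unclassified. The following is a minimal sketch with hypothetical concentration values, reusing the ozone thresholds from section 2.4.

```python
import pandas as pd

ozone_limits = [0, 50, 100, 130, 240, 380, 800]
index_levels = [1, 2, 3, 4, 5, 6]

# Hypothetical concentrations, chosen to hit the bin edges.
values = pd.Series([0.0, 50.0, 50.1, 900.0])

# Default behaviour: intervals are (lower, upper], so 0.0 and 900.0 fall outside.
levels = pd.cut(values, ozone_limits, labels=index_levels)
print(levels.tolist())

# include_lowest=True additionally assigns the lowest edge (0.0) to level 1.
levels_incl = pd.cut(values, ozone_limits, labels=index_levels, include_lowest=True)
print(levels_incl.tolist())
```

Whether include_lowest=True is appropriate depends on how a concentration of exactly zero should be treated. The notebook's own cells use the default.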

And now, we can finally visualize the heatmap. The code below has five main parts:

1. Initiate the matplotlib figure: Initiate a figure and axes objects and add a colorbar
2. Plotting function: Plot the pandas dataframe with the heatmap() function from the seaborn library
3. Set title of the heatmap: Set a title to the resulting heatmap
4. Customize x- and y-axis ticks and labels: Customize the x- and y-labels for correct representation of the data
5. Customize colorbar entries: Customize the labels of the colorbar and indicate the index levels from ‘very good’ to ‘extremely poor’

Note: before plotting, we additionally define a customized colormap based on the official EAQI colors. We also define the labels for the six EAQI index levels, which range from ‘Very good’ to ‘Extremely poor’.

cmap = ListedColormap(['#5AAA5F', '#A7D25C', '#ECD347', '#F5BE41', '#F09235', '#D93322'])
cmap

labels = ['Very good', 'Good', 'Medium', 'Poor', 'Very Poor', 'Extremely Poor']

# Initiate the matplotlib figure
fig, ax = plt.subplots(1,1,figsize=(35,5))
cbar_ax = fig.add_axes([.82, .13, .01, .75])

# Plotting function
g = sns.heatmap(df, cmap=cmap, linewidth=1, linecolor='w', square=True, cbar_ax = cbar_ax,vmin=1, vmax=6, ax=ax)

# Set title of the heatmap
g.set_title("\nDaily European Air Quality Index levels for London - December 2020\n", fontsize=20)
g.set(xlabel=None)

# Customize x- and y-axis ticks and labels
ylabels=['Index level', 'PM 10', 'PM 2.5', 'Sulphur dioxide', 'Nitrogen dioxide', 'Ozone']
g.set_yticklabels(ylabels,fontsize=14, rotation=0)

# Format the dates in the column index as YYYY-MM-DD strings
xlabels = df.columns.strftime('%Y-%m-%d')
g.set_xticklabels(xlabels, fontsize=14)

# Customize colorbar entries
cbar = ax.collections[0].colorbar
cbar.set_ticks([1.4,2.2,3.1,3.9,4.8,5.6])
cbar.set_ticklabels(labels)
cbar.ax.tick_params(labelsize=14)

[Figure: heatmap of the daily European Air Quality Index levels for London, December 2020]
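The colorbar tick positions used above (1.4, 2.2, ..., 5.6) appear to approximate the centres of the six colour bands. This is an interpretation rather than something the notebook states. With vmin=1, vmax=6 and a six-colour ListedColormap, each colour covers an equally wide band of the value range, so the centres can be computed instead of hard-coded:

```python
import numpy as np

vmin, vmax = 1, 6   # colour range passed to sns.heatmap()
n_levels = 6        # number of colours in the ListedColormap

# Each colour covers an equally wide band of the range [vmin, vmax].
band_width = (vmax - vmin) / n_levels

# Tick positions at the centre of each band.
tick_positions = vmin + (np.arange(n_levels) + 0.5) * band_width
print(np.round(tick_positions, 2))
```

The values come out at roughly 1.42, 2.25, 3.08, 3.92, 4.75 and 5.58, which is close to the tick positions set by hand in the plotting code.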

This project is licensed under the Apache License 2.0.
