
2.1. Reference Upper-Air Network for climate monitoring#
Production date: 01-07-2023
Produced by: V. Ciardini (ENEA), P. Grigioni (ENEA), G. Pace (ENEA) and C. Scarchilli (ENEA)
🌍 Use case: monitoring the effects of climate change on the tropopause characteristics#
❓ Quality assessment question#
- How well can we detect the seasonal and latitudinal variability of the tropopause in the GRUAN measurements? 
This Notebook provides a practical introduction to the GRUAN dataset available on the Climate Data Store (CDS) of the Copernicus Climate Change Service (C3S) (https://cds.climate.copernicus.eu/datasets/insitu-observations-gruan-reference-network?tab=overview). The use case presented here is addressed to calculate the tropopause altitude at three different GCOS (Global Climate Observing System) Reference Upper Air Network (GRUAN) stations highlighting differences related to its seasonal and latitudinal characteristics.
📢 Quality assessment statement#
These are the key outcomes of this assessment
- GRUAN provides reference measurements and uncertainties and is suitable to analyse in depth tropopause characteristics; some possible limitations may arise due to the number of stations available and their spatial distribution (e.g. in the Southern hemisphere) and, in some cases (e.g. in trend analysis), to the still limited temporal coverage, depending on the entry in GRUAN by each station. However, the number of launches per station can vary from a minimum of 2-3 launches per week to 2-4 launches per day in most cases. 
- The first LRT has the typical behaviour with higher (lower) values at lower (higher) latitudes. The amplitude of the LRT is more pronounced at lower latitudes. 
- The mean annual cycle of the LRT is more pronounced in TEN with amplitude exceeding 2 km, while at the polar station, it seems to show a two-waves pattern, with maxima in summer and in winter, and minima in spring and autumn. A cycle less pronounced characterised the LIN station. 
- By removing seasonality from LRT monthly mean time series of LIN and NYA and applying a linear least-squares regression, an increasing trend is revealed. Presented results are in accordance with previous findings but differences are revealed in terms of yearly trends, both due to the higher quality of GRUAN upper-air data but also to the different length of data records. 
📋 Methodology#
Tropopause is defined as the upper limit of the troposphere, the lower region of the atmosphere where the principal weather phenomena occur, and the stratosphere, the overlying atmospheric region characterised by high dynamic stability. It is an important boundary for studies in atmospheric sciences and it is considered as an important indicator of anthropogenic climate change [1]. Different definitions of tropopause are available, based on thermal, dynamical, and chemical characteristics of the atmosphere. In the following, the definition promulgated by the World Meteorological Organization, WMO in 1957, based on temperature lapse rate (LR), is applied [2] because of its widespread use. Although some authors highlight situations where the Lapse-Rate Tropopause (LRT) definition fails to reliably identify the tropopause altitude particularly in the polar regions [3]. The WMO definition also allows for the identification of more than one tropopause above the primary layer; in the following, we referred to the lower tropopause detected by applying the method [2] to an atmospheric profile (first LRT). GRUAN provides highly accurate measurements of the atmospheric profile especially for the upper troposphere and the lower stratosphere [4]; such high accuracy is appropriate for understanding and monitoring tropopause characteristics. Three GRUAN station located at different latitudes have been selected among those with the longest data record between 2006-2020. In the following section, the calculation of the LRT and the comparison among the three time series are presented highlighting differences in the tropopause annual behavior at different latitudes. Also, we assess the agreements with previous studies of the LRT estimated using only observational measurements [e.g. [5], [6]].
The GCOS Reference Upper Air Network (GRUAN)#
Envisioned by the WMO in the 2007, GRUAN is a reference network that aims to establish 30-40 observing stations to, among other things, maintain measurements over several decades to accurately quantify trends, to fully characterize the properties of the atmospheric column, and to further our understanding of climate variability and change. Key points for reference quality are the traceability to the SI of the calibration and the analysis of measurement uncertainties. GRUAN responds to strict requirements such as measurement accuracy and long-term stability upon observing systems; all contribution of the measurement uncertainties are determined and documented, and radiosonde data are provided with vertically resolved uncertainties (random and systematic). GRUAN uncertainty estimates are 0.15 K for night-time temperature measurements and approximately 0.6 K at 25 km during daytime [7]. One of the Scientific Imperatives of GRUAN is “Understanding and monitoring tropopause characteristics” [4]. Currently, data from 18 stations are available on the CDS for the period 2006-2020 are available (depending on the station radiosounding frequency). An update is expected by the end of 2024.
The analysis and results are organised in the following steps, which are detailed in the sections below:
- To investigate tropopause characteristics, the interannual variability, the mean annual cycle and the trend estimation of the LRT were carried out. 
📈 Analysis and results#
1. Setup and retrieval#
The User will be working with data in CSV format. To handle this data effectively, libraries for working with multidimensional arrays, particularly Xarray, will be utilised. The User will also require libraries for visualising data, in this case Matplotlib, and for statistical analysis, Scipy.
Code:
- Import CDSAPI credential and packages; 
- define parameters (time period, stations, downloaded file directory); 
- define functions to cache (lapse-rate tropopause); 
- download data and compute tropopause. 
Show code cell source
import calendar
import os
import pathlib
import cdsapi
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import scipy.stats
import statsmodels.tsa.seasonal
import xarray as xr
from c3s_eqc_automatic_quality_control import download
from scipy import stats
plt.style.use("seaborn-v0_8-notebook")
Time range of data series can be set by the User; different stations can be selected by means of Station acronyms (list of GRUAN station acronyms, locations, geographical coordinates and WMO n. are reported in the Product User Guide and Specification for GRUAN Temperature, Relative Humidity and Wind profiles available at [9]).
Define Parameters#
Show code cell source
# Time period
start = "2006-05"
stop = "2020-03"
# Stations
stations = ["TEN", "LIN", "NYA"]  # Use None to analyse all stations
assert isinstance(stations, list | None)
# Directory for csv files
csv_dir = "./csv_files"
# CDS credentials
os.environ["CDSAPI_RC"] = os.path.expanduser("~/ciardini_virginia/.cdsapirc")
Define request#
Show code cell source
collection_id = "insitu-observations-gruan-reference-network"
request = {
    "variable": ["air_temperature", "air_pressure", "altitude"],
    "format": "netcdf",
}
client = cdsapi.Client()
requests = []
for date in pd.date_range(start, stop, freq="1MS"):
    time_request = {"year": date.strftime("%Y"), "month": date.strftime("%m")}
    time_request["day"] = client.client.apply_constraints(
        collection_id, request | time_request
    )["day"]
    if time_request["day"]:
        requests.append(request | time_request)
2025-03-26 08:22:09,952 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-26 08:22:09,953 WARNING [2024-06-16T00:00:00] CDS API syntax is changed and some keys or parameter names may have also changed. To avoid requests failing, please use the "Show API request code" tool on the dataset Download Form to check you are using the correct syntax for your API request.
Functions to cache#
Show code cell source
def _reorganize_dataset(ds):
    # Rename
    (varname,) = set(ds["observed_variable"].values)
    ds = ds.rename(observation_value=str(varname)).drop_vars("observed_variable")
    ds = ds.rename(
        {
            var: "_".join([varname, var.replace("_value", "")])
            for var in ds.data_vars
            if var.startswith("uncertainty")
        }
    )
    # Update attrs
    for var, da in ds.data_vars.items():
        match var:
            case "pressure":
                da.attrs["standard_name"] = "Pressure"
            case "air_temperature":
                da.attrs["standard_name"] = "Temperature"
            case "altitude":
                da.attrs["standard_name"] = "Altitude"
            case "relative_humidity":
                da.attrs["standard_name"] = "Relative"
        for string in ("units", "type"):
            if string in var:
                ds = ds.drop_vars(var)
                (value,) = set(da.values)
                attrs_var = varname if var == string else var.replace("_" + string, "")
                ds[attrs_var].attrs[string] = value
    return ds
def reorganize_dataset(ds, stations):
    for var, da in ds.data_vars.items():
        if np.issubdtype(da.dtype, np.bytes_):
            ds[var].values = np.char.decode(da.values, "utf-8")
    if stations is not None:
        ds = ds.where(ds["primary_station_id"].isin(stations), drop=True)
    if not ds.sizes["index"]:
        return ds
    datasets = []
    for var, ds in ds.groupby("observed_variable"):
        datasets.append(_reorganize_dataset(ds))
    ds = xr.merge(datasets)
    return ds
def calculate_tropopause(ds):
    attrs = {"long_name": "WMO Lapse-Rate Tropopause", "units": "km"}
    # convert, drop, and sort
    ds = ds.swap_dims(index="altitude")[["air_temperature", "altitude", "pressure"]]
    ds = ds.where(
        (ds["altitude"].notnull() & ds["pressure"].notnull()).compute(), drop=True
    )
    ds["altitude"] = ds["altitude"] * 1.0e-3
    ds["pressure"] = np.log10(ds["pressure"] * 1.0e-2)
    ds = (
        ds.drop_duplicates("altitude")
        .swap_dims(altitude="pressure")
        .drop_duplicates("pressure")
        .swap_dims(pressure="altitude")
        .sortby("altitude")
    )
    if not ds.sizes["altitude"] or (ds["altitude"][-1] - ds["altitude"][0]) < 2:
        # Column must be at least 2km
        return xr.DataArray(None, attrs=attrs)
    # interpolate
    interp_altitude = np.arange(0.1, 40.1, 0.1)
    temp = ds["air_temperature"].interp(altitude=interp_altitude, method="cubic")
    temp = temp.assign_coords(pressure=10 ** ds["pressure"].interp_like(temp))
    temp = temp.dropna("altitude")
    if not temp.sizes["altitude"]:
        return xr.DataArray(None, attrs=attrs)
    # compute lapse rate
    diff_kwargs = {"dim": "altitude", "label": "lower"}
    lapse_rate = -temp.diff(**diff_kwargs) / temp["altitude"].diff(**diff_kwargs)
    lapse_rate = lapse_rate.sel(altitude=slice(None, lapse_rate["altitude"].max() - 2))
    # mask and loop over valid lapse rates
    mask = (lapse_rate <= 2) & (lapse_rate["pressure"] <= 500)
    valid_altitude = lapse_rate["altitude"].where(mask.compute(), drop=True)
    for bottom in valid_altitude:
        temp_bottom = temp.sel(altitude=bottom)
        temp_above = temp.sel(altitude=slice(bottom, bottom + 2)).drop_sel(
            altitude=bottom
        )
        lapse_rate = (temp_bottom - temp_above) / (
            temp_above["altitude"] - temp_bottom["altitude"]
        )
        if (lapse_rate <= 2).all():
            return xr.DataArray(float(bottom.values), attrs=attrs)
    return xr.DataArray(None, attrs=attrs)
def compute_tropopause_altitude(ds, stations):
    ds = reorganize_dataset(ds, stations).compute()
    if not ds.sizes["index"]:
        ds = ds.swap_dims(index="time")
        da = xr.DataArray(
            [],
            dims=("time"),
            coords={
                "primary_station_id": ds["primary_station_id"],
                "time": pd.to_datetime(ds["report_timestamp"]).tz_localize(None),
            },
        )
        return da.to_dataset(name="tropopause")
    dataarrays = []
    for report_id, ds_id in ds.groupby(ds["report_id"]):
        da = calculate_tropopause(ds_id)
        time = pd.to_datetime(list(set(ds_id["report_timestamp"].values))).tz_localize(
            None
        )
        station = list(set(ds_id["primary_station_id"].values))
        da = da.expand_dims(time=time)
        da = da.assign_coords(primary_station_id=("time", station))
        dataarrays.append(da)
    da = xr.concat(dataarrays, "time")
    return da.sortby("time").to_dataset(name="tropopause")
Download and transform#
Show code cell source
ds = download.download_and_transform(
    collection_id,
    requests,
    chunks={"year": 1, "month": 1},
    transform_func=compute_tropopause_altitude,
    transform_func_kwargs={"stations": sorted(stations) if stations else stations},
    cached_open_mfdataset_kwargs={"concat_dim": "time", "combine": "nested"},
).compute()
csv_dir_path = pathlib.Path(csv_dir)
csv_dir_path.mkdir(exist_ok=True)
ds.to_pandas().to_csv(csv_dir_path / "ds_data.csv")
100%|██████████| 167/167 [00:10<00:00, 16.47it/s]
2. Results and discussion#
To investigate tropopause characteristics at three different latitudes, we considered and discussed as follow the interannual variability and the mean annual cycle. Furthermore by applying a classical decomposition [8] and assuming (reasonably) that the seasonal component repeats from year to year, the trend estimation of the LRT was carried out. The stations of Ny-Ålesund (NYA, 78.92°N,11.93°E), Lindemberg (LIN, 52.21°N, 14.12°E) and Tenerife (TEN, 28.32°N, 16.38°W) have been selected, taking into account station data availability [9], as representatives of the Northern Hemisphere polar, mid-latitude and sub-tropical stations. However, for the trend analysis, only LIN and NYA have been considered, with TEN excluded due to its gaps in data between mid-2010 and 2016.
Lapse Rate Tropopause time series (monthly values) and its annual cycle#
The inter-annual variability of the first LRT shows the typical behaviour with higher (lower) values at lower (higher) latitude. The amplitude of the LRT is more pronounced at lower latitude (TEN) where monthly mean values range between 16.7 km and 9.9 km, while for LIN and NYA, located at mid-latitude and in the Arctic region respectively, LRT varies from 12.4 km and 9.0 km for the former and from 11.0 km and 7.6 km for the latter (Figure 1). The annual cycle of mean LRT is calculated from data available on CDS between 2006 and 2020 for the three stations (Figure 2). Climatological statistics of the first LRT based on a long-term series of radiosoundings from the IGRA data archive, show that the height of the first tropopause decreases from 16.2 km near the equator to 8.5 km over the polar regions [10].The mean annual cycle is more pronounced in the subtropics (TEN, in green), following the typical cycle of the incoming radiation (maximum in summer and minimum in winter), with an amplitude that exceeds 2 km while in the Artic region (NYA, in orange) the mean annual cycle seems to show a two-waves pattern with maxima in summer and in winter and minima in spring and autumn. At mid-latitudes (LIN, in blue), the cycle is less pronounced probably due to the shift in phase from mid to high latitude [5]. Overall, the results for both the mid-latitude and polar stations are in agreement with those of [5], [11], where the latitudinal and longitudinal characteristics of the tropopause are discussed in detail. Nevertheless, as mentioned above, some authors point out the possible shortcomings of the LRT method in identifying the tropopause during winter and spring in the polar region, due to a stable, near-isothermal middle troposphere and UTLS (i.e. Upper Troposphere and Lower Stratosphere), which sometimes can prevent the LRT criteria from being met [3], [11]. On the other hand, [11] compares the thermal tropopause and the dynamical tropopause in the polar region finding that the differences, in arctic winter, are weak in the Aleutian high region, moderate in the polar vortex region (Europe, western Siberia) while, the by far largest differences occur in Antarctic winter.
Show code cell source
fig, ax = plt.subplots(figsize=[9, 4])
for station, da in ds["tropopause"].groupby("primary_station_id"):
    da_resampled = da.resample(time="MS")
    df_mean = da_resampled.mean().to_pandas()
    df_std = da_resampled.std().to_pandas()
    df_mean.plot(yerr=df_std, marker=".", label=station)
    df_mean.to_csv(csv_dir_path / f"monthly_mean_lrt_{station.lower()}.csv")
    df_std.to_csv(csv_dir_path / f"monthly_lrt_std_{station.lower()}.csv")
ax.set_title(f"{da.attrs['long_name']} (LRT)")
ax.set_xlabel("")
ax.set_ylabel(f"Altitude [{da.attrs['units']}]")
ax.grid()
_ = ax.legend()
/data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/matplotlib/transforms.py:2650: RuntimeWarning: divide by zero encountered in scalar divide
  x_scale = 1.0 / inw
/data/common/miniforge3/envs/wp5/lib/python3.11/site-packages/matplotlib/transforms.py:2652: RuntimeWarning: invalid value encountered in scalar multiply
  self._mtx = np.array([[x_scale,     0.0, -inl*x_scale],
 
Figure 1. Monthly mean and standard deviation of the LRT for the Arctic (NYA, in orange), mid-latitude (LIN, in blue) and subtropical (TEN, in green) stations.
Show code cell source
fig, ax = plt.subplots(figsize=[9, 4])
for station, da in ds["tropopause"].groupby("primary_station_id"):
    grouped = da.groupby("time.month")
    df_mean = grouped.mean().to_pandas()
    df_std = grouped.std().to_pandas()
    df_mean.plot(yerr=df_std, marker=".", label=station)
    ax.set_xticks(range(1, 13), calendar.month_abbr[1:13])
    ax.set_xlabel("")
    ax.set_ylabel(f"Altitude [{da.attrs['units']}]")
    ax.set_title("Annual cycle of mean LRT")
    ax.grid()
    ax.legend()
 
Figure 2. Annual cycle of mean LRT and standard deviation for the Arctic (NYA, in orange), mid-latitudes (LIN, in blue) and subtropical (TEN, in green) stations.
Trend and Seasonality of LRT#
In the following, a time series decomposition of the available monthly mean LRT data is performed for trend estimation purposes. Time series of the deseasonalised monthly means of the LRT for LIN and NYA are shown in Figure 3 highlighting its inter-annual variability. The trend analysis (Figure 3) indicates that the LRT rises in both LIN and NYA during the study period. Other authors [6] report that, in the NH, since 2000 tropopause altitude has mainly risen due to the warming of the troposphere. A fine-scale analysis computed at two locations with distinct climate conditions (mid-latitude and Arctic) [12] showed significant positive trends in the tropopause heights with rates of increase of 23.7 ± 6.5 m yr-1 at the mid-latitude site and 28.0 ± 4.0 m yr-1 at the Arctic site during an 18-year study period. Our results confirm the LRT increase in the decade at the two stations considered, even though the increase is at a higher rate (54 ± 7.2 m yr-1 at LIN; 30 ± 7.2 m yr-1 at NYA). Factors, such as higher quality of GRUAN upper-air data, the larger (in time and space) dataset used in [6] and the different time periods that were considered [12], most probably may explain these differences. In addition, due to the length of the time series, uncertainties remain in the estimates.
Show code cell source
fig, ax = plt.subplots(figsize=[9, 5])
for i, (station, da) in enumerate(ds["tropopause"].groupby("primary_station_id")):
    if station not in ["LIN", "NYA"]:
        continue
    # Seasonal decomposition
    df_mean = da.resample(time="MS").mean().interpolate_na("time").to_pandas()
    df_anom = df_mean - df_mean.mean()
    decomposition = statsmodels.tsa.seasonal.seasonal_decompose(
        df_anom, model="additive"
    )
    decomposition.trend.to_csv(csv_dir_path / f"intra-annual_{station}.csv")
    decomposition.seasonal.to_csv(csv_dir_path / f"seasonal_{station}.csv")
    # Linear fit
    x_trend = np.arange(decomposition.trend.size)[~decomposition.trend.isnull()]
    y_trend = decomposition.trend.dropna()
    res = stats.linregress(x_trend, y_trend)
    fit = pd.Series(res.intercept + res.slope * x_trend, index=y_trend.index)
    # Two-sided inverse Students t-distribution
    q = 0.05 / 2
    deg_freedom = len(x_trend) - 2
    ts = scipy.stats.t.ppf(q, deg_freedom)
    # Plot
    decomposition.trend.plot(color=f"C{i}", ls="-", label=station)
    fit.plot(color=f"C{i}", ls="--", label=f"linear fit for {station}")
    # Text
    print(f"{station}:")
    pad = 17
    print(f"{'Equation':>{pad}}: {res.slope:+.4f}x {res.intercept:+.4f}")
    print(f"{'R^2':>{pad}}: {res.rvalue**2:.2f}")
    print(
        f"{'p-value':>{pad}}: " + "< 0.001"
        if res.pvalue < 0.001
        else f"{res.pvalue:.4f}"
    )
    print(f"{'slope (95%)':>{pad}}: {res.slope:+.4f} ± {abs(ts * res.stderr):.4f}")
    print(
        f"{'intercept (95%)':>{pad}}: {res.intercept:+.4f} ± {abs(ts * res.intercept_stderr):.4f}"
    )
# Plot settings
ax.set_title("Linear trends")
ax.set_ylabel(f"Altitude anomaly [{da.attrs['units']}]")
ax.set_xlabel("")
ax.grid()
_ = ax.legend()
LIN:
         Equation: +0.0046x -0.3892
              R^2: 0.60
          p-value: < 0.001
      slope (95%): +0.0046 ± 0.0006
  intercept (95%): -0.3892 ± 0.0551
NYA:
         Equation: +0.0024x -0.1876
              R^2: 0.35
          p-value: < 0.001
      slope (95%): +0.0024 ± 0.0006
  intercept (95%): -0.1876 ± 0.0465
 
Figure 3. Time series of the deseasonalised monthly means of the LRT for LIN (light blue) and NYA (orange); the linear fits (statistically significant at the 95%) are also reported for both stations (in blue, for LIN and in orange for NYA).
ℹ️ If you want to know more#
Key resources#
CDS entries, In situ temperature, relative humidity and wind profiles from 2006 to March 2020 from the GRUAN reference network
external pages:
GRUAN website, GCOS Reference Upper-Air Network
Code libraries used:
- C3S EQC custom functions, - c3s_eqc_automatic_quality_control, prepared by B-Open
- Xarray for working with multidimensional arrays in Python 
- Matplotlib for visualization in Python 
- Scipy for statistics in Python 
References#
[1] Santer, B. D., et al. (2003b), Contributions of anthropogenic and natural forcing to recent tropopause height changes, Science, 301, 479– 483, doi:10.1126/science.1084123
[2] WMO: Meteorology-A three-dimensional science: Second session of the commission for aerology, WMO Bull., 4, 134–138, 1957.
[3] Tinney, E. N., C. R. Homeyer, L. Elizalde, D. F. Hurst, A. M. Thompson, R. M. Stauffer, H. Vömel, and H. B. Selkirk, 2022: A Modern Approach to a Stability-Based Definition of the Tropopause. Mon. Wea. Rev., 150, 3151–3174, https://doi.org/10.1175/MWR-D-22-0174.1
[4] GCOS Reference Upper-Air Network website.
[5] Rieckh, T., Scherllin-Pirscher, B., Ladstädter, F., and Foelsche, U.: Characteristics of tropopause parameters as observed with GPS radio occultation, Atmos. Meas. Tech., 7, 3947–3958, https://doi.org/10.5194/amt-7-3947-2014, 2014.
[6] Meng, L., Liu, J., Tarasick, D. W., Randel, W. J., Steiner, A. K., Wilhelmsen, H., Wang, L., & Haimberger, L. (2021). Continuous rise of the tropopause in the Northern Hemisphere over 1980-2020. Science advances, 7(45), eabi8065. https://doi.org/10.1126/sciadv.abi8065
[7] Dirksen, R. J., Sommer, M., Immler, F. J., Hurst, D. F., Kivi, R., and Vömel, H.: Reference quality upper-air measurements: GRUAN data processing for the Vaisala RS92 radiosonde, Atmos. Meas. Tech., 7, 4463–4490, https://doi.org/10.5194/amt-7-4463-2014, 2014.
[8] Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. OTexts.com/fpp2. Accessed on <19/04/2024>, Chapter 6 Time series decomposition | Forecasting: Principles and Practice (2nd ed) (otexts.com)
[9] Product User Guide and Specification for GRUAN Temperature, Relative Humidity and Wind profiles.
[10] Feng, S., Y. Fu, andQ. Xiao (2012), Trends in the global tropopause thickness revealed by radiosondes, Geophys. Res. Lett., 39, L20706, doi:10.1029/2012GL053460
[11] Zängl, G., and K. P. Hoinka, 2001: The Tropopause in the Polar Regions. J. Climate, 14, 3117–3139, https://doi.org/10.1175/1520-0442(2001)014<3117:TTITPR>2.0.CO;2
[12] Zhang, J. Tropopause Characteristics Based on Long-Term ARM Radiosonde Data: A Fine-Scale Comparison at the Extratropical SGP Site and Arctic NSA Site. Atmosphere 2022, 13, 965. https://doi.org/10.3390/atmos13060965
