logo

Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch

1.3.1. Utility of the Randolph Glacier Inventory (RGI) for regional and global glacier volume estimates#

Production date: 31-05-2025

Dataset version: 7.0

Produced by: Yoni Verhaegen and Philippe Huybrechts (Vrije Universiteit Brussel)

🌍 Use case: Using delineated glacier outlines and glacier extent data around the year 2000 for the estimation of the corresponding regional and global glacier ice volumes#

❓ Quality assessment question#

“What is the temporal distribution of the delineated glacier outlines, nominally provided for the year 2000, and how well can they be used to estimate regional and global glacier ice volumes?”

Glaciers are a major contributor to current global sea-level rise, a resource of fresh water, a potential threat of natural hazards, and an important factor for hydro-power production and runoff, as well as for recreation and tourism. A proper assessment of glacier areas, glacier characteristics, as well as their changes due to warming climatic conditions therefore plays a crucial role in dealing with these issues. In that regard, the “Glaciers distribution data from the Randolph Glacier Inventory (RGI) for year 2000” (here we use version 7.0) dataset on the Climate Data Store (CDS) provides key information with respect to glacier extent and their characteristics. The RGI dataset is a collection of digital glacier and ice cap outlines at the global scale, nominally provided for the year 2000 [1, 2]. The data are available in both vector (a shapefile with polygons of individual glacier outlines) and raster (as gridded data with the aggregated fractional glacier areas per pixel) format. Although the goal of the dataset is to provide glacier outlines of all glaciers on Earth as close as possible to the year 2000, one of the main known issues of the dataset is the fact that the acquisition date of the delineated glacier data varies substantially. This notebook investigates the corresponding temporal distribution of delineated glacier data in the vector version of the dataset and evaluates its implications for the estimation of current global glacier ice volumes using the example of Farinotti et al. (2019) [3].

📢 Quality assessment statements#

These are the key outcomes of this assessment

  • The RGIv7.0 dataset, available on the Climate Data Store, is the most comprehensive source of glacier outlines globally for around the year 2000. However, it has certain limitations, including variations in delineation dates (some glaciers were delineated before or after 2000) and the fact that the data represent a single snapshot in time, making it unsuitable for assessing temporal changes or climate impacts. These acquired glacier outlines that deviate from 2000 can lead to systematic errors in glacier volume estimates, with likely glacier volume and area underestimations for glaciers delineated after 2000 and overestimations for those delineated earlier.

  • Despite these limitations, the (area-weighted) mean acquisition date of the glacier outlines is relatively close to 2000 and the systematic error at the global scale is small (<1% of the total global ice volume estimate by Farinotti et al., 2019 [3]), making the dataset reliable to be used as input data for current global glacier volume estimates. However, regional and glacier-specific errors can vary significantly due to factors like strongly deviating delineation dates, varying glacier volume response times, and the presence of a supraglacial debris cover (which can further lead to errors). Users should carefully evaluate these factors for specific applications, including regional or glacier-specific volume estimates and other glaciological or hydrological analyses.

📋 Methodology#

Dataset description#

The dataset of the glaciers distribution on the Climate Data Store (CDS) is an almost complete collection of digital glacier and ice cap outlines and their geometrical/hypsometrical characteristics from various data sources at the global scale [1, 2]. The RGI dataset on the CDS is considered a snapshot of glacier outlines around the year 2000, assembled mainly from satellite images, and is currently the most complete dataset of glacier outlines. Simply stated, the glaciers in the dataset were automatically classified using the distinctive spectral reflectance signatures of bedrock and ice. During post-processing, raw glacier outlines are quality checked and manually corrected if required (e.g. in the case of a supraglacial debris cover). The vector part of the dataset divides the glaciers into separate RGI regions, i.e. there are 19 of those regions (“clusters” of glaciers) in RGIv7.0. The raster version of the data contains aggregated fractional glacier areas for each pixel of 1 by 1 degree but does not provide sufficient information to classify glaciers into distinct RGI regions. For a more detailed description of the data acquisition and processing methods, we refer to the documentation on the CDS and the ECMWF Confluence Wiki (Copernicus Knowledge Base).

Structure and (sub)sections#

1. Data preparation and processing

2. The (area-weighted) mean year of delineation of glacier data

3. Estimation of current global glacier ice volumes with RGI data

4. Short summary and take-home messages

📈 Analysis and results#

1. Data preparation and processing#

1.1 Import packages#

First we load the packages:

Hide code cell source

import fsspec
import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import regionmask
import cartopy.crs as ccrs
import math
import cartopy.feature as cfeature
import matplotlib.ticker as mticker
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import numpy as np
import xesmf as xe
import xarray as xr
from scipy.optimize import curve_fit
import os
from c3s_eqc_automatic_quality_control import download
plt.style.use("seaborn-v0_8-notebook")

1.2 Define request and download#

Then we define requests for download from the CDS:

🚨 The files can be large! Since the data files to be downloaded and manipulated have a considerable size, this may take a couple of minutes.
🚨 Insert the correct RGI version number to be downloaded below:

Hide code cell source

########## SELECT THE RGI VERSION TO DOWNLOAD #############
rgi_version = 7.0  # Change this as needed
###########################################################

# Select correct column names based on RGI version
if rgi_version == 6.0:
    lon_col, lat_col = "CENLON", "CENLAT"
elif rgi_version == 7.0:
    lon_col, lat_col = "cenlon", "cenlat"
else:
    raise ValueError("Unsupported RGI version.")

# Glacier extent data

request_extent = (
    "insitu-glaciers-extent",
    {
        "variable": ["glacier_area"],
        "product_type": ["vector"],
        "version": f"rgi_{int(rgi_version)}_0"
    },
)

# Glacier mass change data

period_start = "1975_1976"
period_stop = "2021_2022"
assert all("_" in period and len(period) == 9 for period in (period_start, period_stop))
# Set request
y0_start, y1_start = map(int, period_start.split("_"))
y0_stop, y1_stop = map(int, period_stop.split("_"))
collection_id = "derived-gridded-glacier-mass-change"
request = {
    "variable": "glacier_mass_change",
    "product_version": "wgms_fog_2023_09",
    "format": "zip",
    "hydrological_year": [
        f"{y0}_{str(y1)[-2:]}"
        for y0, y1 in zip(range(y0_start, y0_stop + 1), range(y1_start, y1_stop + 1))
    ],
}

Next, we download the data:

Hide code cell source

# Get glacier extent data: takes a couple of minutes

print("Downloading and handling glacier extent data from the CDS, this may take a couple of minutes...")
df = download.download_and_transform(*request_extent).to_pandas()
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
    crs="EPSG:4326",
)

# Download glacier mass change data
print("Downloading glacier mass change data...")

ds = download.download_and_transform(
    collection_id,
    request,
)

print("Downloading done.")
Downloading and handling glacier extent data from the CDS, this may take a couple of minutes...
100%|██████████| 1/1 [00:00<00:00, 32.78it/s]
                                             
Downloading glacier mass change data...
100%|██████████| 1/1 [00:00<00:00, 65.71it/s]
Downloading done.

We define the functions to be used:

Hide code cell source

def plot_map(gdf, var_name=None, label=None, title=None, **kwargs):
    fig = plt.figure(figsize=(20,10))
    kwargs = {"markersize": 1, "legend": var_name is not None} | kwargs
    if var_name:
        kwargs = {"c": var_name, "column": var_name} | kwargs
        if var_name == "year":
            kwargs.setdefault("legend_kwds", {"shrink": 0.49, "extend": "both"})
        if label is not None and var_name == "year":
            kwargs["legend_kwds"].setdefault("label", label)
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    ax.add_feature(cfeature.LAND,color='w')
    ax.add_feature(cfeature.OCEAN)
    ax.add_feature(cfeature.BORDERS,linewidth=0.25,alpha=0.5)
    gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                  linewidth=0.5, color='gray', alpha=0.5, linestyle='--')
    gl.xformatter = LONGITUDE_FORMATTER
    gl.yformatter = LATITUDE_FORMATTER
    gdf.plot(ax=ax, **kwargs)
    ax.axis("off")
    if title:
        ax.set_title(title, fontsize=15)
    return ax

def weighted_average(df, field_name, weights_name):
    df = df[df[field_name].notnull() & df[weights_name].notnull()]
    weights = df[weights_name]
    return (df[field_name] * weights).sum() / weights.sum()

def myExpFunc(x, a, b):
    return a * np.power(x, b)

1.3 Display and inspect data#

Lastly, we can read and inspect the data. Let us print out the data to inspect its structure:

Hide code cell source

df
rgi_id o1region o2region glims_id anlys_id subm_id src_date cenlon cenlat utm_zone ... zmin_m zmax_m zmed_m zmean_m slope_deg aspect_deg aspect_sec dem_source lmax_m geometry
index
0 RGI2000-v7.0-G-01-00001 01 01-01 G204091E67414N 392889 624 2008-09-02T00:00:00 -155.909404 67.413726 5 ... 1485.317600 1693.623500 1544.66940 1553.94030 17.606613 342.330469 1 COPDEM30 725 POLYGON Z ((-155.90493 67.4163 0, -155.90416 6...
1 RGI2000-v7.0-G-01-00002 01 01-01 G204121E67419N 392890 624 2008-09-02T00:00:00 -155.879114 67.419232 5 ... 1278.400600 1394.765700 1328.55660 1330.91500 19.846369 345.367012 1 COPDEM30 484 POLYGON Z ((-155.87867 67.41748 0, -155.87872 ...
2 RGI2000-v7.0-G-01-00003 01 01-01 G204471E67431N 392897 624 2008-09-02T00:00:00 -155.530786 67.431484 5 ... 1294.703400 1704.742000 1423.20680 1437.23520 23.690063 13.467490 1 COPDEM30 1099 POLYGON Z ((-155.53113 67.43564 0, -155.53079 ...
3 RGI2000-v7.0-G-01-00004 01 01-01 G204497E67431N 392899 624 2008-09-02T00:00:00 -155.501948 67.430514 5 ... 1224.178800 1286.897700 1249.63060 1249.06620 12.782562 42.854332 2 COPDEM30 652 POLYGON Z ((-155.49973 67.43212 0, -155.49931 ...
4 RGI2000-v7.0-G-01-00005 01 01-01 G204521E67429N 392901 624 2008-09-02T00:00:00 -155.478173 67.432873 5 ... 1052.440100 1488.129900 1290.89230 1273.78340 17.148510 299.058193 8 COPDEM30 1903 POLYGON Z ((-155.49636 67.43892 0, -155.49583 ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
274526 RGI2000-v7.0-G-19-02738 19 19-24 G356783E71154S 1009836 805 2000-02-06T00:00:00 -3.254147 -71.142262 30 ... 51.334140 190.581200 119.21578 120.04070 1.384531 325.807697 8 COPDEM90 10105 POLYGON Z ((-2.9804 -71.23218 0, -2.99065 -71....
274527 RGI2000-v7.0-G-19-02739 19 19-24 G001187E70166S 1009760 805 2000-02-15T00:00:00 1.161199 -70.234860 31 ... -2.147076 266.489380 131.61113 135.94368 1.436012 342.096767 1 COPDEM90 24839 POLYGON Z ((1.0094 -70.05421 0, 1.01073 -70.05...
274528 RGI2000-v7.0-G-19-02740 19 19-24 G002050E70629S 1009761 805 2000-03-04T00:00:00 2.039158 -70.630907 31 ... 4.757297 107.698586 65.82977 65.03951 1.445538 355.737166 1 COPDEM90 10029 POLYGON Z ((2.09959 -70.67463 0, 2.0919 -70.67...
274529 RGI2000-v7.0-G-19-02741 19 19-24 G002939E70509S 1009762 805 2000-03-04T00:00:00 2.929238 -70.505540 31 ... 6.294697 261.422030 156.44052 153.55705 1.594081 310.365845 8 COPDEM90 19935 POLYGON Z ((3.09012 -70.38237 0, 3.09807 -70.3...
274530 RGI2000-v7.0-G-19-02742 19 19-24 G004329E70346S 1009763 805 2000-03-04T00:00:00 4.329046 -70.372802 31 ... -2.018840 352.334080 182.37610 185.75215 1.858386 302.110531 8 COPDEM90 15185 POLYGON Z ((4.34737 -70.5039 0, 4.33046 -70.50...

274531 rows × 29 columns

As can be seen above, the data includes attribute information for each individual glacier (i.e. delineated polygon) in the vector-type dataset. Important for this notebook are the BGNDATE and ENDDATE columns (for version 6.0) or src_date (for version 7.0), which contain information about the time of delineation of the specific glacier. If for some glaciers (part of) the attributes are missing, the data are filled by -9999999. We will use this information below.

2. The (area-weighted) mean year of delineation of glacier data#

2.1 Determination of date of delineation for every glacier#

We can begin to answer the user question by extracting information from the attribute table of the downloaded shapefile. For some glaciers the delineation of the outline is composed from several scenes over multiple years in v6.0, implying that a begin date (attribute “BGNDATE”) and an end date (attribute “ENDDATE”) of the outline delineation is given in the attribute table in YYYYMMDD format (if both are available). For v7.0, the acquisition or delineation date is simply given by “src_date” in YYYY-MM-DD format:

\( year_t = \textstyle\dfrac{BGNDATE_t + ENDDATE_t}{2}\ \) (version 6.0), or

\( year_t = srcdate_t \) (version 7.0)

This results in the following plot:

Hide code cell source

# Convert dates from string to datetime, and add digitalization year
columns = []
if rgi_version == 6.0:
    for column in ("BGNDATE", "ENDDATE"):
        years = gdf[column].str[:4]
        months = gdf[column].str[4:6].replace("99", "01")
        days = gdf[column].str[6:8].replace("99", "01")
        date = years + months + days
        date = date.where(~date.str.startswith("-"))
        columns.append(pd.to_datetime(date))
elif rgi_version == 7.0:
    for column in ["src_date"]:
        years = gdf[column].str[:4]
        months = gdf[column].str[5:7].replace("99", "01")
        days = gdf[column].str[8:10].replace("99", "01")
        date = years + months + days
        date = date.where(~date.str.startswith("-"))
        columns.append(pd.to_datetime(date))
dates = pd.DataFrame(columns).mean()
year = dates.dt.year.astype("Int64")
gdf["decimal_year"] = year + (dates.dt.dayofyear - 1) / (365 + dates.dt.is_leap_year)
gdf["year"] = year

# Get data from years
size = gdf.set_index("year").groupby("year").size()
size = size.reindex(range(gdf["year"].min(), gdf["year"].max() + 1), fill_value=0)
missing = gdf["year"].isnull().sum()
missing_perc = 100 * (missing / len(gdf["year"]))
# Get some statistics
n_glaciers = len(gdf)
if rgi_version == 6.0:
    total_area = gdf["AREA"].sum()
elif rgi_version == 7.0:
    total_area = gdf["area_km2"].sum()
# Plot data
ax = size.plot.bar(
    figsize=(15, 5),
    grid=True,
    width=0.9,
    ylabel="Number of glaciers sampled",
    xlabel=r"Time of delineation $year_t$",
)
_ = ax.text(
    0.015,
    0.87,
    f"Date of delineation data (origin dates) are completely missing for {missing} glaciers or {missing_perc:.2f}% of the dataset.",
    transform=ax.transAxes,
    bbox={"facecolor": "white", "edgecolor": "black"},
);
_ = ax.text(
    0.015,
    0.935,
    f"A total number of {n_glaciers} glaciers is present in the dataset, covering a total area of {total_area:.2f} km².",
    transform=ax.transAxes,
    bbox={"facecolor": "white", "edgecolor": "black"},
);
ax.grid(color='#95a5a6',linestyle='-',alpha=0.25)
ax.set_title("Frequency of the year of delineation in the RGIv{} dataset".format(rgi_version), fontsize=18);
../../_images/b268b3da643a0e155d0a0de2f3cb4c0da879f608c8956dec385c26742dc3d1ec.png

Figure 1. Time of delineation (date of acquisition) for the glaciers in the glacier extent dataset.

2.2 Spatial distribution of date of delineation#

Now that we calculated the acquisition dates, let us check the spatial distribution of the year of delineation for each glacier in the dataset. We therefore produce a world map where every dot represents a glacier, that is colored according to its year of delineation. Glaciers with missing data are not plotted:

Hide code cell source

cmap = mpl.cm.turbo
def round_up_to_10(num):
    return math.ceil(num / 10) * 10
year2 = year.dropna()
year_rounded_up = year2.apply(round_up_to_10)
norm = mpl.colors.BoundaryNorm(range(np.min((year // 10) * 10), np.max(year_rounded_up)+1, 10), cmap.N)
ax = plot_map(
    gdf,
    var_name="year",
    cmap=cmap,
    norm=norm,
    label=r"Time of delineation $year_t$",
    title=f"Global glacier distribution and their dates of acquisition in the RGI dataset v{rgi_version}",
)
../../_images/9947215261b88fc41556b1b651c053728dee8986e2b0617a416046921087ff07.png

Figure 2. Spatial distribution of the time of delineation (date of acquisition) for the glaciers in the glacier extent dataset.

From the above plots, it becomes clear that not all glaciers are delineated in the year 2000. Although the dataset is intended to be a snapshot of the world’s glaciers as they were near the beginning of the 21\(^{st}\) century, the user must keep in mind that the range of delineation dates is substantial. This means that the outlines do not all lie in between the same time frame, which is a main issue of the dataset. The figures above, however, demonstrate that most outlines originate from the period between 2000 and 2010. Nevertheless, for some areas like the western United States or some peripheral glaciers around northern Greenland and the Antarctic Penninsula, the glacier outlines date back to the 1970s or earlier.

2.3 The (area-weighted) mean year of delineation per RGI region#

Let us now calculate the (area-weighted) mean time of delineation for the 19 different RGI regions (see Figure 5 for the locations of these regions). We do that as follows:

  • Arithmetic mean: \( \overline{year}_a = \textstyle\dfrac{1}{{n\in \text{region}}}{\sum\limits_{i=1}^{n\in \text{region}} year_{t,i}} \) with \( n \) the total number of glaciers and \(year_t\) the time of delineation for glacier \(i\) as calculated above.

  • Area-weighted mean: \( \overline{year}_w = \left(\dfrac{1}{\sum\limits_{i=1}^{n\in \text{region}} A_i}\right) \sum\limits_{i=1}^{n\in \text{region}} \left(A_i * year_{t,i}\right) \) where \( A_i \) is the glacier area for glacier \(i\) [km²].

This results in the following plot:

Hide code cell source

if rgi_version == 6.0:
    # Extract region number from RGIID
    gdf["region"] = gdf["RGIID"].str[6:8].astype(int)

    # Group by region and calculate total area
    grouped = gdf[["year", "decimal_year", "AREA", "region"]].groupby("region")
    ungrouped = gdf[["year", "decimal_year", "AREA", "region"]]
    total_area = grouped["AREA"].sum()

    # Calculate means
    means = {
        "Arithmetic": grouped["decimal_year"].mean(),
        "Area-weighted": grouped.apply(
            weighted_average, "decimal_year", "AREA", include_groups=False
        ),
    }
elif rgi_version == 7.0:
    # Extract region number from o1region
    gdf["region"] = gdf["o1region"].astype(int)

    # Group by region and calculate total area
    grouped = gdf[["year", "decimal_year", "area_km2", "region"]].groupby("region")
    ungrouped = gdf[["year", "decimal_year", "area_km2", "region"]]
    total_area = grouped["area_km2"].sum()

    # Calculate means
    means = {
        "Arithmetic": grouped["decimal_year"].mean(),
        "Area-weighted": grouped.apply(
            weighted_average, "decimal_year", "area_km2", include_groups=False
        ),
    }

# Ensure regions are correctly defined as an array
regions = np.array(list(grouped.groups.keys()))  # Extract regions as a sorted array

# Extract the means
arithmetic_means = means["Arithmetic"]
area_weighted_means = means["Area-weighted"]

# Define bar width and positions
bar_width = 0.4
r1 = regions - bar_width / 2  # Positions for arithmetic bars
r2 = regions + bar_width / 2  # Positions for area-weighted bars

textstr = '\n'.join((
    'The global arithmetic mean year of delineation (RGI v{:.1f}) is {:.1f}.'.format(rgi_version, gdf["decimal_year"].mean()),
    'The global area-weighted mean year of delineation (RGI v{:.1f}) is {:.1f}.'.format(rgi_version, means["Area-weighted"].mean()),
))
print(textstr)

# Create the bar plot
plt.figure(figsize=(12, 6))
plt.bar(r1, arithmetic_means, width=bar_width, color='blue', label='Arithmetic')
plt.bar(r2, area_weighted_means, width=bar_width, color='red', label='Area-weighted')

# Add labels, title, and legend
plt.xlabel("RGI region number")
plt.ylabel(r"Year of delineation $\overline{year}_a$ or $\overline{year}_w$")
plt.title("The (area-weighted) mean year of delineation per RGI region (RGI v{})".format(rgi_version), fontsize=18)
plt.xticks(regions)  # Ensure x-ticks match region numbers
plt.ylim(1975, 2015)  # Replace 1980 and 2020 with your desired range
plt.legend()

# Add grid and show plot
plt.grid(color='#95a5a6', linestyle='-', alpha=0.25, axis='y')  # Grid only on y-axis
plt.tight_layout()
plt.show()
The global arithmetic mean year of delineation (RGI v7.0) is 2001.8.
The global area-weighted mean year of delineation (RGI v7.0) is 2001.7.
../../_images/361ab8da4ac76c3e6217e31ce3752aab5522482225b61605b2f4331144b9befa.png

Figure 3. Arithmetic and area-weighted mean year of delineation for the glaciers in each RGI region of the glacier extent dataset.

We observe that at the global scale, the arithmetic area-weighted average of the year of delineation is close to 2002, which only slightly differs in time from the nominally stated dataset reference time (i.e. 2000). However, the (area-weighted) average of the time of delineation varies notably across the different RGI regions: for RGI region 19, for example, the area-weighted average dates back to the beginning of the 1990s, but for RGI region 1 the average origin dates are closer to the 2010s.

3. Estimation of current global glacier ice volumes with RGI data#

3.2 An attempt to quantify potential glacier volume over or underestimation#

It is difficult to quantify the potential volume over or underestimation due to deviating outlines for each glacier in the dataset. However, we can get an idea of the regional glacier volume under/overestimation for 2000 CE by using other datasets that are available on the CDS. Our goal here is to obtain values for the regional glacier volume change between the year 2000 and the area-weighted mean year of delineation of that region \(\overline{year}_w\) (which is thus equivalent to the potential volume over or underestimation relative to the year 2000), denoted with the symbol \(dV_{\text{region}}^{2000,\overline{\text{year}}_w}\).

To get an idea of these potential glacier volume over- or underestimations, we can make use of the “Glacier mass change gridded data from 1976 to present derived from the Fluctuations of Glaciers Database” dataset that is on the CDS. This dataset estimates the gridded yearly mass change of glacier ice (\(dM/dt\)) in Gt yr\(^{-1}\), which can be converted to glacier volume changes (\(dV/dt\)) with units of km\(^3\) yr\(^{-1}\) of ice by making use of the appropriate density. We thus calculate the mass and volume change between 2000 and the area-weighted average year of delineation for a certain RGI region (\(\overline{year}_w\)) to get a total volume change for that specific RGI region between those two dates \(dV_{\text{region}}^{2000,\overline{year}_w}\). If \({{\overline{year}_w}}\) is a decimal (fractional) year, we use linear interpolation to find the interpolated glacier mass and volume change:

\( dV_{\text{region}}^{2000,\overline{year}_w} \text{ [km}^3\text{]} = \begin{cases} \left(\sum\limits_{i=2000}^{\lceil \overline{year}_w \rceil} \sum\limits^{\substack{\text{x,y} \\ \text{{region}}}} \left(\frac{1000}{\rho_i} \frac{dM}{dt}\right)\right) - \\ \left[ \left( \lceil \overline{year}_w \rceil - \overline{year}_w \right) * \left( \sum\limits^{\substack{\text{x,y} \\ \text{region}}} \left(\frac{1000}{\rho_i} \frac{dM}{dt}\right)_{\lceil \overline{year}_w \rceil} \right) \right] & \text{if } \overline{year}_w > 2000 \\ \left(\sum\limits_{i=\lfloor \overline{year}_w \rfloor}^{2000} \sum\limits^{\substack{\text{x,y} \\ \text{region}}} \left(\frac{1000}{\rho_i} \frac{dM}{dt}\right)\right) - \\ \left[ \left( \overline{year}_w - \lfloor \overline{year}_w \rfloor \right) * \left( \sum\limits^{\substack{\text{x,y} \\ \text{region}}} \left(\frac{1000}{\rho_i} \frac{dM}{dt}\right)_{\lfloor \overline{year}_w \rfloor} \right) \right] & \text{if } \overline{year}_w < 2000 \end{cases} \)

Here, we further assume a density \(\rho_i\) of 850 kg m\(^{-3}\) for the mass to volume conservation of glaciers [9]. This may look like a complicated formula, but the principle is simple: for each RGI region, we simply sum the mass (and hence volume) changes over all spatial pixels between the year 2000 and the area-weighted year of delineation \((\overline{year}_w)\):

  • If \((\overline{year}_w)\) is an integer, the second part of the formula (after the minus sign) drops out.

  • If \((\overline{year}_w)\) is not an integer, the formula adjusts for the fractional part of the year:

    • If \((\overline{year}_w)\) > 2000, it uses the fraction between \((\overline{year}_w)\) and \((\lceil \overline{year}_w \rceil)\), where \(\lceil \cdot \rceil\) denotes the ceiling function, which rounds a number up to the nearest integer.

    • If \((\overline{year}_w)\) < 2000, it uses the fraction between \((\lfloor \overline{year}_w \rfloor)\) and \((\overline{year}_w)\), where \(\lfloor \cdot \rfloor\) denotes the floor function, which rounds a number down to the nearest integer.

Now that we have all information to derive values for \( dV_{\text{region}}^{2000,\overline{year}_w}\), let us apply this to the 19 different RGI regions in the RGIv7.0 data:

Hide code cell source

if rgi_version == 6.0:
    # Mask data for RGI regions
    regions = gdf["RGIID"].str[6:8].astype(int)
    da = regions.to_xarray().assign_coords(
        lon=gdf["CENLON"].to_xarray(),
        lat=gdf["CENLAT"].to_xarray(),
    )
elif rgi_version == 7.0:
    # Mask data for RGI regions
    regions = gdf["o1region"].astype(int)
    da = regions.to_xarray().assign_coords(
        lon=gdf["cenlon"].to_xarray(),
        lat=gdf["cenlat"].to_xarray(),
    )
regridder = xe.Regridder(da, ds, locstream_in=True, method="nearest_s2d")
mask_2d = regridder(da)
mask = xr.concat(
    [(mask_2d == region).expand_dims(region=[region]) for region in regions.unique()],
    "region",
)
ds = ds.where(mask)

# Compute cumulative fields
ds["time"].attrs |= {"long_name": "Time", "units": "yr"}
for da in ds.data_vars.values():
    da.attrs["units"] += " $yr^{-1}$"
    da.attrs["long_name"] = da.attrs["long_name"].replace("_", " ").title()

# Mass change
cumulative = (ds["glacier_mass_change_gt"].sum(("latitude","longitude"),keep_attrs=True)).cumsum("time")
cumulative.attrs = {
    "units": ds["glacier_mass_change_gt"].attrs["units"].split()[0],
    "long_name": f"Cumulative {ds['glacier_mass_change_gt'].attrs['long_name']}",
}
ds["Cumulative"] = cumulative
    
# Calculate volume under or overestimation
year = 2000
original = ds.reset_coords()["Cumulative"]
time = original["time"]
original["time"] = time.dt.year + (time.dt.dayofyear - 1) / (365 + time.dt.is_leap_year)
interpolated = original.interp(time=means["Area-weighted"].to_xarray())
estimate = (1000/850) * (interpolated - original.sel(time=year)).compute() # using appropriate density from Huss (2013)
# Do the loop
for region, result in zip(estimate.region.values, estimate.values):
    print(
        f"The volume estimate in RGI region {region:>2}"
        f" for the year {year} is"
        f" {'under' if result < 0 else 'over':>5}estimated"
        f" by {abs(float(result)):.2f} km³."
    )
total = estimate.sum()
print(
    f"\nThe volume estimate at the global scale"
    f" for the year {year} is"
    f" {'under' if result <0 else 'over'}estimated"
    f" by {abs(float(total)):.2f} km³."
)
The volume estimate in RGI region  1 for the year 2000 is underestimated by 624.97 km³.
The volume estimate in RGI region  2 for the year 2000 is underestimated by 39.12 km³.
The volume estimate in RGI region  3 for the year 2000 is  overestimated by 5.98 km³.
The volume estimate in RGI region  4 for the year 2000 is underestimated by 23.77 km³.
The volume estimate in RGI region  5 for the year 2000 is underestimated by 21.75 km³.
The volume estimate in RGI region  6 for the year 2000 is underestimated by 6.82 km³.
The volume estimate in RGI region  7 for the year 2000 is underestimated by 101.60 km³.
The volume estimate in RGI region  8 for the year 2000 is underestimated by 13.40 km³.
The volume estimate in RGI region  9 for the year 2000 is underestimated by 86.71 km³.
The volume estimate in RGI region 10 for the year 2000 is underestimated by 4.07 km³.
The volume estimate in RGI region 11 for the year 2000 is underestimated by 5.97 km³.
The volume estimate in RGI region 12 for the year 2000 is underestimated by 0.45 km³.
The volume estimate in RGI region 13 for the year 2000 is underestimated by 10.02 km³.
The volume estimate in RGI region 14 for the year 2000 is underestimated by 3.92 km³.
The volume estimate in RGI region 15 for the year 2000 is underestimated by 10.98 km³.
The volume estimate in RGI region 16 for the year 2000 is underestimated by 0.13 km³.
The volume estimate in RGI region 17 for the year 2000 is underestimated by 41.87 km³.
The volume estimate in RGI region 18 for the year 2000 is  overestimated by 0.26 km³.
The volume estimate in RGI region 19 for the year 2000 is underestimated by 185.05 km³.

The volume estimate at the global scale for the year 2000 is underestimated by 1174.35 km³.

Or when plotted:

Hide code cell source

# Define regions
regions = np.arange(1, max(gdf["region"]) + 1)  # Create an array of region numbers
# Create the bar plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(regions, estimate.values, color='black', label='Volume over/underestimation')
# Set plot labels and title
ax.set_xlabel("RGI region number")
ax.set_ylabel("Volume over/underestimation $dV_{{region}}^{2000,\overline{year}_w}$ (km$^3$)")
ax.set_title("Glacier volume over or underestimation per RGI region w.r.t. 2000 (RGI v{})".format(rgi_version), fontsize=15)
# Set x-ticks, grid, and x-limits
ax.set_xticks(regions)  # Ensure x-ticks match the region numbers
ax.grid(color='#95a5a6', linestyle='-', alpha=0.25, axis='y')  # Add gridlines only on the y-axis
ax.set_xlim(1-0.5, max(regions)+0.5)  # Adjust the x-axis limits to match region numbers
# Add legend (optional)
ax.legend()
# Show the plot
plt.tight_layout()
plt.show()
../../_images/511d4a65a5a716b76fce5015da309987977a9f10355e3a6017a2974bb7a32d6b.png

Figure 4. Total volume over or underestimation compared to the year 2000 for the glaciers in each RGI region of the glacier extent dataset.

In the plot above, negative values indicate an underestimation. Underestimations of glacier volumes are dominant, which is because the (area-weighted) average origin dates of the outlines are generally later than 2000, together with a general trend of glacier shrinkage during the last several decades [6]. It is clear that especially RGI regions 1 (Alaska) and 19 (Antarctic and Subantarctic) exhibit a relatively large underestimation of the total glacier ice volume, mainly due to their large deviation of the area-weighted average year of delineation to the year 2000, as well as the large glacier ice volume changes during this corresponding period. At the global scale, glacier ice volume estimates would exhibit an underestimation of ca. 1174 km³ due to misdated glacier outlines, which is only ca. 0.75% of the total glacier ice volume estimate by Farinotti et al. (2019) [3]. At the global scale, the above obtained deviation related to the misdated glacier outlines from RGIv7.0 is thus relatively low when compared to the total global glacier ice volume estimate from Farinotti et al. (2019), implying that the dataset is well-suited to estimate current global glacier ice volumes for the year 2000. The corresponding systematic error is thus small, but glacier- and region-dependent.

alternatvie text

Figure 5. The 19 first-order regions of the RGI version 7.0 and glacier locations in red. From: GLIMS RGI.

3.3 Other potential sources of error and uncertainty#

Several additional factors contribute to the uncertainty in glacier data derived from the RGI dataset. These include low-resolution and poor-quality source data, errors by delineators, and challenges such as shadows, clouds, proglacial lakes, seasonal snowfields, and limited multi-temporal satellite imagery. The most significant source of misdelineation is supraglacial debris, which often results in glacier area underestimation in regions with extensive debris cover (e.g. High Mountain Asia and the Caucasus), due to the spectral similarity between debris and surrounding moraines or bedrock [10]. Additionally, ice bodies smaller than 0.01 km², considered below the threshold for ice flow, were excluded, leading to further underestimation of glacier area in some regions [1].

The interpretation of what constitutes a glacier also varies due to the diverse global community that contributed to the dataset, causing inconsistencies in how features like debris-covered glaciers, tributaries, and peripheral ice bodies in Antarctica and Greenland were handled [11]. For example, distinguishing debris-covered ice from bedrock or separating outlet glaciers from main ice sheets remains problematic. However, in the more recent v7.0, only glaciers that are not connected or only weakly connected to the GrIS (as defined by connectivity levels lower than 2 in [13]) are included in the dataset. For the AIS, the seperation remains, in some cases, open to discussion. In the southern Asian/Himalayan region, significant differences between glacier areas in the gridded and vector versions of the dataset are furthermore present [12].

Quantitative error estimates are not provided in the dataset, but [2] noted that relative errors are higher for small glaciers due to their larger outline-to-area ratio. Consequently, the uncertainty in an individual glacier outline depends on the quality of the source material and the specific glacier’s characteristics, even for outlines assumed to represent 2000. Users should exercise caution, particularly in regions with significant debris cover or small glacier features.

4. Short summary and take-home messages#

The glacier area dataset on the CDS aims to capture the spatial distribution and state of glacier outlines representative for the year 2000. However, the date of delineation (i.e. the time for which the outlines are representative) can significantly deviate from 2000, resulting in a systematic error. Nevertheless, the (area-weighted) mean acquisition date of the glacier outlines is relatively close to 2000 at the global scale, although a tendency for delineation dates that are slightly more recent than 2000 is noted.

When quantifying current global glacier volumes with RGI glacier areas as input data, this sytematic error has to be taken into account. The suitability of a RGI outline for current glacier ice volume estimates hence depends on (1) the deviation of the time since delineation from 2000 and (2) the significance of glacier area/volume changes during that period, which are influenced by changes of the local climate and the specific glacier geometries. When compared to the total global ice volume estimate by Farinotti et al. (2019) [3], the resulting systematic error calculated in this notebook is in the order of <1% only. This makes the dataset reliable to be used as input data for current global glacier volume estimates, but the systematic error is glacier- and region-dependent.

Additional factors to take into account are, amongst others, supraglacial debris, which often leads to underestimation of glacier area in debris-covered regions such as High Mountain Asia. Furthermore, although no quantitative error estimates are provided with the data, small glaciers tend to have higher relative errors [2]. When fed into glaciological and/or hydrological models, users are thus advized to take into account these considerations.

ℹ️ If you want to know more#

Key resources#

References#

  • [1] RGI Consortium (2017). Randolph Glacier Inventory – A Dataset of Global Glacier Outlines: Version 6.0: Technical Report, Global Land Ice Measurements from Space, Colorado, USA. Digital Media. doi: 10.7265/N5-RGI-60.

  • [2] Pfeffer, W. T., Arendt, A. A., Bliss, A., Bolch, T., Cogley, J. G., Gardner, A. S., Hagen, J. O., Hock, R., Kaser, G., Kienholz, C., Miles, E. S., Moholdt, G., Mölg, N., Paul, F., Radić, V., Rastner, P., Raup, B. H., Rich, J., Sharp, M. J., and Glasser, N. (2014). The Randolph Glacier Inventory: A globally complete inventory of glaciers, Journal of Glaciology, 60(221), 537-552. doi: 10.3189/2014JoG13J176.

  • [3] Farinotti, D., Huss, M., Fürst, J. J., Landmann, J., Machguth, H., Maussion, F., and Pandit, A. (2019). A consensus estimate for the ice thickness distribution of all glaciers on Earth. Nature Geoscience, 12(3), 168-173. doi: 10.1038/s41561-019-0300-3.

  • [4] Huss, M., and Farinotti, D. (2012). Distributed ice thickness and volume of all glaciers around the globe, Journal of Geophysical Research, 117, F04010. doi: 10.1029/2012JF002523.

  • [5] Marzeion, B., Jarosch, A. H., and Hofer, M. (2012). Past and future sea-level change from the surface mass balance of glaciers, The Cryosphere, 6, 1295–1322. doi: 10.5194/tc-6-1295-2012.

  • [6] Zemp, M., Frey, H., Gärtnew-Roer, I., Nussbaumer, S. U., Helzle, M., Paul, F., Haeberli, W., Denzinger, F., Ahlstrøm, A. P., Anderson, B., Bajracharya, S., Baroni, C., Braun, L. N., Cáceres, B. E., Casassa, G., Cobos, G., Dávila, L. R., Delgado Granados, H., Demuth, M. N., Espizua, L., Fischer, A., Fujita, K., Gadek, B., Ghazanfar, A., Hagen, J. O., Holmlund, P., Karimi, N., Li, Z., Pelto, M., Pitte, P., Popovnin, V. V., Portocarrero, C. A., Prinz, R., Sangewar, C. V., Severskiy, I., Sigurdsson, O., Soruco, A., Usubaliev, R., and Vincent, C. (2015). Historically unprecedented global glacier decline in the early 21st century, Journal of Glaciology, 61, 745-762. doi: 10.3189/2015JoG15J017.

  • [7] Li, Y. J., Ding, Y. J., Shangguan, D. H., and Wang, R. J. (2019). Regional differences in global glacier retreat from 1980 to 2015, Advances in Climate Change Research, 10(4), 203–213. doi: 10.1016/j.accre.2020.03.003.

  • [8] Leclercq, P. W., Oerlemans, J., Basagic, H. J., Bushueva, I., Cook, A. J., and Le Bris, R. (2014). A data set of worldwide glacier length fluctuations, Cryosphere, 2014(8), 659–672. doi: 10.5194/tc-8-659-2014.

  • [9] Huss, M. (2013). Density assumptions for converting geodetic glacier volume change to mass change, The Cryosphere, 7, 877–887, doi: 10.5194/tc-7-877-2013.

  • [10] Paul, F., Barrand, N. E., Baumann, S., Berthier, E., Bolch, T., Casey, K., Frey, H., Joshi, S. P., Konovalov, V., Le Bris, R., Mölg, N., Nosenko, G., Nuth, C., Pope, A., Racoviteanu, A., Rastner, P., Raup, B., Scharrer, K., Steffen, S., and Winsvold, S. (2013). On the Accuracy of Glacier Outlines Derived from Remote-Sensing Data. Annals of Glaciology, 54(63), 171–82. doi: 10.3189/2013AoG63A296.

  • [11] Hock, R., Maussion, F., Marzeion, B., and Nowicki, S. (2022). What is the global glacier ice volume outside the ice sheets? Journal of Glaciology, 69(273), 204–10. doi: 10.1017/jog.2023.1.

  • [12] Li, Y. J., Li, F., Shangguan, D. H., Ding, Y. J. (2021). A New Global Gridded Glacier Dataset Based on the Randolph Glacier Inventory Version 6.0. Journal of Glaciology, 67 (2021), 773–76. doi: 10.1017/jog.2021.28.

  • [13] Rastner, P., Bolch, T., Mölg, N., Machguth, H., Le Bris, R., and Paul, F. (2012). The first complete inventory of the local glaciers and ice caps on Greenland, The Cryosphere, 6, 1483–1495, https://doi.org/10.5194/tc-6-1483-2012