
1.3.1. Consistency between the raster and vector versions of the Randolph Glacier Inventory (RGI) for year 2000 dataset in the context of glaciological and hydrological applications#
Production date: 31-05-2025
Dataset version: 7.0
Produced by: Yoni Verhaegen and Philippe Huybrechts (Vrije Universiteit Brussel)
🌍 Use case: Assessing the spatial distribution of glaciers and the total glacier area in the context of water resource management at the local, regional, and global scale#
❓ Quality assessment question#
“How do glacier areas calculated from vector data compare to those calculated from the raster version of the data, and what does this comparison reveal about the quality and consistency of the glacier extent dataset?”
Glaciers are a major contributor to current global sea-level rise, a resource of fresh water, a potential threat of natural hazards, an important factor for hydro-power production/runoff, and for recreation and tourism. A proper assessment of glacier areas, glacier characteristics, as well as their changes due to warming climatic conditions therefore plays a crucial role in dealing with these issues. In that regard, the “Glaciers distribution data from the Randolph Glacier Inventory (RGI) for year 2000” (here we use version 7.0) dataset on the Climate Data Store (CDS) provides key information with respect to glacier extent and their characteristics. The RGI v7.0 dataset, available on the Climate Data Store, is the most comprehensive global dataset for glacier outlines around the year 2000, offering both vector (polygon) and raster (gridded) formats. The RGI dataset is a collection of digital glacier and ice cap outlines at the global scale, nominally provided for the year 2000 [1, 2]. The data are available in both vector (a shapefile with polygons of individual glacier outlines) and raster (as gridded data with the aggregated fractional glacier areas per pixel) format. As it is the most complete dataset of glacier extent data with a global coverage, researchers often take this data to represent the current state of all glaciers. This notebook investigates the dataset’s consistency by comparing the total regional and global glacier surface area between the vector (polygons) and raster (gridded data) version of the data.
📢 Quality assessment statements#
These are the key outcomes of this assessment
Glacier area data are present in both vector (polygons) and raster (gridded) formats. While both formats are suitable for visualizing and quantifying glacier areas at local, regional, and global scales, the raster data require users to calculate absolute glacier areas from fractional values without the corresponding absolute surface area of a certain grid cell provided. Even though total regional/global glacier areas in both data formats agree relatively well, notable discrepancies between the two formats exist in some regions (particularly in the southern Asian/Himalayan region). Here, significant percentage-wise differences exist between the total glacier area of the raster and vector products. These differences affect the quality, internal consistency, and reliability of the data, for example in the context of the specific use case of this notebook (i.e. for the purpose of water resource management).
The primary goal of RGI is to provide consistent, static glacier outlines for climatological, hydrological, and glaciological applications at regional to global scales. It is not primarily intended for localized studies (e.g. for single, individual glaciers) or for glacier area change detection. Glacier outlines at such larger spatial scales can hence serve as baseline input for key applications such as mass balance or run-off modeling, ice thickness estimates, and climate sensitivity assessments. In that regard, the vector version of the dataset contains the data in polygon format (i.e. shape files for Geographic Information Systems), but the raster data are typically more practical for modelling efforts.
📋 Methodology#
Dataset description#
The dataset of the glaciers distribution on the Climate Data Store (CDS) is an almost complete collection of digital glacier and ice cap outlines and their geometrical/hypsometrical characteristics from various data sources at the global scale [1, 2]. The dataset on the CDS is considered a snapshot of glacier outlines around the year 2000, assembled mainly from satellite images, and is based on the Randolph Glacier Inventory (RGI), which is currently the most complete dataset of glacier outlines. Simply stated, the glaciers in the dataset were automatically classified using the distinctive spectral reflectance signatures of bedrock, snow and ice. During post-processing, raw glacier outlines are quality checked and manually corrected if required (e.g. in the case of a supraglacial debris cover). The vector part of the dataset divides the glaciers into separate RGI regions, i.e. there are 19 of those regions (“clusters” of glaciers) in RGIv7.0. The raster version of the data contains aggregated fractional glacier areas for each pixel of 1 by 1 degree but does not provide sufficient information to classify glaciers into distinct RGI regions. For a more detailed description of the data acquisition and processing methods, we refer to the documentation on the CDS and the ECMWF Confluence Wiki (Copernicus Knowledge Base).
Structure and (sub)sections#
📈 Analysis and results#
1. Data preparation and processing#
1.1 Import packages#
First we load the packages:
1.2 Define request and download#
Then we define requests for download from the CDS and download the data.
🚨 The files can be large! Since the data files to be downloaded and manipulated have a considerable size, this may take a couple of minutes.
🚨 Insert the correct RGI version number to be downloaded below:
Downloading and handling glacier extent data from the CDS, this may take a couple of minutes...
100%|██████████| 1/1 [00:00<00:00, 18.19it/s]
Download glacier extent data (raster) completed.
Now for the vector data...
100%|██████████| 1/1 [00:00<00:00, 52.31it/s]
Downloading and handling glacier extent data (vector) completed.
1.3 Display and inspect data#
Lastly, we can read and inspect the data. Let us print out the raster data to inspect its structure:
<xarray.Dataset> Size: 249kB
Dimensions: (latitude: 170, longitude: 360)
Coordinates:
* longitude (longitude) float64 3kB -179.5 -178.5 -177.5 ... 178.5 179.5
* latitude (latitude) float64 1kB 84.5 83.5 82.5 ... -82.5 -83.5 -84.5
Data variables:
Glacier_area (latitude, longitude) float32 245kB dask.array<chunksize=(128, 128), meta=np.ndarray>
Attributes: (12/15)
Source_Software: Esri ArcGIS
NCO: netCDF Operators version 4.9.1 (Homepage = ht...
title: Randolph Glacier Inventory v7.0 (RGI 7.0) gri...
project: Copernicus Climate Change Service (C3S) Essen...
data_version: version 7.0
institution: Department of Geography - University of Zuric...
... ...
geospatial_lon_resolution: 1.0
geospatial_lat_resolution: 1.0
references: Pfeffer et al. (2014) doi.org/10.3189/2014JoG...
citation: RGI 7.0 Consortium, 2023. Randolph Glacier In...
Conventions: CF Version CF-1.0
comment: Brief data description: Temporal resolution: ...It is a gridded dataset at a 1 by 1 degree spatial resolution containing the fractional area (in %) of a grid cell covered by glaciers in the Glacier_area variable (i.e. these are not absolute glacier areas). Note that not all grid cells exhibit the same absolute surface area, as grid cells get smaller towards the poles. The surface area of a grid cell is not included in the dataset.
And for the vector data:
| rgi_id | o1region | o2region | glims_id | anlys_id | subm_id | src_date | cenlon | cenlat | utm_zone | ... | zmin_m | zmax_m | zmed_m | zmean_m | slope_deg | aspect_deg | aspect_sec | dem_source | lmax_m | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| index | |||||||||||||||||||||
| 0 | RGI2000-v7.0-G-01-00001 | 01 | 01-01 | G204091E67414N | 392889 | 624 | 2008-09-02T00:00:00 | -155.909404 | 67.413726 | 5 | ... | 1485.317600 | 1693.623500 | 1544.66940 | 1553.94030 | 17.606613 | 342.330469 | 1 | COPDEM30 | 725 | POLYGON Z ((-155.90493 67.4163 0, -155.90416 6... |
| 1 | RGI2000-v7.0-G-01-00002 | 01 | 01-01 | G204121E67419N | 392890 | 624 | 2008-09-02T00:00:00 | -155.879114 | 67.419232 | 5 | ... | 1278.400600 | 1394.765700 | 1328.55660 | 1330.91500 | 19.846369 | 345.367012 | 1 | COPDEM30 | 484 | POLYGON Z ((-155.87867 67.41748 0, -155.87872 ... |
| 2 | RGI2000-v7.0-G-01-00003 | 01 | 01-01 | G204471E67431N | 392897 | 624 | 2008-09-02T00:00:00 | -155.530786 | 67.431484 | 5 | ... | 1294.703400 | 1704.742000 | 1423.20680 | 1437.23520 | 23.690063 | 13.467490 | 1 | COPDEM30 | 1099 | POLYGON Z ((-155.53113 67.43564 0, -155.53079 ... |
| 3 | RGI2000-v7.0-G-01-00004 | 01 | 01-01 | G204497E67431N | 392899 | 624 | 2008-09-02T00:00:00 | -155.501948 | 67.430514 | 5 | ... | 1224.178800 | 1286.897700 | 1249.63060 | 1249.06620 | 12.782562 | 42.854332 | 2 | COPDEM30 | 652 | POLYGON Z ((-155.49973 67.43212 0, -155.49931 ... |
| 4 | RGI2000-v7.0-G-01-00005 | 01 | 01-01 | G204521E67429N | 392901 | 624 | 2008-09-02T00:00:00 | -155.478173 | 67.432873 | 5 | ... | 1052.440100 | 1488.129900 | 1290.89230 | 1273.78340 | 17.148510 | 299.058193 | 8 | COPDEM30 | 1903 | POLYGON Z ((-155.49636 67.43892 0, -155.49583 ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 274526 | RGI2000-v7.0-G-19-02738 | 19 | 19-24 | G356783E71154S | 1009836 | 805 | 2000-02-06T00:00:00 | -3.254147 | -71.142262 | 30 | ... | 51.334140 | 190.581200 | 119.21578 | 120.04070 | 1.384531 | 325.807697 | 8 | COPDEM90 | 10105 | POLYGON Z ((-2.9804 -71.23218 0, -2.99065 -71.... |
| 274527 | RGI2000-v7.0-G-19-02739 | 19 | 19-24 | G001187E70166S | 1009760 | 805 | 2000-02-15T00:00:00 | 1.161199 | -70.234860 | 31 | ... | -2.147076 | 266.489380 | 131.61113 | 135.94368 | 1.436012 | 342.096767 | 1 | COPDEM90 | 24839 | POLYGON Z ((1.0094 -70.05421 0, 1.01073 -70.05... |
| 274528 | RGI2000-v7.0-G-19-02740 | 19 | 19-24 | G002050E70629S | 1009761 | 805 | 2000-03-04T00:00:00 | 2.039158 | -70.630907 | 31 | ... | 4.757297 | 107.698586 | 65.82977 | 65.03951 | 1.445538 | 355.737166 | 1 | COPDEM90 | 10029 | POLYGON Z ((2.09959 -70.67463 0, 2.0919 -70.67... |
| 274529 | RGI2000-v7.0-G-19-02741 | 19 | 19-24 | G002939E70509S | 1009762 | 805 | 2000-03-04T00:00:00 | 2.929238 | -70.505540 | 31 | ... | 6.294697 | 261.422030 | 156.44052 | 153.55705 | 1.594081 | 310.365845 | 8 | COPDEM90 | 19935 | POLYGON Z ((3.09012 -70.38237 0, 3.09807 -70.3... |
| 274530 | RGI2000-v7.0-G-19-02742 | 19 | 19-24 | G004329E70346S | 1009763 | 805 | 2000-03-04T00:00:00 | 4.329046 | -70.372802 | 31 | ... | -2.018840 | 352.334080 | 182.37610 | 185.75215 | 1.858386 | 302.110531 | 8 | COPDEM90 | 15185 | POLYGON Z ((4.34737 -70.5039 0, 4.33046 -70.50... |
274531 rows × 29 columns
As can be seen above, the data includes attribute information for each individual glacier (i.e. digitized polygon) in the vector-type dataset. Important for this notebook is the AREA (in v6.0) or area_km2 (in v7.0) column (in km\(^2\)), which contain information about the absolute surface area of the specific glacier. If for some glaciers (part of) these data are missing, the data are filled by -9999999. We will use this information below.
2. Spatial distribution of glaciers and glacier area#
2.1 Pixel-by-pixel fractional glacierized area in the raster data#
Let us check the spatial distribution of all glaciers in the dataset from the raster product. We therefore produce a world map where every grid cell represents the fractional glacier area in that specific grid cell (in %):
2.2 Pixel-by-pixel absolute glacierized area in the raster data#
To calculate the total absolute glacier area from the raster product, we first need to find the surface area of a 1 by 1 degree pixel across the globe, which varies for each pixel (i.e. the surface area of a pixel gets smaller towards the poles). After multiplying those respective surface areas by the fractional glacier area of the corresponding pixel, and summing up all those values, we get an estimate of the total glacier surface area. We assume the Earth to be a sphere so that:
\( \sum\limits_{i=1}^n A_{\text{glacier}} \) [km²] \( = \sum\limits_{i=1}^n \left(\dfrac{1}{100} * \left(\dfrac{A_{\text{RGI},i}}{A_{\text{grid},i}}\right) * A_{\text{grid},i}\right) \)
where \( \dfrac{A_{\text{RGI},i}}{A_{\text{grid},i}} \) [%] is the fractional glacier area as given by the RGI raster product, and \(n\) the total amount of glacier-covered pixels with a single pixel denoted as \(i\),
and \( {A_{\text{grid},i}} \) [km\(^2\)] \( = (R * cos(\phi_i) * \Delta\Lambda_i) * (R \Delta\phi_i) \) with \( R \) the radius of the Earth [km] and \( \Delta\Lambda \) and \( \Delta\phi \) the change of longitude and latitude respectively [rad]. By integrating over all latitudes and longitudes, this formula yields the surface area of a sphere, i.e. \(4 \pi R^2\).
Let us run this script:
Now, we can finally print the total absolute glacier area from the raster format:
A total number of 2096 pixels (or 3.42%) hold glacier extent data in the raster dataset, covering a total surface area of 714458.17 km².
Let us plot the absolute glacier area from the raster dataset on a world map:
In order to compare this number with the vector version of the dataset, we turn to the vector format in the section below.
2.3 Individual glacier area in the vector data#
Let us now inspect the vector product of the same dataset. We therefore plot the polygons (that each represent the outline of an individual glacier) as data points across the globe. Here, data are not aggregated in pixels but values thus represent the glacier area of individual glaciers. This results in the following:
The geographical distribution of glaciers is similar to that of the raster product, although the vector product allows for more detailed and glacier-specific analysis. However, due to the different representation, the above two maps do not allow for an easy quantitative comparison of the glacier area between the two versions of the dataset. Let us therefore get the statistics of the vector version of the glacier extent data:
A total number of 274531 glaciers is present in the dataset, covering a total surface area of 706744.05 km².
This already reveals a difference between both data formats. With the above information in mind, we can now further quantify the total regional (according to the 19 RGI regions) glacier area and compare the values between the vector and raster version of the dataset more in-depth.
3. Regional and global comparison of the vector and raster product#
3.1 Quantification of glacier area differences#
This section presents a regional and global comparison of glacier areas derived from vector and raster data products of the glacier extent data from the Randolph Glacier Inventory version 7.0 (RGIv7.0). In doing so, discrepancies and agreements between the two data formats are analyzed, providing further insights into their consistency and usability for climatological and hydrological applications, such as water resource management. We first quantify the difference of the global total glacier area between both data formats:
\( \Delta A_{\text{glacier}}^{\text{global}} \) [km\(^2\)] \( = \sum\limits^{x,y} A_{\text{glacier}}^{\text{raster}} - \sum\limits^n A_{\text{glacier}}^{\text{vector}} \)
with \(x,y\) all glacier-covered pixels in the raster product and \(n\) the total amount of individual glaciers in the vector product.
This results in:
The difference of the total global glacier area between the raster and the vector product is 7714.12 km².
Let us analyse this regionally and check where the most significant differences are situated. We therefore first mask the vector data for the different RGI regions:
Now we can finally plot the quantified regional differences of the glacier area between the vector and raster version of the data:
\( \Delta A_{\text{glacier}}^{\text{region}} \) [km\(^2\)] \( = \sum\limits^{x,y \in \text{region}} A_{\text{glacier}}^{\text{raster}} - \sum\limits^{n\in \text{region}} A_{\text{glacier}}^{\text{vector}} \)
or in percentage difference:
\( \Delta A_{\text{glacier}}^{\text{region}} \) [%] \( \left|\dfrac {100 * \left(\sum\limits^{x,y \in \text{region}} A_{\text{glacier}}^{\text{raster}} - \sum\limits^{n \in \text{region}} A_{\text{glacier}}^{\text{vector}}\right)}{\sum\limits^{n \in \text{region}} A_{\text{glacier}}^{\text{vector}}}\right| \)
with \(x,y\) all glacier-covered pixels in the raster product in a certain RGI region and \(n\) the total amount of individual glaciers in the vector product in a certain RGI region.
3.2 Glacier area differences between raster and vector data#
The calculations above result in the following:
And to express it as a percentage difference:
Figure 5. Regional comparison of the percentage difference of glacier areas between the vector and raster version of the glacier extent dataset.
The figures above compare the total regional glacier area from RGI regions using raster and vector products, and shows the percentage difference between these data formats. The percentage difference plot indicates that, while the vector product generally aligns with the raster product, significant percentage-wise discrepancies are observed in especially RGI region 15 (southern Asia/Himalaya), where the vector product shows notably higher absolute glacier areas. Most regions, however, exhibit low percentage differences (< 4%), suggesting good agreement. These discrepancies point to potential issues in data resolution or processing, emphasizing the need for careful evaluation of glacier areas in the southern Asian/Himalayan region. Let us also have these numbers quantified by text:
The glacier surface area in RGI region 1 in the raster version is overestimated by 769.25 km² or 0.89% when compared to the vector product.
The glacier surface area in RGI region 2 in the raster version is overestimated by 171.73 km² or 1.18% when compared to the vector product.
The glacier surface area in RGI region 3 in the raster version is overestimated by 4134.28 km² or 3.92% when compared to the vector product.
The glacier surface area in RGI region 4 in the raster version is overestimated by 673.93 km² or 1.66% when compared to the vector product.
The glacier surface area in RGI region 5 in the raster version is overestimated by 2830.00 km² or 3.13% when compared to the vector product.
The glacier surface area in RGI region 6 in the raster version is overestimated by 129.79 km² or 1.17% when compared to the vector product.
The glacier surface area in RGI region 7 in the raster version is overestimated by 1225.13 km² or 3.61% when compared to the vector product.
The glacier surface area in RGI region 8 in the raster version is overestimated by 35.32 km² or 1.20% when compared to the vector product.
The glacier surface area in RGI region 9 in the raster version is overestimated by 1795.29 km² or 3.48% when compared to the vector product.
The glacier surface area in RGI region 10 in the raster version is overestimated by 23.16 km² or 0.88% when compared to the vector product.
The glacier surface area in RGI region 11 in the raster version is overestimated by 13.94 km² or 0.66% when compared to the vector product.
The glacier surface area in RGI region 12 in the raster version is overestimated by 8.97 km² or 0.64% when compared to the vector product.
The glacier surface area in RGI region 13 in the raster version is overestimated by 1619.87 km² or 3.22% when compared to the vector product.
The glacier surface area in RGI region 14 in the raster version is overestimated by 763.59 km² or 2.31% when compared to the vector product.
The glacier surface area in RGI region 15 in the raster version is underestimated by 1767.23 km² or 11.01% when compared to the vector product.
The glacier surface area in RGI region 16 in the raster version is overestimated by 4.18 km² or 0.22% when compared to the vector product.
The glacier surface area in RGI region 17 in the raster version is underestimated by 354.86 km² or 1.28% when compared to the vector product.
The glacier surface area in RGI region 18 in the raster version is underestimated by 9.12 km² or 1.03% when compared to the vector product.
The glacier surface area in RGI region 19 in the raster version is underestimated by 4353.11 km² or 3.26% when compared to the vector product.
The comparison between the raster and vector products of glacier area reveals varying degrees of overestimation and underestimation across different RGI regions when both data formats are compared to one another. Most regions show minor discrepancies, with the raster product generally under or overestimating glacier areas by small percentages. Notably, significant percentage-wise differences in RGI region 15 exist. The discrepancies and inconsistency of the calculated glacier area between the raster and vector versions of the Randolph Glacier Inventory data, especially in the southern Asian/Himalayan region, have been documented before [3] and can have significant implications for water resource management across various spatial scales. Accurate glacier area data are namely crucial for understanding glacier contributions to water supplies, especially in regions where glaciers play a key role in the hydrological cycle (such as the Himalaya).
Potential sources of error and uncertainty for the glacier extent data, that could have contributed to the discrepancy between the raster and vector data, include [3, 4]:
Potential errors in data processing or classification: generating raster data often involves interpolation, reprocessing steps and resampling, which can introduce errors or artefacts that are not present in vector data.
Data version mismatches: the data collection times or product version to convert vector data to raster data might differ, leading to discrepancies due to changes in glacier extents over different data versions.

Figure 6. The 19 first-order regions of the RGI version 7.0 and glacier locations in red. From: GLIMS RGI.
4. Short summary and take-home messages#
For the glacier area product to exhibit a high level of maturity and to be useful for glaciological and hydrological purposes, the raster and vector versions of the product should be consistent in terms of their glacier areas. The assessment in this notebook reveals that, overall, glacierized areas in both data formats agree relatively well with only minor percentage-wise differences. However, more significant percentage-wise discrepancies between the two data formats are present in some areas, which must be acknowledged by the user if the data are fed into glaciological and/or hydrological models. As the primary goal of RGI is to provide consistent, static glacier outlines for climatological, hydrological, and glaciological applications at regional to global scales, both RGI data formats (vector and raster) can be used for such purposes. However, raster data are typically more practical due to its grid format, which allows for easier integration with other global datasets and into modelling efforts. These gridded RGI data are, however, presented as fractional glacier area data within a certain pixel (without a provided indication of the absolute surface area of the grid cell).
Both data formats furthermore share broader limitations, including digitization errors and challenges in identifying glacier outlines due to factors like supraglacial debris, shadows, and cloud cover [4]. These issues are particularly pronounced in regions such as High Mountain Asia and the Caucasus, where the presence of a large-scale supraglacial debris cover typically leads to a misassignment of glacier areas. Users are advised to be cautious in regions that exhibit significant debris cover, small glaciers (i.e. relative errors are known to be higher for smaller glaciers due to their larger outline-to-area ratio [2]), or around the peripheral glaciers of the Greenland and Antarctic ice sheets (where a clear separation of outlet glaciers from the main ice sheet body may be difficult). A last important sidenote is the fact that data cannot be used for temporal glacier area change or climate change impact assessment (it is a single snapshot in time). It is, for example, not possible to use the data for peak water discharge determination, as this would require (spatio-)temporal glacier area changes [5].
ℹ️ If you want to know more#
Key resources#
Glaciers distribution data from the Randolph Glacier Inventory (RGI) for year 2000
RGI website from GLIMS (Global Land Ice Measurements from Space)
C3S EQC custom functions,
c3s_eqc_automatic_quality_controlprepared by B-Open.
References#
[1] RGI Consortium (2017). Randolph Glacier Inventory – A Dataset of Global Glacier Outlines: Version 6.0: Technical Report, Global Land Ice Measurements from Space, Colorado, USA. Digital Media. doi: 10.7265/N5-RGI-60.
[2] Pfeffer, W. T., Arendt, A. A., Bliss, A., Bolch, T., Cogley, J. G., Gardner, A. S., Hagen, J. O., Hock, R., Kaser, G., Kienholz, C., Miles, E. S., Moholdt, G., Mölg, N., Paul, F., Radić, V., Rastner, P., Raup, B. H., Rich, J., Sharp, M. J., and Glasser, N. (2014). The Randolph Glacier Inventory: A globally complete inventory of glaciers, Journal of Glaciology, 60(221), 537-552. doi: 10.3189/2014JoG13J176.
[3] Li, Y. J., Li, F., Shangguan, D. H., Ding, Y. J. (2021). A New Global Gridded Glacier Dataset Based on the Randolph Glacier Inventory Version 6.0. Journal of Glaciology, 67 (2021), 773–76. doi: 10.1017/jog.2021.28.
[4] Paul, F., Barrand, N. E., Baumann, S., Berthier, E., Bolch, T., Casey, K., Frey, H., Joshi, S. P., Konovalov, V., Le Bris, R., Mölg, N., Nosenko, G., Nuth, C., Pope, A., Racoviteanu, A., Rastner, P., Raup, B., Scharrer, K., Steffen, S., and Winsvold, S. (2013). On the Accuracy of Glacier Outlines Derived from Remote-Sensing Data. Annals of Glaciology, 54(63), 171–82. doi: 10.3189/2013AoG63A296.
[5] Huss, M., and Hock, R. (2018). Global-scale hydrological response to future glacier mass loss, Nature Climate Change, 8, 135–140 (2018), doi: 10.1038/s41558-017-0049-x.