
Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch

1.5.1. Lake Victoria’s 2020 Flood Event Analysis#

Production date: 28-05-2025

Produced by: Amaya Camila Trigoso Barrientos (VUB)

🌍 Use case: Definition of the 2020 Lake Victoria flood event through statistical analysis#

❓ Quality assessment question#

  • Can extreme value analysis be applied to detect, define and quantify Lake Victoria’s 2020 Flood Event using the LWL v5.0 lake water level dataset?

In 2020, Lake Victoria experienced an extreme flood event with significant regional impacts. Accurate definition and characterization of such events are essential for understanding their causes and informing adaptation strategies. This notebook explores whether the satellite-lake-water-level (C3S-LWL v5.0) dataset, a freely available and harmonized satellite-derived product, is suitable for defining high-impact flood events through statistical analysis.

Inspired by methodologies such as those applied by Pietroiusti et al. using the DAHITI dataset, this analysis applies extreme value analysis (EVA) to assess the 2020 flood event. The goal is to evaluate whether C3S-LWL v5.0 can serve as a reliable resource for event definition in impact attribution workflows, particularly in data-scarce regions like the Lake Victoria basin.

📢 Quality assessment statement#

These are the key outcomes of this assessment:

  • The 2020 event ranked as the third-highest 180-day lake level rise in the historical record (1948–2023), with only 1962 and 1998 recording greater increases.

  • The magnitude of the 2020 rise was 1.21 meters, identical in both the C3S-LWL and DAHITI datasets, strengthening confidence in the validity of satellite-based lake level measurements.

  • From 2016 onward, improved satellite coverage has led to strong agreement between C3S-LWL and DAHITI, demonstrating the potential of the C3S-LWL v5.0 dataset for reliable use in impact attribution, long-term hydrological analyses and flood risk projections.

📋 Methodology#

The analysis and results are organised in the following steps, which are detailed in the sections below:

1. Data request and download

  • Download C3S–LWL v5.0 satellite-lake-water-level data for Lake Victoria.

2. C3S comparison DAHITI

  • Load and preprocess DAHITI data.

  • Plot monthly count of data recorded for both the DAHITI and C3S data.

  • Plot C3S and DAHITI’s Water Level and show the maximum difference.

3. Time series reconstruction

  • Load and preprocess HYDROMET data.

  • Calculate the average difference between HYDROMET and C3S-LWL overlapping data and correct the HYDROMET dataset based on this.

  • Interpolate to fill missing gaps in HYDROMET.

  • Plot reconstructed time series.

4. Probabilistic extreme event analysis

  • Apply the annual block maxima method to the reconstructed time series.

  • Compare with the results from Pietroiusti et al. (2024) [1].

📈 Analysis and results#

1. Data request and download#

Import packages#

Import the packages to download the data using the c3s_eqc_automatic_quality_control library.

Hide code cell source
# Standard library
import os
import warnings

# Third-party packages
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
from scipy import stats

# C3S EQC download helper
from c3s_eqc_automatic_quality_control import download

os.environ["CDSAPI_RC"] = os.path.expanduser("~/trigoso_camila/.cdsapirc")
warnings.filterwarnings("ignore")
plt.style.use("seaborn-v0_8-notebook")

Set the data request#

Set the request for the specific lake (Victoria in our case) analyzed and the collection id (satellite lake water level).

Hide code cell source
collection_id = "satellite-lake-water-level"
request = {
    "variable": "all",
    "region": "southern_africa",
    "lake": "victoria",
}
varname = "water_surface_height_above_reference_datum"

Download data#

Hide code cell source
da = download.download_and_transform(collection_id, request)[varname].compute()

2. C3S comparison DAHITI#

Load and preprocess DAHITI data#

This assessment applies the methods developed by Pietroiusti et al. (2024) [1]. That paper used the DAHITI dataset [2]. Therefore, the Lake Victoria water levels in C3S-LWL v5.0 and in DAHITI are compared first.

Hide code cell source
# Load DAHITI lake water levels
df_dahiti = pd.read_csv("/data/wp5/trigoso_camila/q01/Water Level/DAHITI_lake_victoria.csv", sep=';')
# ----- Prepare data -----
# Parse the datetime column (day-first format)
df_dahiti['datetime'] = pd.to_datetime(df_dahiti['datetime'], dayfirst=True)
# Convert the water level from millimeters to meters
df_dahiti['water_level_m'] = df_dahiti['water_level'] / 1000.0
# Use the datetime column as the index
df_dahiti.set_index('datetime', inplace=True)
# df_dahiti_o is a second name for the same DataFrame (an alias, not a copy)
df_dahiti_o = df_dahiti

Monthly temporal completeness#

Hide code cell source
# First dataset: xarray object
monthly_counts_1 = da.resample(time="M").count().to_pandas()

# Second dataset: Pandas DataFrame
monthly_counts_2 = df_dahiti['water_level'].resample("M").count()

# Create subplots with 1 row and 2 columns
fig, axes = plt.subplots(1, 2, figsize=(18, 6), dpi=300, sharey=True)

# Plot first bar chart
monthly_counts_1.plot(kind='bar', color='skyblue', width=0.8, ax=axes[0])
axes[0].set_title("Monthly Count of Available C3S-LWL v5.0 Water Level Values")
axes[0].set_xlabel("Time")
axes[0].set_ylabel("Count of Available Values")
axes[0].set_xticks(range(0, len(monthly_counts_1), 12))
axes[0].set_xticklabels(monthly_counts_1.index[::12].strftime('%Y'), rotation=45)
axes[0].grid(axis='y', linestyle='--', alpha=0.7)

# Plot second bar chart
monthly_counts_2.plot(kind='bar', color='skyblue', width=0.8, ax=axes[1])
axes[1].set_title("Monthly Count of Available DAHITI Water Level Values")
axes[1].set_xlabel("Time")
axes[1].set_xticks(range(0, len(monthly_counts_2), 12))
axes[1].set_xticklabels(monthly_counts_2.index[::12].strftime('%Y'), rotation=45)
axes[1].grid(axis='y', linestyle='--', alpha=0.7)

# Adjust layout
plt.tight_layout()
plt.savefig("/data/wp5/trigoso_camila/q01/Water Level/fig/completeness.png", dpi=300)
plt.show()
Hide code cell output
../../_images/00c6487b6737d97042fba0e12975daa31a7cc69a6ece645483a2d5f5f39878c2.png
../../_images/017e0bfa-a444-4633-bf0a-7c2e4daf6232.png

Fig. 1.5.1.1 Comparison of count of monthly values over time available in the C3S-LWL v5.0 dataset (on the left) and in DAHITI (on the right).#

The DAHITI dataset exhibits a relatively consistent number of monthly observations, typically ranging from 1 to 4 throughout its entire temporal span. In contrast, the C3S-LWL v5.0 dataset shows variability in the number of monthly values, depending on the availability of satellite missions over time. Up until the end of 2010, only one value per month was recorded. From 2011 to 2015, this number increased to mostly three observations per month, likely reflecting additional unfiltered data from the Jason-2 mission. A significant increase in observation frequency is evident from 2016 onward, corresponding to the introduction of Jason-3 and Sentinel-3A satellites. The C3S-LWL v5.0 dataset contains only one gap in 2006, indicating strong overall temporal completeness.

The uniform sampling of DAHITI can be explained by its use of a Kalman filter for interpolation, a statistical technique that predicts the water level at the next time step based on previous observations and associated uncertainties [1]. This allows DAHITI to maintain a nearly uniform temporal resolution, achieving approximately three observations per month even during periods with limited satellite overpasses. On the other hand, the C3S-LWL v5.0 dataset does not interpolate. It provides water level observations derived from satellite altimetry at the actual times satellites pass over the lake. According to the Algorithm Theoretical Basis Document (ATBD), C3S applies strict data filtering based on two key criteria: (i) removal of measurements with invalid (NaN) geophysical corrections, and (ii) exclusion of data with backscatter coefficient (sigma0) values indicating land contamination. Consequently, more data from earlier missions are filtered out due to the lower quality of older satellite sensors and ground processing systems, which improved significantly in later years.
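Temporal gaps such as the one noted for 2006 can be located by counting observations per calendar month and looking for months with a count of zero. The following is a minimal sketch on hypothetical dates and levels (the values and variable names are illustrative assumptions, not data from either product):

```python
import pandas as pd

# Hypothetical observation dates: roughly one per month, with March 2006 missing
dates = pd.to_datetime(
    ["2006-01-15", "2006-02-14", "2006-04-16", "2006-05-15", "2006-05-25"]
)
levels = pd.Series([1134.9, 1134.8, 1134.7, 1134.8, 1134.8], index=dates)

# Count observations per calendar month
monthly_counts = levels.groupby(levels.index.to_period("M")).count()

# Add the months that have no observations at all, with a count of 0
full_range = pd.period_range(
    monthly_counts.index.min(), monthly_counts.index.max(), freq="M"
)
monthly_counts = monthly_counts.reindex(full_range, fill_value=0)

# Months with zero observations are the temporal gaps
gap_months = [str(p) for p in monthly_counts[monthly_counts == 0].index]
print(gap_months)
```

Reindexing onto the full monthly range matters here: a plain group-by only returns months that contain at least one observation, so empty months would otherwise go unnoticed.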

Plot C3S and DAHITI’s Water Level#

Hide code cell source
# Extract time series from xarray (da)
da_df = da.to_dataframe(name='water_level')
# Plotting
plt.figure(figsize=(12, 6), dpi=300)
df_dahiti_o['water_level_m'].plot(label='DAHITI', alpha=0.7)
da_df['water_level'].plot(label='C3S-LWL v5.0', alpha=0.7)

# Final touches
plt.title("Lake Victoria Water Level Comparison Between Products")
plt.grid(True)
plt.xlabel("Date")
plt.ylabel("Water Level [m]")
plt.legend()
plt.tight_layout()
plt.savefig("/data/wp5/trigoso_camila/q01/Water Level/fig/comparisonDH.png", dpi=300)
plt.show()
Hide code cell output
../../_images/ef104f9c9b628087071066fe6a58c273fe08cf1f0b8bbfe8e5ba9873d43eaf87.png
../../_images/f53887f6-cfee-4af8-90b7-2fc3eb54b191.png

Fig. 1.5.1.2 Lake Victoria’s water level from 1992 to present from C3S-LWL v5.0 and DAHITI#

Hide code cell source
# Extract time series from xarray (da)
df_c3s = da.to_dataframe(name='water_level_c3s')
df_c3s = df_c3s.dropna()
df_c3s.index.name = 'datetime'

# Align by nearest timestamp
comparison_df = pd.merge_asof(
    df_c3s,
    df_dahiti.reset_index()[['datetime', 'water_level_m']].rename(columns={'water_level_m': 'water_level_dahiti'}),
    on='datetime',
    direction='nearest',
    tolerance=pd.Timedelta("3D")
)

# Drop rows where no match was found
comparison_df = comparison_df.dropna()

# Compute difference
comparison_df['diff'] = comparison_df['water_level_c3s'] - comparison_df['water_level_dahiti']

# Ensure the datetime column is in datetime format
comparison_df['datetime'] = pd.to_datetime(comparison_df['datetime'])

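# Define periods (a partial date string such as '2001-12' is parsed as the first
# day of that month, so each period ends on the 1st of its final month)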
periods = {
    '1992-2002 (TOPEX/Poseidon)': ('1992-01', '2001-12'),
    '2002-2016': ('2002-01', '2015-12'),
    '2016-2023': ('2016-01', '2023-12')
}

# Calculate average differences using the datetime column
for label, (start, end) in periods.items():
    mask = (comparison_df['datetime'] >= start) & (comparison_df['datetime'] <= end)
    avg_diff = comparison_df.loc[mask, 'diff'].mean()
    print(f"Average difference for {label}: {avg_diff:.3f} m")
Average difference for 1992-2002 (TOPEX/Poseidon): 0.332 m
Average difference for 2002-2016: 0.042 m
Average difference for 2016-2023: 0.083 m
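The matching step in the cell above pairs each C3S observation with the nearest DAHITI value, and discards pairs that are more than three days apart. A minimal sketch of that behaviour on hypothetical values (the dates and levels below are illustrative assumptions) shows how the tolerance removes unmatched rows:

```python
import pandas as pd

# Hypothetical C3S and DAHITI observations (m); both frames must be sorted by time
c3s = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01", "2020-01-11", "2020-02-15"]),
    "water_level_c3s": [1136.0, 1136.1, 1136.4],
})
dahiti = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-02", "2020-01-21"]),
    "water_level_dahiti": [1135.9, 1136.0],
})

# Nearest-timestamp match; pairs further apart than 3 days get NaN and are dropped
matched = pd.merge_asof(
    c3s, dahiti, on="datetime", direction="nearest", tolerance=pd.Timedelta("3D")
).dropna()

# Difference C3S minus DAHITI for the matched pairs only
matched["diff"] = matched["water_level_c3s"] - matched["water_level_dahiti"]
print(matched)
```

Only the first C3S observation has a DAHITI neighbour within three days, so a single pair survives. The period averages are therefore computed over matched pairs, not over all observations.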

The difference in water level estimates for Lake Victoria between the DAHITI and C3S-LWL v5.0 datasets varies noticeably over time (see Fig. 1.5.1.2). A larger discrepancy is observed during the first decade of the time series (1992-2002), when the only satellite data available originated from the TOPEX/Poseidon mission. During this period, the average difference between the two datasets is 33.2 cm, with C3S values consistently higher than those from DAHITI.

This early discrepancy appears to stem from differences in the algorithms and retracking methods used by the two datasets. In some cases, DAHITI applies an improved 10% threshold retracker for better performance on inland water bodies. However, whether this retracker was applied to Lake Victoria could not be confirmed, since the available documentation only details its application for lakes in the Americas. Additionally, DAHITI employs a Kalman filter to interpolate and smooth its time series, further influencing the results [2]. In contrast, according to the ATBD, the C3S-LWL v5.0 dataset uses an ocean model retracking for large lakes such as Lake Victoria.

However, this difference diminishes noticeably after 2002 (to less than 10 cm on average), as newer satellite missions introduced more frequent and accurate measurements. The influence of processing seems to be much smaller in later years, likely because the improvements in sensor technology and data availability reduce the impact of such methodological differences.

These findings are supported by the Product Quality Assessment Report (PQAR), which includes Lake Victoria under the Southern Africa region. According to the PQAR, the Pearson correlation coefficient between DAHITI and C3S in this region is very close to 1, and the Unbiased Root Mean Square Error (URMSE) remains below 25%, indicating strong overall agreement.
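The two PQAR metrics can be illustrated with a short sketch. It assumes a common definition of the unbiased RMSE (the RMSE after removing each series' own mean); the PQAR's exact formula and its normalisation to a percentage are not given here, so the function below is hypothetical and the values are made up:

```python
import numpy as np


def pearson_and_urmse(a, b):
    """Return the Pearson correlation and unbiased RMSE of two aligned series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    # Remove each series' own mean so that a constant offset is not counted as error
    anomaly_diff = (a - a.mean()) - (b - b.mean())
    urmse = np.sqrt(np.mean(anomaly_diff ** 2))
    return r, urmse


# Hypothetical, already time-matched water levels (m); b equals a minus a 0.3 m bias
a = np.array([1135.3, 1135.4, 1135.6, 1135.5, 1135.2])
b = a - 0.3
r, urmse = pearson_and_urmse(a, b)
print(r, urmse)
```

With this definition, a pure constant bias such as the TOPEX/Poseidon-era offset leaves the correlation at 1 and the unbiased RMSE at essentially zero, which is why these metrics can indicate strong agreement even when the mean levels differ.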

3. Time series reconstruction#

The code in this section is based on the work of Pietroiusti et al. (2024) [1].

Load and preprocess HYDROMET data#

From January 1, 1948 to August 1, 1996, daily in situ water level measurements at the Jinja station were obtained from the WMO Hydrometeorological Survey (hereafter referred to as HYDROMET) [3]. These measurements, originally recorded as water depth at the lake’s outflow, were later converted to meters above sea level by adding a geoid correction of 1122.887 m, following the approach of Vanderkelen et al. (2018) [4].

Hide code cell source
#Load data
HYDROMET_raw = pd.read_csv('/data/wp5/trigoso_camila/q01/Water Level/Jinja_lakelevels_Van.txt', sep = "\t",  header = None)
#Pre-process data
HYDROMET_raw.columns = ['year', 'month', 'day', 'water level', 'meas']
HYDROMET_dates = pd.to_datetime(HYDROMET_raw[['year', 'month', 'day']])
df = pd.DataFrame(HYDROMET_dates, columns = ['date'])
df[['water_level', 'meas']] = HYDROMET_raw[['water level', 'meas']]
HYDROMET = df.set_index(['date'])
# Replace HYDROMET zeros with NaN, so that Matplotlib won't connect the points
HYDROMET['water_level'] = HYDROMET['water_level'].replace(0, np.nan)
HYDROMET['meas'] = HYDROMET['meas'].replace(0, np.nan)
HYDROMET
            water_level    meas
date
1948-01-01     1134.097  11.210
1948-01-02     1134.102  11.215
1948-01-03     1134.062  11.175
1948-01-04     1134.052  11.165
1948-01-05     1134.077  11.190
...                 ...     ...
1996-07-28     1134.777  11.890
1996-07-29     1134.757  11.870
1996-07-30     1134.752  11.865
1996-07-31     1134.717  11.830
1996-08-01     1134.747  11.860

17746 rows × 2 columns
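The conversion from gauge reading to level above sea level can be checked directly against the table above: adding the geoid correction of 1122.887 m to the `meas` column should reproduce the `water_level` column. A minimal sketch using three rows copied from the table (the variable names are illustrative):

```python
# Geoid correction quoted in the text (m), following Vanderkelen et al. (2018)
GEOID_CORRECTION_M = 1122.887

# (date, gauge reading 'meas' in m, converted 'water_level' in m) from the table above
rows = [
    ("1948-01-01", 11.210, 1134.097),
    ("1948-01-05", 11.190, 1134.077),
    ("1996-08-01", 11.860, 1134.747),
]

# Compare the recomputed level with the tabulated one, allowing for float rounding
checks = []
for date, meas, water_level in rows:
    converted = meas + GEOID_CORRECTION_M
    checks.append(abs(converted - water_level) < 1e-6)
    print(date, round(converted, 3), water_level)

print(all(checks))
```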

Merging the datasets together#

Hide code cell source
C3S=da_df
fig, ax = plt.subplots(figsize=(12, 6),dpi=250)
C3S['water_level'].plot(ax=ax, label="C3S")
HYDROMET['water_level'].plot(ax=ax, label="HYDROMET")
ax.grid(True)
ax.legend()
plt.xlabel("Date")
plt.ylabel("Water level [m]")
fig.tight_layout()

plt.savefig("/data/wp5/trigoso_camila/q01/Water Level/fig/initial.png", dpi=250)
plt.show()
Hide code cell output
../../_images/abd73b2efbe4e7e3891259fc983ed84235363fddc1b4e77156e9248cca53ff7d.png
../../_images/9d1e4010-dcda-47ab-9103-beea2c887763.png

Fig. 1.5.1.3 HYDROMET (1948-1996) and C3S (1992-2023) datasets timeseries before correction.#

To obtain the reconstructed time series of Lake Victoria's water levels, the HYDROMET data must be corrected based on the overlapping period (1992-1996).

Hide code cell source
# Merge the two time-series, key on date, keep all observations (outer)
d_HYDROMET = HYDROMET.drop(['meas'], axis=1)
d_C3S = C3S.rename_axis('date')
d_C3S.index = pd.to_datetime(d_C3S.index).normalize()
df_merge = pd.merge(d_HYDROMET, d_C3S, how='outer', on='date')
df_merge.columns = ['HYDROMET', 'C3S']
df_merge.index = df_merge.index.normalize()

# Keep only the dates on which both series have a value
df_overlap = df_merge.dropna()
df_overlap.head()  # 47 overlapping rows, starting 1992-09-28
            HYDROMET      C3S
date
1992-09-28  1134.567  1135.32
1992-10-18  1134.572  1135.30
1992-11-15  1134.597  1135.36
1992-12-16  1134.672  1135.43
1993-01-14  1134.767  1135.54
Hide code cell source
avg_diff = (df_overlap['C3S'] - df_overlap['HYDROMET']).mean() # C3S overestimates vs. HYDROMET
n_diff = len(df_overlap)
std_diff = (df_overlap['C3S'] - df_overlap['HYDROMET']).std()
print('avg_diff =', avg_diff,  'n_diff =', n_diff, 'std_diff =', std_diff)
print('avg_diff_r =', round(avg_diff,3), 'std_diff_r =', round(std_diff,3) )
# DAHITI2022 data: Avg diff is 43.2 cm +/- 5.24 cm (one std)
# C3S2023 data: Avg diff is 77.8 cm +/- 3 cm (one std)
avg_diff = 0.7777871747927428 n_diff = 47 std_diff = 0.02967133397421761
avg_diff_r = 0.778 std_diff_r = 0.03

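The mean difference between C3S-LWL v5.0 and HYDROMET over the overlapping observations is 77.8 ± 3.0 cm (one standard deviation, n = 47), whereas Pietroiusti et al. found an average offset of 43.2 ± 5.24 cm between HYDROMET and DAHITI [1].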

Hide code cell source
# Snap the two timeseries together, add avg diff to HYDROMET 
HYDROMET_corr = HYDROMET.copy()
HYDROMET_corr['water_level'] = HYDROMET['water_level'] + round(avg_diff,2)
Hide code cell source
fig, axes = plt.subplots(1, 2, figsize=(16, 6), dpi=250)

# === Left Plot: Full time series ===
HYDROMET_corr['water_level'].plot(ax=axes[0], label="HYDROMET")
C3S['water_level'].plot(ax=axes[0], label="C3S")

min_date = min(HYDROMET_corr.index.min(), C3S.index.min())
max_date = max(HYDROMET_corr.index.max(), C3S.index.max())
axes[0].set_xlim([min_date, max_date])
axes[0].set_title("Full Time Series")
axes[0].set_ylabel("Water level [m]")
axes[0].set_xlabel("Date")
axes[0].legend()
axes[0].grid(True)

# === Right Plot: Zoomed section ===
C3S['water_level'].plot(ax=axes[1], label="C3S")
HYDROMET_corr['water_level'].plot(ax=axes[1], label="HYDROMET")

axes[1].set_xlim(['1992-01-01', '1997-01-10'])
axes[1].set_title("Zoomed View: 1992–1997")
axes[1].set_xlabel("Date")
axes[1].legend()
axes[1].grid(True)
axes[1].text(0.5, 0.8, "C3S 2023 data \nAvg diff with HYDROMET is 77.8 cm ± 3 cm",
             transform=axes[1].transAxes,
             ha='center', va='center')

# === Layout & Save ===
fig.tight_layout()
plt.savefig("/data/wp5/trigoso_camila/q01/Water Level/fig/corrected_combined.png", dpi=250)
plt.show()
Hide code cell output
../../_images/a6c67839726245aebecce6fc9b0509b1e5d274ef991399c7892d51e00f430a6d.png
../../_images/19d08709-ecfe-4d73-aa3e-0170ad23724d.png

Fig. 1.5.1.4 HYDROMET (1948-1996) and C3S (1992-2023) datasets timeseries after correction.#

A single dataset combining both sources was created. Where C3S-LWL v5.0 data were available, they replaced the HYDROMET values, and the remaining gaps were filled by linear interpolation.

Hide code cell source
# Define the period covered by each dataset
startHYDROMET = np.datetime64('1948-01-01')
endHYDROMET = np.datetime64('1996-08-01')

startC3S = np.datetime64('1992-09-28')
endC3S = np.datetime64('2023-12-27') 

HYDROMET_keep = HYDROMET_corr.loc[ startHYDROMET : (startC3S - pd.Timedelta("1 day")) ]
HYDROMET_keep = HYDROMET_keep.drop(columns=['meas'])

# Overwriting time-series with C3S where available
Hide code cell source
# Average to a single value per day
C3S_keep = C3S.rename_axis('date')
C3S_keep = C3S_keep.resample('D').mean().dropna()
# Make only one list with HYDROMET+C3S data
df_list = [HYDROMET_keep, C3S_keep]
lakelevels_all_raw = pd.concat(df_list, ignore_index=False)
pd.set_option("display.precision", 8)
Hide code cell source
# Round to 2 decimal places
lakelevels_all = lakelevels_all_raw.round(2)
Hide code cell source
meanL = lakelevels_all['water_level'].mean()
maxL = lakelevels_all['water_level'].max()
minL = lakelevels_all['water_level'].min()
mtrim = stats.trim_mean(lakelevels_all[['water_level']], 0.1)
mtrim = mtrim.flatten().tolist()[0]
p10 = scipy.stats.scoreatpercentile(lakelevels_all[['water_level']], 10) # 10th percentile 
p90 = scipy.stats.scoreatpercentile(lakelevels_all[['water_level']], 90) # 90th percentile 
Hide code cell source
# Interpolate to daily resolution and round to 2 decimal places
lakelevels_intr = lakelevels_all.resample('D').asfreq().interpolate(method='linear').round(2)
Hide code cell source
left = '1948-01-01'
right = '2023-12-27'

fig, ax = plt.subplots(figsize=(12, 6), dpi=250)
lakelevels_intr['water_level'].plot(ax=ax, label="HYDROMET and C3S merged")
ax.grid(True)
ax.legend()
plt.xlabel("Date")
plt.ylabel("Water level [m]")
ax.set_xlim(left, right)
plot_text = ('mean: {:.2f}, max {:.2f}, min {:.2f}, range {:.2f}, \n \n mtrim {:.2f}, p10 {:.2f}, p90 {:.2f} [m] \n \n start date: {}, end date: {} \n \n HM 1948-1992, C3S 1992-2023'
             .format(meanL, maxL, minL, maxL-minL, mtrim, p10,p90, left, right))  #, start date {:.3f}, end date {:.3f} 
ax.text(0.5, 0.2, (plot_text),
        horizontalalignment='center',
        verticalalignment='center',
        transform=ax.transAxes, 
        bbox={'facecolor':'white', 'alpha':0.5, 'pad':10})
ax.xaxis.set_major_locator(mdates.YearLocator(5))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
fig.autofmt_xdate()
fig.tight_layout()
plt.savefig("/data/wp5/trigoso_camila/q01/Water Level/fig/reconstructed.png", dpi=250)
plt.show()
Hide code cell output
../../_images/fcedbfbd7588bdd99e48b324dcffbb4527cd21a6bdf3018e2085258706a33a6d.png
../../_images/dfd3858a-2054-4ce8-b334-2e2451c1b6b4.png

Fig. 1.5.1.5 Reconstructed Lake Victoria’s water levels timeseries (1948-2023).#
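The splice-and-interpolate logic used for the reconstruction can be seen more easily on a toy example. The sketch below uses hypothetical values for an in situ series (assumed to be already offset-corrected) and a satellite series. It keeps the in situ data only before the first satellite observation, concatenates the two, and fills the missing days linearly:

```python
import pandas as pd

# Hypothetical in situ record (already offset-corrected) and satellite record, in m
in_situ = pd.Series(
    [10.0, 10.2, 10.4, 10.6],
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"]),
)
satellite = pd.Series(
    [10.5, 11.1],
    index=pd.to_datetime(["2000-01-04", "2000-01-07"]),
)

# Keep in situ values only up to the day before the first satellite observation
in_situ_keep = in_situ.loc[: satellite.index.min() - pd.Timedelta("1 day")]

# Append the satellite series so that it takes over from its first observation
merged = pd.concat([in_situ_keep, satellite])

# Put the series on a regular daily grid and fill the missing days linearly
daily = merged.resample("D").asfreq().interpolate(method="linear")
print(daily)
```

On 2000-01-04 the satellite value (10.5) replaces the in situ value (10.6), and the two days without data between the satellite observations are filled along a straight line.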

4. Probabilistic extreme event analysis#

Pietroiusti et al. (2024) [1] assessed the influence of anthropogenic climate change on the 2020 Lake Victoria floods using the extreme event attribution framework outlined by Philip et al. (2020) [5] and van Oldenborgh et al. (2021) [6]. The methodology follows a structured sequence of steps: (i) defining the event, (ii) estimating probabilities and trends based on observational data, (iii) validating climate models, (iv) conducting attribution using multiple models and methods, and (v) synthesizing the findings into clear attribution statements.

In this assessment, only the first step, event definition, is performed, using reconstructed Lake Victoria water level time series from the HYDROMET dataset and the C3S LWL-v5.0 product. The aim is to assess whether the C3S LWL-v5.0 dataset is suitable for use in attribution analyses of this kind. While the remaining steps involve model simulations and additional data sources that are beyond the scope of this work, the assumption is that if the observational component is consistent, the full attribution framework could, in principle, be applied using this dataset as well.

Event definition#

The variable chosen for analysis is the rate of change in water levels (ΔL) over a time window (Δt). The Δt chosen was 180 days, since most of the level rise during the 2020 flooding event in Lake Victoria occurred in the six-month period between November 2019 and May 2020 [1].

The annual block maxima method was applied to retain the maximum ΔL over 180 days that occurred in each year.
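Written out, the event definition amounts to the two expressions below. The notation is a sketch chosen to match the figure captions, with \(L(t)\) denoting the daily lake level and \(y\) a calendar year. Note that the quantity is a level difference expressed in meters, not a value divided by the window length:

```latex
\[
  \Delta L_{\Delta t}(t) = L(t) - L(t - \Delta t), \qquad \Delta t = 180\ \text{days}
\]
\[
  \left( \frac{\Delta L}{\Delta t} \right)_{\max}(y) = \max_{t \in y} \, \Delta L_{\Delta t}(t)
\]
```

For scale, the 1.21 m rise reported for 2020 corresponds to an average of about 1.21 m / 180 d ≈ 6.7 mm per day over the window.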

Hide code cell source
# === Step 1: Set the time window (dt) ===
dt = 180  # days
lakelevels_intr[f'dLdt_{dt}'] = lakelevels_intr['water_level'] - lakelevels_intr['water_level'].shift(dt)

# === Step 2: Extract annual block maxima ===
df_block = lakelevels_intr[[f'dLdt_{dt}']].copy()
df_block['year'] = df_block.index.year

# Keep only years with full data
all_years = df_block['year'].unique()
if df_block[df_block['year'] == all_years[-1]].shape[0] < 365:
    valid_years = all_years[1:-1]
else:
    valid_years = all_years[1:]

records = []
for year in valid_years:
    year_data = df_block[df_block['year'] == year]
    max_val = year_data[f'dLdt_{dt}'].max()
    max_day = year_data[year_data[f'dLdt_{dt}'] == max_val].index[0]
    records.append((year, max_val, max_day))

df_max = pd.DataFrame(records, columns=['year', f'dLdt_{dt}', 'date']).set_index('year')

# === Step 3: Calculate rank of each year ===
df_max['rank'] = df_max[f'dLdt_{dt}'].rank(ascending=False).astype(int)

# === Step 4: Print 2020 event details ===
if 2020 in df_max.index:
    val = df_max.loc[2020, f'dLdt_{dt}']
    rank = df_max.loc[2020, 'rank']
    print(f"🔹 2020 ΔL/Δt (180 days): {val:.3f} m")
    print(f"🔹 2020 Rank: {rank} out of {len(df_max)} years")
else:
    print("2020 is not in the index. Check your data coverage.")
🔹 2020 ΔL/Δt (180 days): 1.210 m
🔹 2020 Rank: 3 out of 74 years
Hide code cell source
# Ensure rolling mean exists
df_max['rolling_mean'] = df_max[f'dLdt_{dt}'].rolling(window=10, center=True, min_periods=1).mean()

# Get top 3 values
top3 = df_max.sort_values(by=f'dLdt_{dt}', ascending=False).head(3)

# === Plot ===
plt.figure(figsize=(10, 6), dpi=250)

# Step plot for annual maxima (red)
plt.step(df_max.index, df_max[f'dLdt_{dt}'], where='mid', color='red', linewidth=1,label=f'Annual max (Δt = {dt} days)')

# Rolling mean (green)
plt.plot(df_max.index, df_max['rolling_mean'], color='lime', linewidth=1, label='10-year rolling mean')

# Top 3 events (black markers + labels)
for idx, row in top3.iterrows():
    plt.scatter(idx, row[f'dLdt_{dt}'], color='black', zorder=5)
    plt.text(idx, row[f'dLdt_{dt}'] + 0.02, f"{row['date'].year}", ha='center', fontsize=9, color='black')

# Labels, title, grid (the year range in the title is taken from the data)
plt.title(f'Annual Block Maxima of ΔL/Δt (Δt = {dt} days), {df_max.index.min()}–{df_max.index.max()}')
plt.xlabel('Year')
plt.ylabel('ΔL/Δt max (m)')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.savefig("/data/wp5/trigoso_camila/q01/Water Level/fig/anual_block_maxima.png", dpi=250)
plt.show()
Hide code cell output
../../_images/004ee4728001b5ef88c63419b06c90d5c7776d3d1a5d561fe9135e1aee4308a8.png
../../_images/5fede85b-25bb-4126-9f51-b1050a26ae3a.png

Fig. 1.5.1.6 Annual block maxima time series \((\Delta L / \Delta t)_{\text{max}}\) with Δt=180 d for the period 1949–2022 and 10-year rolling mean of the time series (using C3S-LWL v5.0 data).#

../../_images/e4f5e3f5-af36-4613-9df5-03a4f99fde20.png

Fig. 1.5.1.7 Annual block maxima time series \((\Delta L / \Delta t)_{\text{max}}\) with Δt=180 d for the period 1897–2021 and 10-year rolling mean of the time series (using DAHITI data). Source: Pietroiusti et al. (2022) [7]#

The results of this analysis are consistent with those reported by Pietroiusti et al. (2024) [1]. According to the C3S-LWL v5.0 dataset, the year 2020 recorded the third-largest 180-day lake level rise, surpassed only by 1962 and 1998. The magnitude of the 2020 rise was 1.21 meters, which matches the value obtained by Pietroiusti et al. using the DAHITI dataset.

However, some discrepancies emerge when comparing other years. For instance, in 2000, the C3S-LWL v5.0 data shows a negative value, indicating a consistent decrease in lake levels throughout the year (see Fig. 1.5.1.6). Conversely, the corresponding figure in Pietroiusti et al. (2022) [7] (see Fig. 1.5.1.7) shows a slightly positive value.

These differences likely stem from variations in the underlying datasets, as discussed in 2. C3S comparison DAHITI. In 2000, TOPEX/Poseidon was still the only mission providing data for the C3S-LWL, with only one value per month. Meanwhile, DAHITI applied Kalman-filtered interpolation, resulting in a denser time series. Because no reliable in situ data are available for this period, it is difficult to determine which dataset more accurately reflects reality.

Nevertheless, the agreement between both datasets on the 2020 block maximum strengthens confidence in the reliability of this result. Although C3S-LWL and DAHITI use different processing algorithms, the increased availability of satellite observations in recent years (particularly after 2016) has made their outputs more similar. With more frequent and higher-quality measurements from multiple missions, the differences between the datasets diminish, so the 1.21 meter increase observed in 2020 is consistent across sources and can be reported with greater confidence.

Furthermore, the C3S-LWL v5.0 dataset could potentially be used to support future hydrological analyses and projections [9], particularly in regions where in situ data are limited or unavailable.
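As a pointer towards the probability-estimation step that this assessment leaves out of scope, the sketch below shows how annual block maxima could be passed to a generalized extreme value (GEV) fit with `scipy.stats.genextreme`. It is an illustration under stated assumptions, not the method of Pietroiusti et al.: the maxima are synthetic, the fit is stationary (no trend or covariate), and the 1.21 m threshold is reused only as an example value.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic annual maxima of the 180-day rise (m); NOT the values from this assessment
rng = np.random.default_rng(42)
annual_maxima = genextreme.rvs(c=-0.1, loc=0.3, scale=0.2, size=74, random_state=rng)

# Fit a stationary GEV by maximum likelihood
# (SciPy's shape parameter c has the opposite sign to the usual GEV shape xi)
shape, loc, scale = genextreme.fit(annual_maxima)

# Annual exceedance probability and return period of an example 1.21 m rise
event = 1.21
p_exceed = genextreme.sf(event, shape, loc=loc, scale=scale)
return_period = 1.0 / p_exceed
print(f"p = {p_exceed:.4f}, return period = {return_period:.0f} years")
```

With only a few dozen annual maxima, the fitted shape parameter and the resulting return period are highly uncertain, so a real analysis would also report confidence intervals, for example from bootstrapping.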

ℹ️ If you want to know more#

Key resources#

Code libraries used:

Dataset documentation:

References#

[1] Pietroiusti, R., Vanderkelen, I., Otto, F. E. L., Barnes, C., Temple, L., Akurut, M., Bally, P., van Lipzig, N. P. M., and Thiery, W. (2024). Possible role of anthropogenic climate change in the record-breaking 2020 Lake Victoria levels and floods, Earth Syst. Dynam., 15, 225–264.

[2] Schwatke, C., Dettmering, D., Bosch, W., and Seitz, F. (2015). DAHITI - an innovative approach for estimating water level time series over inland waters using multi-mission satellite altimetry, Hydrol. Earth Syst. Sci., 19, 4345-4364.

[3] WMO-UNDP (1974). Hydrometeorological Survey of the Catchments of Lakes Victoria, Kyoga and Albert: Vol 1 Meteorology and Hydrology of the Basin.

[4] Vanderkelen, I., Van Lipzig, N. P., and Thiery, W. (2018). Modelling the water balance of Lake Victoria (East Africa)-Part 1: Observational analysis. Hydrology and Earth System Sciences, 22(10):5509–5525.

[5] Philip, S., Kew, S., van Oldenborgh, G. J., Otto, F., Vautard, R., van der Wiel, K., King, A., Lott, F., Arrighi, J., Singh, R., and van Aalst, M. (2020). A protocol for probabilistic extreme event attribution analyses, Adv. Stat. Clim. Meteorol. Oceanogr., 6, 177–203.

[6] van Oldenborgh, G. J., van der Wiel, K., Kew, S., Philip, S., Otto, F., Vautard, R., King, A., Lott, F., Arrighi, J., Singh, R., and van Aalst, M. (2021). Pathways and pitfalls in extreme event attribution, Climatic Change, 166, 13.

[7] Pietroiusti, R., Vanderkelen, I., van Lipzig, N. P. M., and Thiery, W. (2022). Was the 2020 Lake Victoria flooding ‘caused’ by anthropogenic climate change? An event attribution study. M.Sc. thesis, Dept. of Hydrology and Climate, Vrije Universiteit Brussel and KU Leuven.

[8] Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics. Springer-Verlag London. ISBN: 978-1-85233-459-8.

[9] Vanderkelen, I., van Lipzig, N. P. M., and Thiery, W. (2018). Modelling the water balance of Lake Victoria (East Africa) – Part 2: Future projections, Hydrol. Earth Syst. Sci., 22, 5527–5549.