logo

1.4.4. Satellite Land Cover trend assessment for Spatial Planning and Land Management#

Production date: 01-07-2024

Produced by: Inês Girão (+ATLANTIC)

❓ Quality assessment question#

  • Is the dataset accurate and consistent for the analysis of urbanisation trends in the Iberian Peninsula?

Land Cover data is an invaluable resource for a wide range of fields, from climate change research to urban planning. Land Cover products that provide historical timelines enable scientists, policymakers, and planners to understand and analyse the transformation of land cover over recent decades (Vargo et al, 2013; Chang et al., 2018).

This notebook will access the Land cover classification gridded maps from 1992 to present derived from satellite observations (henceforth, LC) data from the Climate Data Store (CDS) of the Copernicus Climate Change Service (C3S), and analyse the spatial patterns of the LC over a given Area of Interest (AoI) and time.

📢 Quality assessment statement#

These are the key outcomes of this assessment

The dataset maintains strong temporal continuity, with annual updates ensuring a smooth and reliable representation of land cover changes over time. While breakpoints were identified, they generally did not indicate major disruptions, reinforcing the dataset’s stability for long-term trend analysis.

The presence of breakpoints does not necessarily indicate abrupt landscape shifts but rather highlights the sensitivity of detection methods to gradual changes. This suggests that while breakpoints can help refine analysis, their impact on overall trends remains limited, emphasising the dataset’s resilience to minor variations.

For the specific land cover type analysed, the dataset exhibits a consistent ability to capture underlying trends. The similarity in results across segmented and total trends suggests that the data structure is well-calibrated, minimising distortions that could arise from classification inconsistencies or methodological biases.

Urbanization_Map_Series.png

📋 Methodology#

This Use Case comprises the following steps:

1. Define the AoI, search and download LC data

2. Inspect and view data for the defined AoI (Iberian Peninsula)

3. Breakpoint Detection

4.Trend Assessment

5. Discussion

📈 Analysis and results#

1. Define the AoI, search and download LC data#

Before we begin we must prepare our environment. This includes installing the Application Programming Interface (API) of the CDS, and importing the various python libraries that we will need.

Install CDS API#

To install the CDS API, run the following command. We use an exclamation mark to pass the command to the shell (not to the Python interpreter). If you already have the CDS API installed, you can skip or comment this step.

Hide code cell source

!pip install cdsapi
Import all the libraries/packages#

We will be working with data in NetCDF format. To best handle this type of data we will use libraries for working with multidimensional arrays, in particular Xarray. We will also need libraries for plotting and viewing data.

Hide code cell source

#  Import Standard Libraries
import random
from collections import defaultdict

#  Import Numerical & Statistical Libraries
import numpy as np
import pandas as pd
import xarray as xr
import dask
from scipy.stats import linregress, f, kendalltau, theilslopes
from statsmodels.api import OLS
from numpy.polynomial.polynomial import Polynomial
import pymannkendall as mk

# import ruptures as rpt
import ruptures as rpt
from scipy.optimize import curve_fit
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
from scipy.signal import savgol_filter

#  Import Geospatial Libraries
import geopandas as gpd  # Import vector geospatial data
import regionmask  # Transform attributes and geometries into specific regions
from shapely.geometry import box, Polygon


#  Import Visualization Libraries
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib.colors import TwoSlopeNorm, Normalize
from matplotlib.patches import Patch
from IPython.display import display
from matplotlib.lines import Line2D
from collections import defaultdict
from IPython.display import display, HTML

#  Import External Tools
from c3s_eqc_automatic_quality_control import download, plot, utils


#  Set Matplotlib Style
plt.style.use("seaborn-v0_8-notebook")  # Predefined style for all graphical outputs

# Set the CDSAPI location
import os
os.environ["CDSAPI_RC"] = "/data/wp5/girao_ines/.cdsapirc"
Data Overview#

To search for data, visit the CDS website: http://cds.climate.copernicus.eu Here you can search for ‘Satellite observations’ using the search bar. The data we need for this tutorial is the Land cover classification gridded maps from 1992 to present derived from satellite observations. This catalogue entry provides global Land Cover Classification (LCC) maps with a very high spatial resolution, with a L4 processing level, on an annual basis with a one-year delay, following the Global Climate Observing System (GCOS) convention requirements. LULC maps correspond to a global classification scheme, encompassing 22 classes.

The dataset consists of 2 versions (v2.0.7 produced by the European Space Agency (ESA) Climate Change Initiative (CCI) and v2.1.1 produced by Copernicus Climate Change Service (C3S)).

Data specifications for this use case:

  • Years: 1992 to 2022

  • Version: v2.0.7 before 1992 and v2.1.1 after 2016

  • Format: Zip files

At the end of the download form, select “Show API request”. This will reveal a block of code, which you can simply copy and paste into a cell of your Jupyter Notebook. Having copied the API request, running it will retrieve and download the data you requested into your local directory. However, before you run it, the terms and conditions of this particular dataset need to have been accepted directly at the CDS website. The option to view and accept these conditions is given at the end of the download form, just above the “Show API request” option. In addition, it is also useful to define the time period and AoI parameters and edit the request accordingly, as exemplified in the cells below.

Hide code cell source

#Years to download
years = [1992, 1993, 1994,
        1995, 1996, 1997,
        1998, 1999, 2000,
        2001, 2002, 2003,
        2004, 2005, 2006,
        2007, 2008, 2009,
        2010, 2011, 2012,
        2013, 2014, 2015,
        2016, 2017, 2018,
        2019, 2020, 2021,
        2022, 2022]


# List of requests to retrieve data
collection_id = "satellite-land-cover"
requests = [
    {
        "variable": "all",
        "format": "zip",
        "version": "v2_0_7cds" if year < 2016 else "v2_1_1",
        "year": str(year),
        "area": [45, -10, 35, 4] # Region of interest (Iberian Peninsula)
    }
    for year in years
]

Hide code cell source

# Download and regionalize by AoI
ds = download.download_and_transform(
    collection_id,
    requests,
)
ds = ds.assign_coords(year=ds["time"].dt.year).swap_dims(time="year").drop("time")
 

Hide code cell source

#Inspect the database
ds
<xarray.Dataset> Size: 7GB
Dimensions:              (year: 31, latitude: 3600, longitude: 5040, bounds: 2)
Coordinates:
  * year                 (year) int64 248B 1992 1993 1994 ... 2020 2021 2022
  * latitude             (latitude) float64 29kB 45.0 45.0 44.99 ... 35.0 35.0
  * longitude            (longitude) float64 40kB -9.999 -9.996 ... 3.996 3.999
    lat_bounds           (latitude, bounds, longitude) float64 290MB dask.array<chunksize=(1200, 1, 1680), meta=np.ndarray>
    lon_bounds           (longitude, bounds) float64 81kB dask.array<chunksize=(5040, 2), meta=np.ndarray>
    time_bounds          (year, bounds, longitude) datetime64[ns] 2MB dask.array<chunksize=(1, 2, 5040), meta=np.ndarray>
Dimensions without coordinates: bounds
Data variables:
    lccs_class           (year, latitude, longitude) uint8 562MB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    processed_flag       (year, latitude, longitude) float32 2GB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    current_pixel_state  (year, latitude, longitude) float32 2GB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    observation_count    (year, latitude, longitude) uint16 1GB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    change_count         (year, latitude, longitude) uint8 562MB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    crs                  (year, longitude) int32 625kB dask.array<chunksize=(1, 5040), meta=np.ndarray>
Attributes: (12/38)
    id:                         ESACCI-LC-L4-LCCS-Map-300m-P1Y-1992-v2.0.7cds
    title:                      Land Cover Map of ESA CCI brokered by CDS
    summary:                    This dataset characterizes the land cover of ...
    type:                       ESACCI-LC-L4-LCCS-Map-300m-P1Y
    project:                    Climate Change Initiative - European Space Ag...
    references:                 http://www.esa-landcover-cci.org/
    ...                         ...
    geospatial_lon_max:         180
    spatial_resolution:         300m
    geospatial_lat_units:       degrees_north
    geospatial_lat_resolution:  0.002778
    geospatial_lon_units:       degrees_east
    geospatial_lon_resolution:  0.002778

2. Inspect and view data for the defined AoI (Iberian Peninsula)#

Compute urban area for each NUTS 2 region#

To identify changes in LC patterns, in this user question, NUTS 2 will be used, providing the information regarding the main regions of the Iberian Peninsula.

The NUTS are a hierarchical system divided into 3 levels. NUTS 1 correspond to major socio-economic regions, NUTS 2 correspond to basic regions for the application of regional policies, and NUTS 3 correspond to small regions for specific diagnoses. Additionally a NUTS 0 level, usually co-incident with national boundaries is also available. The NUTS legislation is periodically amended; therefore multiple years are available for download.

The step below masks the Land Cover data according to the NUTS 2 boundaries and calculate the area of each pixel (weighted by the cosine of Latitude). For each NUTS 2, we proceed with the analysis and visual inspection of Land Cover areas per class and corresponding percentages during the elected period.

Mask regions#

First, we need to establish the geometry of the NUTS region (level 2) in order to make the corresponding statistics.

Hide code cell source

# Configure Dask
dask.config.set(scheduler='threads', **{'array.slicing.split_large_chunks': True})

# Define CRS and bounding box for Iberian Peninsula
crs_wgs84 = "EPSG:4326"
iberian_bbox = box(-10, 36, 3.5, 44)

# Load and filter GeoDataFrame
shapefile_url = "https://gisco-services.ec.europa.eu/distribution/v2/nuts/shp/NUTS_RG_20M_2021_4326.shp.zip"
gdf = gpd.read_file(shapefile_url)
gdf = gdf[gdf["LEVL_CODE"] == 2]
gdf = gdf[gdf["NUTS_ID"].str.startswith(("ES", "PT"))]  # Spain (ES) and Portugal (PT)
gdf = gdf.to_crs(crs_wgs84)
gdf = gdf[gdf.intersects(iberian_bbox)]
gdf = gdf[gdf.geometry.is_valid & ~gdf.geometry.is_empty]

if gdf.empty:
    raise ValueError("No valid geometries found after filtering.")

# Ensure dataset CRS is set
if ds.rio.crs is None:
    ds = ds.rio.write_crs("EPSG:4326", inplace=True)

# Ensure dataset coordinates overlap with the filtered regions
filtered_lon_min, filtered_lat_min, filtered_lon_max, filtered_lat_max = gdf.total_bounds
if (
    ds["longitude"].min().values > filtered_lon_max
    or ds["longitude"].max().values < filtered_lon_min
    or ds["latitude"].min().values > filtered_lat_max
    or ds["latitude"].max().values < filtered_lat_min
):
    raise ValueError(
        "Dataset coordinates do not overlap with the filtered regions. Check dataset and region alignment."
    )

# Subset dataset to valid ranges
ds = ds.sel(
    longitude=slice(
        max(filtered_lon_min, ds["longitude"].min().values),
        min(filtered_lon_max, ds["longitude"].max().values)
    ),
    latitude=slice(
        min(filtered_lat_max, ds["latitude"].max().values),
        max(filtered_lat_min, ds["latitude"].min().values)
    )
)

# Check subsetted dataset dimensions
if ds["longitude"].size == 0 or ds["latitude"].size == 0:
    raise ValueError("Subsetting resulted in an empty dataset. Check the bounds and ensure overlap.")

# Create the regionmask
regions = regionmask.from_geopandas(gdf, names="NUTS_NAME")

# Create a 2D mask
mask = regions.mask(ds["longitude"], ds["latitude"])

ds= ds.where(mask.notnull(), drop=True)

Compute cell area#

Then, we can calculate the area of each pixel taking into consideration the curvature of the earth (i.e., weighted by the cosine of Latitude).

Hide code cell source

# Scaling factor for conversion (constant longitude resolution)
scaling_factor = 0.002778  # Degrees
resolution_lon_km = scaling_factor * 111.195  # Longitude resolution in kilometers

# Use the latitude values directly from the dataset
lats = ds["latitude"]

# Calculate the difference between consecutive latitude values
lat_diff = np.abs(np.diff(lats.values))  # Absolute differences
lat_diff = np.append(lat_diff, lat_diff[-1])  # Assume the last cell has the same size as the one before

# Convert latitude differences to kilometers
resolution_lat_km = lat_diff * 111.195  # Latitude-dependent resolution

# Compute the grid cell area for each latitude
grid_cell_area = resolution_lon_km * resolution_lat_km * np.cos(np.radians(lats))

# Assign attributes to the grid cell area
grid_cell_area.attrs = {
    "standard_name": "cell_area",
    "long_name": "Grid cell area",
    "units": "km²",
}

# Add the grid cell area as a coordinate to the dataset
ds = ds.assign_coords(cell_area=("latitude", grid_cell_area.data))  # Use `.data` to extract the array

Select Urban Classes and Prepare Dataset#

Hide code cell source

# Define urban classes
urban_classes = [190]

# Create a mask for urban areas
urban_mask = ds['lccs_class'].isin(urban_classes)

# Masked urban area using cell area
ds_urban_area = ds['cell_area'].where(urban_mask)

# Stack latitude and longitude into a single dimension
ds_urban_area = ds_urban_area.stack(stacked_lat_lon=("latitude", "longitude"))

# Create a stacked mask and align dimensions
stacked_mask = mask.stack(stacked_lat_lon=("latitude", "longitude"))
stacked_mask.name = "stacked_mask"

# Attach the stacked mask to the dataset
ds_urban_area = ds_urban_area.assign_coords(stacked_mask=stacked_mask)

# Group by regions and compute urban area for each year
urban_area_by_region = (
    ds_urban_area.groupby("stacked_mask")
    .sum(dim="stacked_lat_lon")
    .compute()
)

Compute urban area per region and year#

Hide code cell source

results = []
growth = {}

for reg, reg_area in urban_area_by_region.groupby("stacked_mask"):
    if np.isnan(reg):  # Skip NaN regions
        continue

    # Ensure valid region indexing and naming
    region_name = regions[int(reg)].name
    years = ds_urban_area['year'].values
    values = reg_area.values.flatten()  # Flatten in case of higher-dimensional arrays

    # Save growth data by year
    if len(values) > 0:
        growth[region_name] = {int(year): float(value) for year, value in zip(years, values)}
    else:
        growth[region_name] = {}  # Handle regions with no data

    # Add results to the list
    for year, value in zip(years, values):
        results.append({
            "Region": region_name,
            "Year": int(year),
            "Urban_Area": float(value)
        })

# Convert results to a DataFrame
results_df = pd.DataFrame(results)

# Add geometry to results_df
if 'NUTS_NAME' in gdf.columns:
    results_df = pd.merge(results_df, gdf[['NUTS_NAME', 'geometry']], left_on="Region", right_on="NUTS_NAME", how="left")
else:
    print("Warning: NUTS_NAME column not found in gdf.")

# Ensure results_df is a GeoDataFrame
if not isinstance(results_df, gpd.GeoDataFrame):
    results_df = gpd.GeoDataFrame(results_df, geometry=gpd.GeoSeries(results_df['geometry']), crs="EPSG:4326")

Map urban percentage coverage over-time by NUTS regions in the AoI#

Hide code cell source

# Reproject the GeoDataFrame to a projected CRS (EPSG:3035 is a good choice for Europe)
results_df = results_df.to_crs("EPSG:3035")

# Calculate the total area of each region (in square kilometres)
results_df["Region_Total_Area"] = results_df.geometry.area / 10**6  # Convert from m² to km²

# Ensure 'Year' is converted to string if needed
results_df['Year'] = results_df['Year'].astype(str)

#Calculate Urban Percentage
results_df['Urban_Percentage'] = (results_df['Urban_Area'] / results_df['Region_Total_Area']) * 100

# Pivot the data to have years as columns
Urban_pivot = results_df.pivot(index='Region', columns='Year', values='Urban_Percentage').reset_index()

# Extract years for plotting
years = [col for col in Urban_pivot.columns if col.isdigit()]

# Normalize color scale across all maps based on percentage coverage
norm_percentage = Normalize(vmin=0, vmax=results_df['Urban_Percentage'].max())

# Plot settings
n_cols = 3
n_rows = int(np.ceil(len(years) / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 4), squeeze=False)

# Plot each year in a separate subplot
for idx, year in enumerate(years):
    row, col = divmod(idx, n_cols)
    ax = axes[row, col]
    results_df_year = results_df[results_df['Year'] == year]
    results_df_year.plot(column='Urban_Percentage', cmap="Oranges", ax=ax, edgecolor="k", norm=norm_percentage)
    ax.set_title(f"Urban Coverage (%) in {year}", fontsize=10)
    ax.axis('off')

# Hide unused subplots
for idx in range(len(years), n_rows * n_cols):
    row, col = divmod(idx, n_cols)
    axes[row, col].axis('off')

# Step 8: Adjust layout
plt.tight_layout()

# Allocate space for the color bar
fig.subplots_adjust(right=0.85)
cbar_ax = fig.add_axes([0.87, 0.15, 0.03, 0.7])  # Create a new axes for color bar

# Create a single ScalarMappable object for the common colorbar
sm_common = plt.cm.ScalarMappable(cmap="Oranges", norm=norm_percentage)
sm_common._A = []  # Dummy array for the scalar mappable
fig.colorbar(sm_common, cax=cbar_ax, label="Urban Coverage (%)")

plt.show()
../../_images/fedca45efe5c5ee436699f2d839096fe2476fb07ac828aa989bc198610363ed6.png

Map Analysis#

  • Over the 28-year period, all regions maintain a consistent pattern of growth.

  • Regions such as Comunidad Madrid and Área Metropolitana de Lisboa consistently exhibit the highest levels of urban percentage coverage.

  • No negative fluctuations were observed

  • The biggest changes are observed between 2016-2017.

3. Breakpoint Detection#

A first step in assessing temporal behaviour is to analyse the rate of change of the urban-area time series. This provides an intuitive view of how the pace of urban expansion evolves over time and helps highlight periods where the trajectory deviates from its typical pattern.

However, while such exploratory diagnostics identify points of interest, they do not by themselves confirm whether the observed changes are structurally meaningful. Year-to-year variations may reflect noise, smoothing effects, or gradual adjustments in the underlying product. To formally assess structural changes, a statistical breakpoint detection method is applied.

The analysis therefore combines rate-of-change diagnostics with statistical segmentation, with particular attention to the monotonic nature of the data and the need to avoid over-interpreting gradual trends as discrete breakpoints.


Rate of Change Calculation

  • The first derivative of the urban-area time series is computed to quantify the year-to-year change in urban extent.

  • This highlights periods where the pace of urban expansion deviates from its typical behaviour, providing an initial indication of potential anomalies or shifts in the series.


Statistical Thresholding of Derivative Spikes

  • To identify unusually strong deviations, a dynamic threshold is defined as:

    Threshold = mean(|rate of change|) + 1.5 × standard deviation

  • Years where the absolute rate of change exceeds this threshold are flagged as derivative spikes.

  • These spikes represent periods where the magnitude of change is unusually large relative to the typical variability of the series.

  • This step serves as a diagnostic indicator of anomalous growth behaviour, but does not in itself define structural breakpoints.


Breakpoint Detection Using the PELT Algorithm

  • The Pruned Exact Linear Time (PELT) algorithm is used to detect changes in the statistical behaviour of the growth rate, rather than in the cumulative urban-area series itself.

  • This is important because the urban-area time series is monotonic and non-decreasing, meaning that applying segmentation directly to the cumulative values can lead to artificial breakpoints driven by gradual growth.

  • The algorithm is therefore applied to the first derivative (growth rate), allowing detection of:

    • transitions between periods with different rates of urban expansion, rather than absolute levels.

  • The implementation uses:

    • the L2 cost function (model="l2") to minimise variance within segments,

    • a fixed penalty parameter (pen = 2.5) applied to a standardised growth-rate series,

    • and a minimum segment length (min_size = 3) to avoid short, unstable segments.

  • Prior to segmentation, the growth-rate series may be standardised to ensure comparability across regions with different magnitudes of urban change.


Breakpoint Selection

  • Breakpoints are identified directly from the segmentation of the growth-rate series.

  • Unlike earlier approaches, no explicit filtering based on derivative spikes is applied, as this was found to introduce additional assumptions and potentially remove meaningful structural changes.

  • Instead, derivative spikes and breakpoints are analysed in parallel, allowing independent but complementary perspectives on the temporal behaviour of the series.


Methodological Considerations

  • The urban-area time series is strictly monotonic and cumulative by construction, reflecting constraints in the land-cover product that prevent decreases in urban extent.

  • As a result:

    • segmentation applied to the raw series tends to produce regularly spaced breakpoints, even in the absence of true structural changes.

    • applying breakpoint detection to the growth rate mitigates this issue by focusing on changes in dynamics rather than accumulated values.

  • Even when applied to growth rates, breakpoint detection should be interpreted cautiously:

    • some detected changes may reflect gradual adjustments in processing or classification, rather than abrupt real-world events.

    • others may correspond to genuine shifts in the pace of urban expansion.

  • For this reason, breakpoint detection is treated as a diagnostic tool, not as definitive evidence of discrete events.


Interpreting Breakpoint Timing

Breakpoint positions should be interpreted as approximate indicators of when a change in growth dynamics begins, rather than exact points of maximum change.

Because segmentation identifies changes in statistical behaviour, the detected breakpoint may:

  • precede the most visible increase in growth rate, or

  • occur within a broader transition period.

As a result, breakpoint timing should be considered indicative rather than exact, and interpreted in conjunction with derivative-based diagnostics.

Hide code cell source

# Calculate breakpoints and derivative spikes for each region

processing_phases = [
    {"label": "AVHRR", "start": 1992, "end": 1999},
    {"label": "Baseline MERIS", "start": 2003, "end": 2012},
    {"label": "SPOT-VGT", "start": 1999, "end": 2013},
    {"label": "PROBA-V", "start": 2013, "end": 2019},
    {"label": "Sentinel-3", "start": 2020, "end": 2022},
]

phase_colors = {
    "AVHRR": "#e6f2ff",
    "Baseline MERIS": "#a07197",
    "SPOT-VGT": "#b3d4b3",
    "PROBA-V": "#fff5e6",
    "Sentinel-3": "#f2e6ff",
}

ghsl_epochs = [
    {"label": "GHSL 1990 epoch", "start": 1992, "end": 1999},
    {"label": "GHSL 2000 transition", "start": 1999, "end": 2003},
    {"label": "GHSL 2000+ epoch", "start": 2003, "end": 2022},
]

# GHSL change years shown as hard-line transitions
ghsl_change_years = [1999, 2003]

# Midpoint used for before/after GHSL switch classification
ghsl_switch_year = 1999.5

# Number of subplot columns in the regional figure layout
n_cols = 3

# Threshold multiplier used to flag unusually large derivative spikes:
# spike if |growth rate| > mean(|growth rate|) + spike_std_factor * std(growth rate)
spike_std_factor = 1.5

# Minimum number of derivative observations allowed in each PELT segment.
pelt_min_size = 1

# Penalty term for PELT breakpoint detection.
pelt_penalty = 2.5

# Standardize the growth-rate series before applying PELT so that breakpoint
# detection is comparable across regions with different magnitudes of urban change.
standardize_growth = True


# Functions
def assign_label(x, windows, default="Unknown"):
    for w in windows:
        if w["start"] <= x < w["end"] or np.isclose(x, w["end"]):
            return w["label"]
    return default

def interval_label(y0, y1):
    return f"{int(y0)}-{int(y1)}"

def monotonicity_metrics(series):
    diffs = np.diff(series)
    if len(diffs) == 0:
        return {
            "Monotonic_NonDecreasing": True,
            "Num_Decreases": 0,
            "Positive_Ratio": np.nan,
            "Negative_Ratio": np.nan,
            "Zero_Ratio": np.nan,
        }

    return {
        "Monotonic_NonDecreasing": bool(np.all(diffs >= 0)),
        "Num_Decreases": int(np.sum(diffs < 0)),
        "Positive_Ratio": np.mean(diffs > 0),
        "Negative_Ratio": np.mean(diffs < 0),
        "Zero_Ratio": np.mean(diffs == 0),
    }

def compute_growth(years, values):
    if len(years) <= 1:
        return np.array([]), np.array([])
    growth = np.diff(values) / np.diff(years)
    growth_years = (years[:-1] + years[1:]) / 2
    return growth, growth_years

def detect_spikes(growth, spike_std_factor=1.5):
    if len(growth) == 0:
        return np.array([], dtype=int)
    threshold = np.mean(np.abs(growth)) + spike_std_factor * np.std(growth)
    return np.where(np.abs(growth) > threshold)[0]

def prepare_for_pelt(growth, standardize=True):
    if len(growth) == 0:
        return growth
    if standardize and np.std(growth) > 0:
        return (growth - np.mean(growth)) / np.std(growth)
    return growth.copy()

def detect_breakpoints(growth, min_size=3, penalty=2.5, standardize=True):
    if len(growth) < min_size + 1:
        return np.array([], dtype=int)

    growth_for_pelt = prepare_for_pelt(growth, standardize=standardize)
    algo = rpt.Pelt(model="l2", min_size=min_size).fit(growth_for_pelt.reshape(-1, 1))
    detected = np.array(algo.predict(pen=penalty)) - 1

    # remove invalid and terminal interval
    return np.array([bp for bp in detected if 0 <= bp < len(growth) - 1], dtype=int)

def make_event_record(region, years, positions, idx, extra=None):
    pos = positions[idx]
    record = {
        "Region": region,
        "Year": float(pos),
        "Interval": interval_label(years[idx], years[idx + 1]),
        "Position": int(idx),
        "Processing_Phase": assign_label(pos, processing_phases),
        "GHSL_Epoch": assign_label(pos, ghsl_epochs),
        "GHSL_Switch_Side": (
            "Before GHSL 2000 switch" if pos < ghsl_switch_year else "After GHSL 2000 switch"
        ),
    }
    if extra:
        record.update(extra)
    return record

def update_counts(record, interval_counts, phase_counts, ghsl_counts, prepost_counts):
    interval_counts[record["Interval"]] += 1
    phase_counts[record["Processing_Phase"]] += 1
    ghsl_counts[record["GHSL_Epoch"]] += 1
    prepost_counts[record["GHSL_Switch_Side"]] += 1

# ==================================================
# DATA PREP
# ==================================================

unique_regions = results_df["Region"].unique()
n_regions = len(unique_regions)

global_y_min = results_df["Urban_Area"].min()
global_y_max = results_df["Urban_Area"].max()

global_derivative_min = float("inf")
global_derivative_max = float("-inf")

for region in unique_regions:
    region_data = results_df[results_df["Region"] == region].sort_values("Year")
    years = region_data["Year"].to_numpy(dtype=float)
    urban_area = region_data["Urban_Area"].to_numpy(dtype=float)

    growth, _ = compute_growth(years, urban_area)
    if len(growth) > 0:
        global_derivative_min = min(global_derivative_min, np.min(growth))
        global_derivative_max = max(global_derivative_max, np.max(growth))

if not np.isfinite(global_derivative_min) or not np.isfinite(global_derivative_max):
    global_derivative_min, global_derivative_max = -1, 1

# ==================================================
# STORAGE
# ==================================================

breakpoint_records = []
spike_records = []
monotonicity_records = []

interval_breakpoint_counts = defaultdict(int)
interval_spike_counts = defaultdict(int)

phase_breakpoint_counts = defaultdict(int)
phase_spike_counts = defaultdict(int)

ghsl_breakpoint_counts = defaultdict(int)
ghsl_spike_counts = defaultdict(int)

prepost_breakpoint_counts = defaultdict(int)
prepost_spike_counts = defaultdict(int)

Hide code cell source

# Plot Breakpoints Series
n_rows = int(np.ceil(n_regions / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 8, n_rows * 5), squeeze=False)
axes = axes.flatten()

for idx, region in enumerate(unique_regions):
    ax = axes[idx]

    region_data = results_df[results_df["Region"] == region].sort_values("Year")
    years = region_data["Year"].to_numpy(dtype=float)
    urban_area = region_data["Urban_Area"].to_numpy(dtype=float)

    mono = monotonicity_metrics(urban_area)
    monotonicity_records.append({"Region": region, **mono})

    growth, growth_years = compute_growth(years, urban_area)
    spike_idx = detect_spikes(growth, spike_std_factor=spike_std_factor)
    breakpoint_idx = detect_breakpoints(
        growth,
        min_size=pelt_min_size,
        penalty=pelt_penalty,
        standardize=standardize_growth
    )

    spike_positions = growth_years[spike_idx] if len(spike_idx) else np.array([])
    breakpoint_positions = growth_years[breakpoint_idx] if len(breakpoint_idx) else np.array([])

    # Store spikes
    for s in spike_idx:
        rec = make_event_record(
            region, years, growth_years, s,
            extra={"Derivative": growth[s]}
        )
        spike_records.append(rec)
        update_counts(
            rec,
            interval_spike_counts,
            phase_spike_counts,
            ghsl_spike_counts,
            prepost_spike_counts
        )

    # Store breakpoints
    for bp in breakpoint_idx:
        rec = make_event_record(region, years, growth_years, bp)
        breakpoint_records.append(rec)
        update_counts(
            rec,
            interval_breakpoint_counts,
            phase_breakpoint_counts,
            ghsl_breakpoint_counts,
            prepost_breakpoint_counts
        )

    # Background phases
    for phase in processing_phases:
        ax.axvspan(
            phase["start"],
            phase["end"],
            color=phase_colors.get(phase["label"], "lightgrey"),
            alpha=0.35,
            zorder=0
        )

    # Main series
    ax.plot(years, urban_area, marker="o", color="grey", zorder=2)
    ax.set_ylabel("Urban Area (km²)", color="grey")
    ax.tick_params(axis="y", colors="grey")
    ax.set_ylim(global_y_min, global_y_max)

    # GHSL change years as hard lines
    for ghsl_year in ghsl_change_years:
        ax.axvline(
            ghsl_year,
            color="dodgerblue",
            linestyle="-",
            linewidth=2.2,
            zorder=3
        )

    # Breakpoints
    for x in breakpoint_positions:
        ax.axvline(
            x,
            color="red",
            linestyle="-",
            linewidth=2.5,
            zorder=4
        )

    # Spikes
    for x in spike_positions:
        ax.axvline(
            x,
            color="gold",
            linestyle=":",
            linewidth=2.5,
            zorder=5
        )

    # Growth rate
    ax2 = ax.twinx()
    ax2.plot(
        growth_years,
        growth,
        marker="o",
        linestyle="dotted",
        color="black",
        zorder=6
    )
    ax2.set_ylabel("Rate of Change", color="black")
    ax2.tick_params(axis="y", colors="black")
    ax2.set_ylim(global_derivative_min, global_derivative_max)

    # Formatting
    ax.set_xticks(years)
    ax.set_xticklabels(years.astype(int), rotation=90)
    ax.set_title(region)
    ax.set_xlabel("Year")
    ax.grid()

# Remove empty axes
for idx in range(n_regions, len(axes)):
    fig.delaxes(axes[idx])


# Processing phases legend
phase_handles = [
    Patch(
        facecolor=phase_colors[p["label"]],
        edgecolor="black",
        alpha=0.5,
        label=p["label"]
    )
    for p in processing_phases
]

legend_phases = fig.legend(
    handles=phase_handles,
    loc="upper center",
    bbox_to_anchor=(0.5, 0.992),
    ncol=len(processing_phases),
    frameon=True,
    fontsize=9,
    title_fontsize=10
)

# Line/signal legend
line_handles = [
    Line2D([0], [0], color="grey", marker="o", linestyle="-", label="Urban Area"),
    Line2D([0], [0], color="black", marker="o", linestyle="dotted", label="Rate of Change"),
    Line2D([0], [0], color="dodgerblue", linestyle="-", linewidth=2.2, label="GHSL change year"),
    Line2D([0], [0], color="red", linestyle="-", linewidth=2.5, label="Breakpoint"),
    Line2D([0], [0], color="gold", linestyle=":", linewidth=2.5, label="Derivative spike"),
]

legend_lines = fig.legend(
    handles=line_handles,
    loc="upper center",
    bbox_to_anchor=(0.5, 0.98),
    ncol=len(line_handles),
    frameon=True,
    fontsize=9,
    title_fontsize=10
)

# Compact spacing between legends and plots
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()
../../_images/cc1a90e10c3b0930d1d4e2e09b40544e8c5c2089f2d3ef19972ee2ff31b57ae1.png

To assess whether detected discontinuities in the urban-area time series may be influenced by methodological factors, a targeted alignment analysis is performed. The aim is to evaluate whether growth-rate breakpoints and derivative spikes occur shortly after known processing-chain changes.

For each detected event, the time difference relative to each processing-change year is computed. An event is considered potentially related to a processing change only if it occurs within the two years following that transition. This forward-looking window reflects the temporal characteristics of the dataset, where land-cover changes are only confirmed if they persist over two consecutive years.

In addition to temporal alignment, the analysis also considers the spatial consistency of detected signals, quantified as the number of regions affected within a given interval. Methodological effects are expected to influence the dataset in a systematic way, and therefore to appear consistently across multiple regions. In contrast, signals affecting only a small number of regions are more likely to reflect local dynamics or region-specific variations rather than processing-related artefacts.

The results are summarised in four complementary tables, distinguishing between:

  • Event type: breakpoints vs derivative spikes

  • Analytical scope: aligned vs full set of identified breakpoints and derivative spikes

The first two tables report only events that occur within two years after a processing transition, while the two additional tables report all detected events.

Hide code cell source

# Plot the aligned breakpoints and derivative spikes with known methodological changes present in more than half of the regions

sensors_timeline = {
    "GHSL_switch": [2000, 2003],
    "AVHRR_finish": [1999],
    "SPOT-VGT_start": [1999],
    "SPOT-VGT_finish": [2013],
    "MERIS_baseline_start": [2003],
    "MERIS_baseline_finish": [2012],
    "PROBAV_start": [2013],
    "S3_OLCI_start": [2020],
    "S3_OLCI_SLSTR_start": [2021],
}

forward_window = 2.0
min_regions = 10  # threshold for strong/systematic signal

breakpoint_df = pd.DataFrame(breakpoint_records)
spike_df = pd.DataFrame(spike_records)

def transition_interval_label(year, forward_window=2.0):
    start = int(year)
    end = int(year + forward_window)
    return f"{start}-{end}"

def summarize_by_transition_table(df, forward_window=2.0, min_regions=10):
    if df.empty:
        return pd.DataFrame(), False

    results = []
    any_aligned = False

    for transition_name, years in sensors_timeline.items():
        for transition_year in years:

            tmp = df.copy()
            tmp["Years_After"] = tmp["Year"] - transition_year

            aligned = tmp[
                (tmp["Years_After"] > 0) &
                (tmp["Years_After"] <= forward_window)
            ].copy()

            if aligned.empty:
                continue

            any_aligned = True

            regions = sorted(aligned["Region"].unique())

            if len(regions) < min_regions:
                continue

            intervals = sorted(aligned["Interval"].unique())

            results.append({
                "Interval": transition_interval_label(transition_year, forward_window),
                "Aligned signal intervals": ", ".join(intervals),
                "Number of regions": len(regions),
                "Region names": ", ".join(regions),
                "Nearest processing changes": transition_name,
            })

    if not results:
        return pd.DataFrame(), any_aligned

    summary = pd.DataFrame(results)

    summary = summary.sort_values(
        ["Interval", "Number of regions"],
        ascending=[True, False]
    ).reset_index(drop=True)

    return summary, any_aligned

def display_html_table(df):
    html = df.to_html(index=False, escape=False)
    display(HTML(html))

# Build tables
bp_summary, bp_any_aligned = summarize_by_transition_table(
    breakpoint_df,
    forward_window=forward_window,
    min_regions=min_regions
)

spike_summary, spike_any_aligned = summarize_by_transition_table(
    spike_df,
    forward_window=forward_window,
    min_regions=min_regions
)

# Display
print("### Breakpoint alignment table ###")
if bp_summary.empty:
    if bp_any_aligned:
        print(
            f"No breakpoint signals affect at least {min_regions} regions "
            f"within the known methodological changes transition windows"
        )
    else:
        print("No breakpoint signals are temporally aligned with any known methodological changes")
else:
    display_html_table(bp_summary)

print("\n### Derivative spike alignment table ###")
if spike_summary.empty:
    if spike_any_aligned:
        print(
            f"No derivative spike signals affect at least {min_regions} regions "
            f"within the known methodological changes transition windows"
        )
    else:
        print("No derivative spike signals are temporally aligned with any known methodological changes")
else:
    display_html_table(spike_summary)
### Breakpoint alignment table ###
Interval Aligned signal intervals Number of regions Region names Nearest processing changes
2000-2002 2001-2002 12 Alentejo, Algarve, Aragón, Cantabria, Castilla y León, Castilla-La Mancha, Centro (PT), Comunidad Foral de Navarra, Extremadura, Galicia, Illes Balears, La Rioja GHSL_switch
### Derivative spike alignment table ###
No derivative spike signals affect at least 10 regions within the known methodological changes transition windows

Hide code cell source

# Plot all breakpoints and derivative spikes 

def classify_transition_relation(year):
    matches = []
    for name, years in sensors_timeline.items():
        for t in years:
            if 0 < (year - t) <= forward_window:
                matches.append(f"{name} ({t})")
    return "; ".join(matches) if matches else "None within 2 years"


def build_summary(df):
    if df.empty:
        return pd.DataFrame(columns=[
            "intervals",
            "number of regions",
            "regions name",
            "nearest processing changes",
        ])

    summary = (
        df.groupby("Interval")
        .agg(
            **{
                "number of regions": ("Region", "nunique"),
                "regions name": ("Region", lambda x: ", ".join(sorted(set(x)))),
                "mean_year": ("Year", "mean"),
            }
        )
        .reset_index()
    )

    # Rename Interval column
    summary = summary.rename(columns={"Interval": "intervals"})

    # Add relation to processing transition
    summary["nearest processing changes"] = summary["mean_year"].apply(classify_transition_relation)

    # Drop helper column
    summary = summary.drop(columns=["mean_year"])

    # Sort by number of regions, then by interval
    summary = summary.sort_values(
        ["number of regions", "intervals"],
        ascending=[False, True]
    ).reset_index(drop=True)

    return summary


# Build tables
bp_event_summary = build_summary(pd.DataFrame(breakpoint_records))
spike_event_summary = build_summary(pd.DataFrame(spike_records))

# Display nicely
pd.set_option("display.max_colwidth", None)

print("### Breakpoint Table ###")
display(bp_event_summary)

print("### Derivative Spike Table ###")
display(spike_event_summary)
### Breakpoint Table ###
intervals number of regions regions name nearest processing changes
0 2016-2017 13 Cantabria, Castilla y León, Castilla-La Mancha, Cataluña, Centro (PT), Comunidad Foral de Navarra, Comunitat Valenciana, Galicia, Norte, País Vasco, Principado de Asturias, Región de Murcia, Área Metropolitana de Lisboa None within 2 years
1 2001-2002 12 Alentejo, Algarve, Aragón, Cantabria, Castilla y León, Castilla-La Mancha, Centro (PT), Comunidad Foral de Navarra, Extremadura, Galicia, Illes Balears, La Rioja GHSL_switch (2000)
2 2006-2007 6 Alentejo, Algarve, Andalucía, Cataluña, Illes Balears, País Vasco None within 2 years
3 1996-1997 3 Cataluña, Comunitat Valenciana, País Vasco None within 2 years
4 2011-2012 2 Castilla y León, Región de Murcia None within 2 years
### Derivative Spike Table ###
intervals number of regions regions name nearest processing changes
0 2016-2017 8 Andalucía, Castilla y León, Castilla-La Mancha, Comunidad Foral de Navarra, Comunidad de Madrid, Comunitat Valenciana, La Rioja, Región de Murcia None within 2 years
1 2013-2014 7 Andalucía, Castilla y León, Castilla-La Mancha, Centro (PT), Galicia, Principado de Asturias, Región de Murcia SPOT-VGT_finish (2013); MERIS_baseline_finish (2012); PROBAV_start (2013)
2 2012-2013 5 Andalucía, Aragón, Centro (PT), Región de Murcia, Área Metropolitana de Lisboa MERIS_baseline_finish (2012)
3 2001-2002 4 Algarve, Cataluña, Norte, Área Metropolitana de Lisboa GHSL_switch (2000)
4 2002-2003 4 Alentejo, Algarve, Cataluña, Illes Balears None within 2 years
5 2017-2018 4 Andalucía, Aragón, Extremadura, La Rioja None within 2 years
6 2000-2001 2 Cataluña, Área Metropolitana de Lisboa GHSL_switch (2000); AVHRR_finish (1999); SPOT-VGT_start (1999)
7 2003-2004 2 Algarve, Illes Balears GHSL_switch (2003); MERIS_baseline_start (2003)
8 2004-2005 2 Norte, País Vasco GHSL_switch (2003); MERIS_baseline_start (2003)
9 2006-2007 2 Comunidad Foral de Navarra, Extremadura None within 2 years
10 2014-2015 2 Cantabria, Galicia SPOT-VGT_finish (2013); PROBAV_start (2013)
11 2005-2006 1 País Vasco None within 2 years

Breakpoint Analysis#

Breakpoint detection and derivative analysis capture two complementary aspects of temporal dynamics in land cover time series, a distinction widely recognised in change detection literature (Chelali et al., 2019; Chang et al., 2018):

  • Breakpoints identify the onset of structural changes in the time series

  • Derivative spikes highlight years of maximum year-to-year variation

Because of this, spikes may occur slightly after breakpoints, reflecting the difference between initiation and peak expression of change (Chelali et al., 2019).

The interpretation is guided by:

  • temporal alignment with processing-chain transitions, and

  • spatial consistency across regions

This is particularly important in satellite-derived land cover products, where methodological updates (e.g. sensor transitions, classification schemes, or reference layers) can introduce artificial discontinuities in time series.

A signal affecting many regions simultaneously is more likely to reflect a systematic driver (either methodological or large-scale real dynamics), whereas signals affecting few regions are more likely local effects (Chang et al., 2018).


Breakpoint ~2016–2017

This period exhibits the highest spatial consistency, with breakpoints detected across a large majority of regions. The signal is also supported by concurrent derivative spikes, indicating both structural change and increased short-term variability in the time series.

Importantly, this breakpoint cluster shows no temporal alignment with any identified processing-chain transition, ruling out a methodological origin.

Interpretation
This pattern provides strong evidence of a genuine structural shift in urban dynamics. The combination of high spatial coherence and lack of methodological alignment suggests a system-wide driver, consistent with the post-2008 recovery phase, during which urban growth resumed after a period of contraction (Martínez & García, 2015; Palmero-Iniesta et al., 2021).

More broadly, renewed urban expansion in the Iberian Peninsula during the 2010s has been linked to economic recovery, infrastructure investment, and renewed real estate activity (González & Leal, 2018; López et al., 2017). The breakpoint reflects a sustained change in growth trajectory, while the associated spikes indicate short-term adjustments during this transition.


Breakpoint ~2001–2002

A prominent breakpoint cluster is observed in this period, affecting a large number of regions and indicating a highly coherent spatial signal. This interval falls within the transition window associated with the GHSL reference update (1999–2000).

Interpretation
This signal is best understood as a composite effect, arising from the interaction of:

  • Methodological influences, particularly the constraint imposed by changes in reference datasets such as GSHL, which can introduce step-like adjustments in land cover products

  • Underlying urban expansion dynamics, corresponding to the early phase of the Iberian housing boom, characterised by rapid land consumption and development pressure (González & Leal, 2018)

The persistence of the breakpoint signal across regions suggests a structural component; however, its temporal proximity to a major processing transition prevents a clear separation between artefact and real change.


Breakpoint ~2006–2007

This breakpoint cluster affects a moderate number of regions and is not accompanied by strong or synchronised derivative spikes, indicating the absence of abrupt year-to-year changes.

No temporal association is found with processing-chain transitions.

Interpretation
This pattern reflects a progressive structural adjustment rather than a sudden shift. The breakpoint likely captures a change in the underlying growth regime, consistent with the late phase of the Iberian housing expansion cycle (González & Leal, 2018).

The absence of strong derivative spikes suggests that the transition occurred through gradual changes in growth rate, rather than discrete jumps, supporting its interpretation as a real, process-driven signal.


Minor breakpoint clusters

Additional breakpoint detections are observed, each affecting a limited number of regions and lacking both strong spatial coherence and consistent alignment with processing transitions.

Interpretation
These signals are interpreted as localised or context-specific dynamics, rather than manifestations of system-wide processes. Their limited spatial extent and lack of temporal consistency indicate that they do not reflect either:

  • large-scale methodological artefacts

  • coherent regional-scale urban transitions

Instead, they likely correspond to region-specific developments, such as:

4.Trend Assessment#

The next and final step is to determine whether the trend should be computed for the entire period (total trend) or divided into segments based on detected breakpoints.

The decision between using the total trend or segmented trends is based on Sen’s Slope (Trend Magnitude) and Mann-Kendall p-values (Trend Significance). Sen’s Slope is a non-parametric estimator that calculates the median rate of change over time, making it robust to outliers and suitable for detecting monotonic trends. First, Sen’s Slope is computed for the total trend, capturing the overall rate of urban change. Then, Sen’s Slope is calculated for each segmented trend to assess whether breakpoints introduce significant shifts in trend magnitude. The Mann-Kendall test is a non-parametric statistical test used to assess the presence of a monotonic upward or downward trend in a time series without requiring the data to follow any particular distribution. In parallel, the Mann-Kendall p-value is evaluated for both the total and segmented trends to measure trend significance - lower p-values indicate stronger evidence of a significant trend.

The final decision is based on the following approach:

  • If segmented trends show substantially different Sen’s Slopes compared to the total trend and exhibit stronger statistical significance (lower p-values), segmentation is preferred.

  • If segmented trends closely resemble the total trend or do not provide a clear statistical advantage, the total trend is used to maintain a simpler and more robust interpretation.

Hide code cell source

# Prepare data for trend calculation

breakpoints_df = pd.DataFrame(breakpoint_records).copy()
breakpoints_df["Year"] = pd.to_numeric(breakpoints_df["Year"], errors="coerce").round().astype("Int64")

results_df = results_df.copy()
results_df["Year"] = pd.to_numeric(results_df["Year"], errors="coerce")
results_df["Urban_Area"] = pd.to_numeric(results_df["Urban_Area"], errors="coerce")

# Keep only valid rows
results_df = results_df.dropna(subset=["Region", "Year", "Urban_Area"]).copy()
breakpoints_df = breakpoints_df.dropna(subset=["Region", "Year"]).copy()

Hide code cell source

# Calculate trends

def mann_kendall_p(x):
    """Return Mann-Kendall p-value and trend label."""
    if len(x) < 3:
        return np.nan, "no trend"
    res = mk.original_test(x)
    return res.p, res.trend

def significance_stars(p):
    if pd.isna(p):
        return ""
    if p < 0.001:
        return "***"
    elif p < 0.01:
        return "**"
    elif p < 0.05:
        return "*"
    return ""

def significance_traffic_light(p):
    if pd.isna(p):
        return "Unknown"
    if p < 0.05:
        return "Green"
    elif p < 0.10:
        return "Amber"
    return "Red"

def trend_direction(rate_per_decade, p_value, stable_threshold=0.10):
    """
    Classify overall direction.
    stable_threshold is in percentage points of area per decade.
    """
    if pd.isna(rate_per_decade):
        return "Unknown"
    if (not pd.isna(p_value) and p_value >= 0.05) or abs(rate_per_decade) < stable_threshold:
        return "Stable"
    return "Increase" if rate_per_decade > 0 else "Decrease"

def material_segmentation_change(total_direction, total_rate_decade, segment_rows):
    """
    Returns Yes if at least one significant segment changes the conclusion materially.
    Conservative logic:
    - any significant segment with opposite direction to total
    - OR strongest significant segment is >1.5x total magnitude and opposite sign
    """
    if segment_rows.empty:
        return "No"
    
    sig = segment_rows[segment_rows["MK_P_Value"] < 0.05].copy()
    if sig.empty:
        return "No"
    
    for _, r in sig.iterrows():
        seg_dir = r["Direction"]
        if total_direction in ["Increase", "Decrease"] and seg_dir in ["Increase", "Decrease"]:
            if seg_dir != total_direction:
                return "Yes"
    
    return "No"

def confidence_label(n_breakpoints, segmentation_changes, total_p, total_rate_decade):
    """
    Reviewer asks for confidence considering breakpoints and local drivers.
    Since local drivers are not encoded in this table, use a transparent rule:
    - High: significant total trend, no material segmentation change, <=1 breakpoint
    - Medium: significant total trend but segmentation change or multiple breakpoints
    - Low: non-significant total trend or unstable segmented interpretation
    """
    if pd.isna(total_p):
        return "Low"
    if total_p >= 0.05:
        return "Low"
    if segmentation_changes == "No" and n_breakpoints <= 1:
        return "High"
    return "Medium"


# Compute total + segmented trends 

trend_records = []

for region in results_df["Region"].dropna().unique():
    region_data = results_df.loc[results_df["Region"] == region].sort_values("Year").copy()
    X_total = region_data["Year"].values
    y_total = region_data["Urban_Area"].values

    if len(region_data) < 3 or np.isnan(y_total).any():
        continue

    # Total Sen slope
    total_sen_slope, _, _, _ = theilslopes(y_total, X_total)

    # Convert to % area per decade, as requested by reviewer
    mean_total_area = np.mean(y_total)
    total_rate_pct_per_year = (total_sen_slope / mean_total_area) * 100 if mean_total_area != 0 else np.nan
    total_rate_pct_per_decade = total_rate_pct_per_year * 10 if pd.notna(total_rate_pct_per_year) else np.nan

    # Mann-Kendall
    total_p_value, total_mk_trend = mann_kendall_p(y_total)
    total_direction = trend_direction(total_rate_pct_per_decade, total_p_value)

    trend_records.append({
        "Region": region,
        "Type": "Total",
        "Start_Year": int(region_data["Year"].min()),
        "End_Year": int(region_data["Year"].max()),
        "Sen_Slope": round(total_sen_slope, 6),
        "Rate_of_Change_pct_per_year": round(total_rate_pct_per_year, 6) if pd.notna(total_rate_pct_per_year) else np.nan,
        "Rate_of_Change_pct_per_decade": round(total_rate_pct_per_decade, 6) if pd.notna(total_rate_pct_per_decade) else np.nan,
        "MK_P_Value": total_p_value,
        "MK_Trend": total_mk_trend,
        "Direction": total_direction,
        "Significance_Stars": significance_stars(total_p_value),
        "Traffic_Light": significance_traffic_light(total_p_value),
        "Segment_Length": len(region_data)
    })

    # Breakpoints and segmented trends
    region_breakpoints = (
        breakpoints_df.loc[breakpoints_df["Region"] == region, "Year"]
        .dropna()
        .astype(int)
        .tolist()
    )
    bounds = sorted(set([int(region_data["Year"].min())] + region_breakpoints + [int(region_data["Year"].max())]))

    for i in range(len(bounds) - 1):
        start_year, end_year = bounds[i], bounds[i + 1]
        segment_data = region_data[(region_data["Year"] >= start_year) & (region_data["Year"] <= end_year)].copy()

        if len(segment_data) < 3:
            continue

        X_seg = segment_data["Year"].values
        y_seg = segment_data["Urban_Area"].values

        seg_sen_slope, _, _, _ = theilslopes(y_seg, X_seg)

        mean_seg_area = np.mean(y_seg)
        seg_rate_pct_per_year = (seg_sen_slope / mean_seg_area) * 100 if mean_seg_area != 0 else np.nan
        seg_rate_pct_per_decade = seg_rate_pct_per_year * 10 if pd.notna(seg_rate_pct_per_year) else np.nan

        seg_p_value, seg_mk_trend = mann_kendall_p(y_seg)
        seg_direction = trend_direction(seg_rate_pct_per_decade, seg_p_value)

        trend_records.append({
            "Region": region,
            "Type": "Segmented",
            "Start_Year": int(start_year),
            "End_Year": int(end_year),
            "Sen_Slope": round(seg_sen_slope, 6),
            "Rate_of_Change_pct_per_year": round(seg_rate_pct_per_year, 6) if pd.notna(seg_rate_pct_per_year) else np.nan,
            "Rate_of_Change_pct_per_decade": round(seg_rate_pct_per_decade, 6) if pd.notna(seg_rate_pct_per_decade) else np.nan,
            "MK_P_Value": seg_p_value,
            "MK_Trend": seg_mk_trend,
            "Direction": seg_direction,
            "Significance_Stars": significance_stars(seg_p_value),
            "Traffic_Light": significance_traffic_light(seg_p_value),
            "Segment_Length": len(segment_data)
        })

trends_df = pd.DataFrame(trend_records)

The following summary table reports:

  • Growth rate (%/decade) → long-term rate of urban expansion

  • Significance → statistical robustness of the full trend

    • *** → highly significant (p < 0.001)

    • ** → significant (p < 0.01)

    • * → moderately significant (p < 0.05)

    • no symbol → not statistically significant

  • Breakpoints → number of structural changes detected

  • Segmented trend needed? → whether segmented trends change the interpretation

  • Confidence → degree of agreement between full and segmented trend significance

Hide code cell source

# Build summary table

summary_rows = []

latest_year = results_df["Year"].astype(int).max()

# Urban area for each region in the latest year
area_per_region = (
    results_df.loc[results_df["Year"].astype(int) == latest_year, ["Region", "Urban_Area"]]
    .drop_duplicates(subset=["Region"])
    .set_index("Region")["Urban_Area"]
)

def confidence_label(total_p_value, segment_rows):
    """
    Confidence based on agreement between the significance
    of the full trend and the segmented trends.

    Rules:
    - Low: full trend is not significant
    - High: full trend is significant and all segmented trends are significant
    - Medium: full trend is significant but segmented trends are mixed or weaker
    """
    if pd.isna(total_p_value):
        return "Low"

    if total_p_value >= 0.05:
        return "Low"

    if segment_rows.empty:
        return "High"

    valid_segments = segment_rows.dropna(subset=["MK_P_Value"]).copy()

    if valid_segments.empty:
        return "High"

    sig_segments = valid_segments[valid_segments["MK_P_Value"] < 0.05]
    nonsig_segments = valid_segments[valid_segments["MK_P_Value"] >= 0.05]

    if len(sig_segments) == len(valid_segments):
        return "High"

    if len(sig_segments) > 0 and len(nonsig_segments) > 0:
        return "Medium"

    return "Medium"


for region in trends_df["Region"].unique():
    total_row = trends_df[
        (trends_df["Region"] == region) & (trends_df["Type"] == "Total")
    ].iloc[0]

    segment_rows = trends_df[
        (trends_df["Region"] == region) & (trends_df["Type"] == "Segmented")
    ].copy()

    n_breakpoints = breakpoints_df.loc[
        breakpoints_df["Region"] == region, "Year"
    ].nunique()

    seg_changes = material_segmentation_change(
        total_direction=total_row["Direction"],
        total_rate_decade=total_row["Rate_of_Change_pct_per_decade"],
        segment_rows=segment_rows
    )

    confidence = confidence_label(
        total_p_value=total_row["MK_P_Value"],
        segment_rows=segment_rows
    )

    summary_rows.append({
        "Region": region,
        "Total urban area (km²)": area_per_region.get(region, np.nan),
        "Growth rate (%/decade)": total_row["Rate_of_Change_pct_per_decade"],
        "Significance": total_row["Significance_Stars"],
        "Breakpoints": int(n_breakpoints) if pd.notna(n_breakpoints) else 0,
        "Segmented trend needed?": seg_changes,
        "Confidence": confidence
    })

executive_summary_df = pd.DataFrame(summary_rows)

# Sort by growth rate
executive_summary_df = executive_summary_df.sort_values(
    "Growth rate (%/decade)", ascending=False
).reset_index(drop=True)

# Round for display
executive_summary_df["Growth rate (%/decade)"] = executive_summary_df["Growth rate (%/decade)"].round(1)
executive_summary_df["Total urban area (km²)"] = executive_summary_df["Total urban area (km²)"].round(0)

# Normalize text values
executive_summary_df["Segmented trend needed?"] = executive_summary_df["Segmented trend needed?"].map({
    "Yes": "Yes",
    "No": "No"
}).fillna("No")

executive_summary_df["Confidence"] = executive_summary_df["Confidence"].fillna("Low")
executive_summary_df["Significance"] = executive_summary_df["Significance"].fillna("")

n_regions = len(executive_summary_df)
n_seg_no = (executive_summary_df["Segmented trend needed?"] == "No").sum()
n_high = (executive_summary_df["Confidence"] == "High").sum()
n_medium = (executive_summary_df["Confidence"] == "Medium").sum()
n_low = (executive_summary_df["Confidence"] == "Low").sum()

print(
    f"{n_regions} regions were assessed. "
    f"Segmented trends are not required in {n_seg_no} of {n_regions} regions. "
    f"Confidence is High in {n_high} regions, Medium in {n_medium}, and Low in {n_low}."
)


# HTML table display

def color_significance(val):
    if val == "***":
        return "background-color: #e2f0d9; color: #1f4d1f; font-weight: bold;"
    elif val == "**":
        return "background-color: #fff2cc; color: #7f6000; font-weight: bold;"
    elif val == "*":
        return "background-color: #fce5cd; color: #7f3f00; font-weight: bold;"
    return ""

def color_segmented(val):
    if val == "No":
        return "background-color: #e2f0d9; color: #1f4d1f; font-weight: bold;"
    elif val == "Yes":
        return "background-color: #f4cccc; color: #990000; font-weight: bold;"
    return ""

def color_confidence(val):
    if val == "High":
        return "background-color: #d9ead3; color: #1f4d1f; font-weight: bold;"
    elif val == "Medium":
        return "background-color: #fff2cc; color: #7f6000; font-weight: bold;"
    elif val == "Low":
        return "background-color: #f4cccc; color: #990000; font-weight: bold;"
    return ""

def growth_bar(val, vmin, vmax):
    if pd.isna(val):
        return ""
    pct = 100 if vmax == vmin else 100 * (val - vmin) / (vmax - vmin)
    return f"background: linear-gradient(90deg, #9fc5e8 {pct:.1f}%, transparent {pct:.1f}%); text-align: center;"

def display_executive_summary_table(df):
    df_display = df.copy()
    vmin = df_display["Growth rate (%/decade)"].min()
    vmax = df_display["Growth rate (%/decade)"].max()

    def style_row(row):
        styles = []
        for col, val in row.items():
            base = "padding: 6px 8px; border: 1px solid #ccc;"
            if col == "Region":
                base += " text-align: left; font-weight: bold;"
            else:
                base += " text-align: center;"

            if col == "Significance":
                base += color_significance(val)
            elif col == "Segmented trend needed?":
                base += color_segmented(val)
            elif col == "Confidence":
                base += color_confidence(val)
            elif col == "Growth rate (%/decade)":
                base += growth_bar(val, vmin, vmax)

            styles.append(base)
        return styles

    styled = df_display.style.format({
        "Total urban area (km²)": "{:.0f}",
        "Growth rate (%/decade)": "{:.1f}"
    }).apply(style_row, axis=1)

    html = styled.to_html()

    wrapped_html = f"""
    <div style="overflow-x:auto;">
        <style>
            table {{
                border-collapse: collapse;
                width: 100%;
                font-size: 13px;
            }}
            th {{
                background-color: #f2f2f2;
                border: 1px solid #ccc;
                padding: 6px 8px;
                text-align: center;
            }}
        </style>
        {html}
    </div>
    """

    display(HTML(wrapped_html))

display_executive_summary_table(executive_summary_df)
21 regions were assessed. Segmented trends are not required in 21 of 21 regions. Confidence is High in 21 regions, Medium in 0, and Low in 0.
  Region Total urban area (km²) Growth rate (%/decade) Significance Breakpoints Segmented trend needed? Confidence
0 Alentejo 202 57.2 *** 2 No High
1 Algarve 128 50.8 *** 2 No High
2 Aragón 206 49.0 *** 1 No High
3 Centro (PT) 704 48.6 *** 2 No High
4 Comunidad Foral de Navarra 117 48.0 *** 2 No High
5 Región de Murcia 280 47.2 *** 2 No High
6 Castilla y León 534 45.3 *** 3 No High
7 Castilla-La Mancha 527 44.9 *** 2 No High
8 Comunitat Valenciana 1080 44.1 *** 2 No High
9 La Rioja 53 43.5 *** 1 No High
10 Extremadura 223 36.6 *** 1 No High
11 Andalucía 1374 35.4 *** 1 No High
12 Illes Balears 170 33.6 *** 2 No High
13 Comunidad de Madrid 983 33.3 *** 0 No High
14 Cataluña 1279 32.4 *** 3 No High
15 Cantabria 110 31.1 *** 2 No High
16 Galicia 479 30.8 *** 2 No High
17 Norte 1118 29.4 *** 1 No High
18 Área Metropolitana de Lisboa 571 25.5 *** 1 No High
19 Principado de Asturias 166 23.5 *** 1 No High
20 País Vasco 272 21.6 *** 3 No High

Trend Analysis#

The results demonstrates that, although breakpoints were identified, they do not substantially alter the underlying trend estimates. In all cases, the total trend was preferred, as segmented alternatives did not exhibit significantly stronger slopes or lower p-values.

This suggests that the dataset provides a coherent temporal structure for monitoring urban land cover change. While localised variations exist, the broader trend of urban expansion remains largely continuous and statistically stable.

Furthermore, the reliability of the total trend is evident even in regions with marked breakpoint clusters, such as Galicia, Andalucía, and Región de Murcia. Despite temporary changes in slope magnitude, the segmented trends did not offer a statistically significant improvement over the total series, suggesting that short-term accelerations or decelerations are well-integrated into the full-period estimate.

Geographical characteristics modulate this effect. Larger metropolitan areas or regions with long-standing urban infrastructure exhibit smoother temporal evolution, while smaller or emerging urban zones may reflect sharper shifts in land cover classification. Nevertheless, these fluctuations did not systematically translate into divergent trend recommendations.

In summary, the analysis confirms that the dataset is suitable for deriving consistent urbanisation trends across multi-decadal periods. Segmentation remains a valuable diagnostic tool for identifying local deviations, but for most spatial planning and land management applications, the total trend offers a statistically supported and methodologically stable representation of urban growth dynamics.

5. Discussion#

The results of this analysis demonstrate the suitability of the dataset for assessing long-term urbanisation trends. Despite the presence of breakpoints the overall consistency of the dataset supports its use for reliable trend estimation in urban land cover.

For spatial planning and land management applications, this temporal continuity is critical. The ability to trace urban expansion over a multi-decadal period enables stakeholders to evaluate the spatial footprint of urban growth, assess policy impacts, and inform future development strategies. The dataset captures the general trajectory of urbanisation effectively, even in the presence of short-term fluctuations or classification refinements.

Nonetheless, the analysis highlights the importance of methodological awareness. Some of the breakpoints occurre around years with known changes in the processing chain. These may introduce artificial discontinuities in urban extent, particularly in regions undergoing rapid development or where classification thresholds are more sensitive to spectral or spatial resolution. Such effects must be considered when interpreting apparent changes in urban dynamics.

Geographic and demographic context also influence data interpretation. Larger metropolitan regions, such as Madrid or Lisbon, tend to exhibit smoother trends, as broader patterns of growth absorb minor classification inconsistencies. In contrast, smaller or fast-developing regions may show more pronounced breaks, either due to real shifts in urbanisation or amplified sensitivity to methodological updates.

From a practical standpoint, the dataset supports robust urban trend detection, but effective application requires a cautious and informed approach. Users should:

  • Consider the potential influence of sensor transitions and processing updates on detected breakpoints

  • Prioritise total trends for general assessments, while using segmentation to highlight context-specific deviations

  • Interpret statistical outputs alongside relevant contextual knowledge, such as infrastructure investments, economic cycles, or policy milestones.

In conclusion, the dataset provides a strong foundation for monitoring urbanisation trends. Its temporal stability and spatial resolution are appropriate for regional-scale planning.

ℹ️ If you want to know more#

Key Resources#

References#

Vargo, J., Habeeb, D. M., & Stone, B. (2013). The importance of land cover change across urban-rural typologies for climate modeling. Journal of Environmental Management, 114, 243-252. https://doi.org/10.1016/j.jenvman.2012.10.007

Chang, Y., Hou, K., Li, X., Zhang, Y., & Chen, P. (2018). Review of land use and land cover change research progress. IOP Conference Series: Earth and Environmental Science, 113, 012087. https://doi.org/10.1088/1755-1315/113/1/012087

Chelali, A., Benediktsson, J. A., Chanussot, J., & Akbari, V. (2019). A review of artificial intelligence techniques for land cover change detection using remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 57(10), 7649–7668. https://doi.org/10.1109/JURSE.2019.8808967

Delgado, L., & Romero, J. (2021). Evaluating the impact of Madrid Central on urban mobility: A step toward sustainable cities? Cities, 113, 103120. https://doi.org/10.1016/j.cities.2021.103120

González, M. J., & Leal, J. (2018). Urban growth and real estate development in the Iberian Peninsula: The boom of 1996–2007. Land Use Policy, 76, 648–657. https://doi.org/10.1016/j.landusepol.2018.05.023

Gutiérrez, F., Rodríguez, V., & Prieto, C. (2016). Aging population and urbanization: Implications for Spain’s demographic landscape. Journal of Rural Studies, 45, 85–98. https://doi.org/10.1016/j.jrurstud.2016.03.008

López, F. A., Martín, J. C., & Albalate, D. (2017). High-speed rail and urban expansion in the Iberian Peninsula: The cases of AVE and Alfa Pendular. Journal of Transport Geography, 60, 57–68. https://doi.org/10.1016/j.jtrangeo.2017.02.005

Martínez, P., & García, R. (2015). The impact of the 2008 financial crisis on urban expansion in southern Spain. Habitat International, 47, 42–50. https://doi.org/10.1016/j.habitatint.2015.01.010

Palmero-Iniesta, H., Fernández, J. A., & Olcina, J. (2021). Land-use changes and natural land recovery after the 2008 crisis: Evidence from Spain. Environmental Science & Policy, 116, 72–83. https://api.semanticscholar.org/CorpusID:238829360

Rullan, O. (2014). The impact of tourism-driven urbanisation in Spain and Portugal: Coastal transformations and land use. Land Use Policy, 38, 139–150. https://doi.org/10.1016/j.landusepol.2014.02.009

Sánchez, M., Pérez, L., & Torres, D. (2019). Rural depopulation and urban migration trends in Spain and Portugal. Geoforum, 104, 78–91. https://doi.org/10.1016/j.geoforum.2019.06.015

Silva, E., & Fernandes, J. (2019). The influence of urban planning policies on land prices and informal settlements in Iberian cities. Land Use Policy, 89, 104200. https://doi.org/10.1016/j.landusepol.2019.04.012

Silva, V., Santos, R., & Moreira, J. (2017). The expansion of peri-urban areas in Lisbon and Porto: Drivers and consequences. Cities, 66, 101–112. https://doi.org/10.1016/j.cities.2017.04.002