Satellite Land Cover trend assessment for Spatial Planning and Land Management

1.4.4. Satellite Land Cover trend assessment for Spatial Planning and Land Management#

Production date: 01-07-2024

Produced by: Inês Girão (+ATLANTIC)

🌍 Use case: Using land cover products to quantify urbanisation trends#

❓ Quality assessment question#

Is the dataset accurate and consistent for the analysis of urbanisation trends in the Iberian Peninsula?

Land Cover data is an invaluable resource for a wide range of fields, from climate change research to urban planning. Land Cover products that provide historical timelines enable scientists, policymakers, and planners to understand and analyse the transformation of land cover over recent decades (Vargo et al, 2013; Chang et al., 2018).

This notebook will access the Land cover classification gridded maps from 1992 to present derived from satellite observations (henceforth, LC) data from the Climate Data Store (CDS) of the Copernicus Climate Change Service (C3S), and analyse the spatial patterns of the LC over a given Area of Interest (AoI) and time.

📢 Quality assessment statement#

These are the key outcomes of this assessment

The dataset maintains strong temporal continuity, with annual updates ensuring a smooth and reliable representation of land cover changes over time. While breakpoints were identified, they generally did not indicate major disruptions, reinforcing the dataset’s stability for long-term trend analysis.

The presence of breakpoints does not necessarily indicate abrupt landscape shifts but rather highlights the sensitivity of detection methods to gradual changes. This suggests that while breakpoints can help refine analysis, their impact on overall trends remains limited, emphasising the dataset’s resilience to minor variations.

For the specific land cover type analysed, the dataset exhibits a consistent ability to capture underlying trends. The similarity in results across segmented and total trends suggests that the data structure is well-calibrated, minimising distortions that could arise from classification inconsistencies or methodological biases.

📋 Methodology#

This Use Case comprises the following steps:

1. Define the AoI, search and download LC data

2. Inspect and view data for the defined AoI (Iberian Peninsula)

3. Breakpoint Detection

4.Trend Assessment

5. Discussion

📈 Analysis and results#

1. Define the AoI, search and download LC data#

Before we begin we must prepare our environment. This includes installing the Application Programming Interface (API) of the CDS, and importing the various python libraries that we will need.

Install CDS API#

To install the CDS API, run the following command. We use an exclamation mark to pass the command to the shell (not to the Python interpreter). If you already have the CDS API installed, you can skip or comment this step.

Import all the libraries/packages#

We will be working with data in NetCDF format. To best handle this type of data we will use libraries for working with multidimensional arrays, in particular Xarray. We will also need libraries for plotting and viewing data.

Data Overview#

To search for data, visit the CDS website: http://cds.climate.copernicus.eu Here you can search for ‘Satellite observations’ using the search bar. The data we need for this tutorial is the Land cover classification gridded maps from 1992 to present derived from satellite observations. This catalogue entry provides global Land Cover Classification (LCC) maps with a very high spatial resolution, with a L4 processing level, on an annual basis with a one-year delay, following the Global Climate Observing System (GCOS) convention requirements. LULC maps correspond to a global classification scheme, encompassing 22 classes.

The dataset consists of 2 versions (v2.0.7 produced by the European Space Agency (ESA) Climate Change Initiative (CCI) and v2.1.1 produced by Copernicus Climate Change Service (C3S)).

Data specifications for this use case:

Years: 1992 to 2022
Version: v2.0.7 before 1992 and v2.1.1 after 2016
Format: Zip files

At the end of the download form, select “Show API request”. This will reveal a block of code, which you can simply copy and paste into a cell of your Jupyter Notebook. Having copied the API request, running it will retrieve and download the data you requested into your local directory. However, before you run it, the terms and conditions of this particular dataset need to have been accepted directly at the CDS website. The option to view and accept these conditions is given at the end of the download form, just above the “Show API request” option. In addition, it is also useful to define the time period and AoI parameters and edit the request accordingly, as exemplified in the cells below.

<xarray.Dataset> Size: 7GB
Dimensions:              (year: 31, latitude: 3600, longitude: 5040, bounds: 2)
Coordinates:
  * year                 (year) int64 248B 1992 1993 1994 ... 2020 2021 2022
  * latitude             (latitude) float64 29kB 45.0 45.0 44.99 ... 35.0 35.0
  * longitude            (longitude) float64 40kB -9.999 -9.996 ... 3.996 3.999
    lat_bounds           (latitude, bounds, longitude) float64 290MB dask.array<chunksize=(1200, 1, 1680), meta=np.ndarray>
    lon_bounds           (longitude, bounds) float64 81kB dask.array<chunksize=(5040, 2), meta=np.ndarray>
    time_bounds          (year, bounds, longitude) datetime64[ns] 2MB dask.array<chunksize=(1, 2, 5040), meta=np.ndarray>
Dimensions without coordinates: bounds
Data variables:
    lccs_class           (year, latitude, longitude) uint8 562MB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    processed_flag       (year, latitude, longitude) float32 2GB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    current_pixel_state  (year, latitude, longitude) float32 2GB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    observation_count    (year, latitude, longitude) uint16 1GB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    change_count         (year, latitude, longitude) uint8 562MB dask.array<chunksize=(1, 1800, 2520), meta=np.ndarray>
    crs                  (year, longitude) int32 625kB dask.array<chunksize=(1, 5040), meta=np.ndarray>
Attributes: (12/38)
    id:                         ESACCI-LC-L4-LCCS-Map-300m-P1Y-1992-v2.0.7cds
    title:                      Land Cover Map of ESA CCI brokered by CDS
    summary:                    This dataset characterizes the land cover of ...
    type:                       ESACCI-LC-L4-LCCS-Map-300m-P1Y
    project:                    Climate Change Initiative - European Space Ag...
    references:                 http://www.esa-landcover-cci.org/
    ...                         ...
    geospatial_lon_max:         180
    spatial_resolution:         300m
    geospatial_lat_units:       degrees_north
    geospatial_lat_resolution:  0.002778
    geospatial_lon_units:       degrees_east
    geospatial_lon_resolution:  0.002778

2. Inspect and view data for the defined AoI (Iberian Peninsula)#

Compute urban area for each NUTS 2 region#

To identify changes in LC patterns, in this user question, NUTS 2 will be used, providing the information regarding the main regions of the Iberian Peninsula.

The NUTS are a hierarchical system divided into 3 levels. NUTS 1 correspond to major socio-economic regions, NUTS 2 correspond to basic regions for the application of regional policies, and NUTS 3 correspond to small regions for specific diagnoses. Additionally a NUTS 0 level, usually co-incident with national boundaries is also available. The NUTS legislation is periodically amended; therefore multiple years are available for download.

The step below masks the Land Cover data according to the NUTS 2 boundaries and calculate the area of each pixel (weighted by the cosine of Latitude). For each NUTS 2, we proceed with the analysis and visual inspection of Land Cover areas per class and corresponding percentages during the elected period.

Mask regions#

First, we need to establish the geometry of the NUTS region (level 2) in order to make the corresponding statistics.

Show code cell source

Hide code cell source

# Configure Dask
dask.config.set(scheduler='threads', **{'array.slicing.split_large_chunks': True})

# Define CRS and bounding box for Iberian Peninsula
crs_wgs84 = "EPSG:4326"
iberian_bbox = box(-10, 36, 3.5, 44)

# Load and filter GeoDataFrame
shapefile_url = "https://gisco-services.ec.europa.eu/distribution/v2/nuts/shp/NUTS_RG_20M_2021_4326.shp.zip"
gdf = gpd.read_file(shapefile_url)
gdf = gdf[gdf["LEVL_CODE"] == 2]
gdf = gdf[gdf["NUTS_ID"].str.startswith(("ES", "PT"))]  # Spain (ES) and Portugal (PT)
gdf = gdf.to_crs(crs_wgs84)
gdf = gdf[gdf.intersects(iberian_bbox)]
gdf = gdf[gdf.geometry.is_valid & ~gdf.geometry.is_empty]

if gdf.empty:
    raise ValueError("No valid geometries found after filtering.")

# Ensure dataset CRS is set
if ds.rio.crs is None:
    ds = ds.rio.write_crs("EPSG:4326", inplace=True)

# Ensure dataset coordinates overlap with the filtered regions
filtered_lon_min, filtered_lat_min, filtered_lon_max, filtered_lat_max = gdf.total_bounds
if (
    ds["longitude"].min().values > filtered_lon_max
    or ds["longitude"].max().values < filtered_lon_min
    or ds["latitude"].min().values > filtered_lat_max
    or ds["latitude"].max().values < filtered_lat_min
):
    raise ValueError(
        "Dataset coordinates do not overlap with the filtered regions. Check dataset and region alignment."
    )

# Subset dataset to valid ranges
ds = ds.sel(
    longitude=slice(
        max(filtered_lon_min, ds["longitude"].min().values),
        min(filtered_lon_max, ds["longitude"].max().values)
    ),
    latitude=slice(
        min(filtered_lat_max, ds["latitude"].max().values),
        max(filtered_lat_min, ds["latitude"].min().values)
    )
)

# Check subsetted dataset dimensions
if ds["longitude"].size == 0 or ds["latitude"].size == 0:
    raise ValueError("Subsetting resulted in an empty dataset. Check the bounds and ensure overlap.")

# Create the regionmask
regions = regionmask.from_geopandas(gdf, names="NUTS_NAME")

# Create a 2D mask
mask = regions.mask(ds["longitude"], ds["latitude"])

ds= ds.where(mask.notnull(), drop=True)

Compute cell area#

Then, we can calculate the area of each pixel taking into consideration the curvature of the earth (i.e., weighted by the cosine of Latitude).

Select Urban Classes and Prepare Dataset#

Compute urban area per region and year#

Map urban percentage coverage over-time by NUTS regions in the AoI#

Show code cell source

Hide code cell source

# Reproject the GeoDataFrame to a projected CRS (EPSG:3035 is a good choice for Europe)
results_df = results_df.to_crs("EPSG:3035")

# Calculate the total area of each region (in square kilometres)
results_df["Region_Total_Area"] = results_df.geometry.area / 10**6  # Convert from m² to km²

# Ensure 'Year' is converted to string if needed
results_df['Year'] = results_df['Year'].astype(str)

#Calculate Urban Percentage
results_df['Urban_Percentage'] = (results_df['Urban_Area'] / results_df['Region_Total_Area']) * 100

# Pivot the data to have years as columns
Urban_pivot = results_df.pivot(index='Region', columns='Year', values='Urban_Percentage').reset_index()

# Extract years for plotting
years = [col for col in Urban_pivot.columns if col.isdigit()]

# Normalize color scale across all maps based on percentage coverage
norm_percentage = Normalize(vmin=0, vmax=results_df['Urban_Percentage'].max())

# Plot settings
n_cols = 3
n_rows = int(np.ceil(len(years) / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 4), squeeze=False)

# Plot each year in a separate subplot
for idx, year in enumerate(years):
    row, col = divmod(idx, n_cols)
    ax = axes[row, col]
    results_df_year = results_df[results_df['Year'] == year]
    results_df_year.plot(column='Urban_Percentage', cmap="Oranges", ax=ax, edgecolor="k", norm=norm_percentage)
    ax.set_title(f"Urban Coverage (%) in {year}", fontsize=10)
    ax.axis('off')

# Hide unused subplots
for idx in range(len(years), n_rows * n_cols):
    row, col = divmod(idx, n_cols)
    axes[row, col].axis('off')

# Step 8: Adjust layout
plt.tight_layout()

# Allocate space for the color bar
fig.subplots_adjust(right=0.85)
cbar_ax = fig.add_axes([0.87, 0.15, 0.03, 0.7])  # Create a new axes for color bar

# Create a single ScalarMappable object for the common colorbar
sm_common = plt.cm.ScalarMappable(cmap="Oranges", norm=norm_percentage)
sm_common._A = []  # Dummy array for the scalar mappable
fig.colorbar(sm_common, cax=cbar_ax, label="Urban Coverage (%)")

plt.show()

../../_images/fedca45efe5c5ee436699f2d839096fe2476fb07ac828aa989bc198610363ed6.png

Map Analysis#

Over the 28-year period, all regions maintain a consistent pattern of growth.
Regions such as Comunidad Madrid and Área Metropolitana de Lisboa consistently exhibit the highest levels of urban percentage coverage.
No negative fluctuations were observed
The biggest changes are observed between 2016-2017.

Potential Drivers and Methodological Considerations in Urbanisation Trends in the Iberian Peninsula (1992–2022)#

Urbanisation in the Iberian Peninsula over the last three decades has been influenced by a combination of economic shifts, demographic changes, and policy decisions. Before analysing trends, it is essential to account for abrupt shifts that may distort trend estimates. These shifts, often referred to as breakpoints, can result from:

Economic Drivers

Economic Growth and Real Estate Booms: The rapid expansion of urban areas in Madrid, Barcelona, and Lisbon was driven by economic growth and increased investment in housing and infrastructure, particularly during the 1996–2007 real estate boom (González & Leal, 2018). The expansion of suburban areas led to the rise of peri-urban developments in regions like Valencia and Porto (Silva et al., 2017).
Economic Crisis (2008–2013): The 2008 financial crisis led to a slowdown in urban expansion, particularly in southern Spain, where unfinished developments and ghost towns became a common sight (Martínez & García, 2015). However, in some areas, reduced construction pressure allowed for the recovery of natural land cover (Palmero-Iniesta et al., 2021).

Demographic and Social Changes

Rural Depopulation: Migration from rural areas to urban centers, particularly among younger populations, has led to declining populations in Extremadura, Castilla y León, and Alentejo, accelerating the abandonment of agricultural lands and increasing the expansion of urban areas (Sánchez et al., 2019).
Tourism and Second-Home Development: The rise of tourism-driven urbanisation has significantly reshaped coastal regions like the Costa del Sol, Algarve, and the Balearic Islands, where seasonal housing developments have expanded rapidly (Rullan, 2014).
Aging Population: Aging populations in rural regions have led to increased urban migration, further depopulating countryside areas and concentrating populations in cities like Seville, Valencia, and Bilbao (Gutiérrez et al., 2016).

Infrastructure and Policy Influences

EU-Funded Infrastructure Projects: Investments in transport networks, such as the expansion of high-speed rail (AVE) in Spain and Alfa Pendular in Portugal, have facilitated suburbanisation and commuting from secondary urban centers (López et al., 2017).
Urban Planning Policies: The implementation of urban growth controls and greenbelt policies in cities like Barcelona and Porto has influenced spatial expansion patterns, sometimes leading to increased land prices and informal settlements (Silva & Fernandes, 2019).
Smart Cities and Sustainability Initiatives: Recent efforts to create more sustainable urban environments, such as Madrid’s Madrid Central low-emission zone, have shaped urbanisation patterns, promoting compact city models (Delgado & Romero, 2021).

Methodology

Certain shifts in urban land cover may be related to data collection and processing rather than actual urban expansion. For instance, transitions between different satellite sensors can introduce artificial breaks in the time series (Chelali et al., 2019).
The LC dataset is generated from a multi-sensor surface reflectance time series derived from several satellite missions. As sensor technology evolved, different instruments were used to produce consistent global composites.

Surface Reflectance (SR) Input	Reference Period	Satellite Sensor / Mission	Main Characteristics
AVHRR SR composites	1992–1999	AVHRR-2 (NOAA-11, NOAA-14)	~1 km spatial resolution, visible and near-infrared observations
SPOT-VGT SR composites	1999–2013	SPOT-4 / SPOT-5 VEGETATION	1 km spatial resolution, 4 spectral bands (blue, red, NIR, SWIR)
MERIS SR composites	2003–2012	Envisat MERIS	~300 m spatial resolution, 15 spectral bands in visible and near-infrared
PROBA-V SR composites	2013–2019	PROBA-V	~300 m spatial resolution, 4 spectral bands (blue, red, NIR, SWIR)
Sentinel-3 SR composites	2020	Sentinel-3 OLCI	~300 m spatial resolution, multispectral observations
Sentinel-3 SR composites	2021–2022	Sentinel-3 OLCI + SLSTR	Optical and thermal observations supporting land-cover mapping

Source: See Figure 1-3 in C3S, LC - Product User Guide and Specification (2024)

Due to the limited spatial resolution of early sensors (AVHRR and SPOT-VGT), reliable detection of annual urban change was challenging prior to 2003. During this period, the representation of urban areas is guided by reference urban footprints derived from the Global Human Settlement Layer (GHSL) datasets for 1990 and 2000. These datasets provide baseline constraints for the urban class within the 1992–1999 and 2000–2003 epochs, helping to stabilise the detection of urban areas while still allowing minor variations arising from classification adjustments and mixed-pixel effects. From 2003 onwards, urban areas are derived from higher-resolution observations. To ensure temporal consistency, urban areas are only allowed to expand over time, and the resulting footprint is constrained between a minimum extent defined by GHSL 2000 and a maximum extent defined by GHSL 2014 (C3S - LC Algorithm Theoretical Basis Document, 2024).
In addition, the Global Urban Footprint (GUF) dataset is combined with GHSL 2014 to reduce potential omissions of built-up areas (C3S - LC Algorithm Theoretical Basis Document, 2024).
In addition to optical surface reflectance observations, auxiliary datasets are used to improve land-cover mapping. In particular, Envisat ASAR Wide Swath Mode (WSM) radar observations (2005–2012) are incorporated as ancillary information (C3S - LC Target Requirements and Gap Analysis Document, 2024).
While the dataset relies on multiple satellite sensors, it is not generated through simple sensor-to-sensor transitions. Instead, the global land-cover time series is built around a baseline land-cover map derived from MERIS observations (2003–2012). Earlier periods (1992–2003) are reconstructed through back-dating using AVHRR and SPOT-VGT data, while later years are produced through incremental updates using SPOT-VGT, PROBA-V, and Sentinel-3 observations. During several periods, multiple sensors are used simultaneously, with different roles (e.g., temporal consistency versus spatial refinement). As a result, potential discontinuities in the time series are more likely to arise from changes in processing strategy, data fusion, and constraints, rather than from simple sensor replacements.

Global LC database	Reference period	Satellite data source
Baseline 10-year global LC map	2003–2012	• MERIS FR/RR global SR composites between 2003 and 2012
Global annual LC maps	1992–1999	• Baseline 10-year global LC map • AVHRR global SR composites between 1992 and 1999 for back-dating the baseline
	1999–2013	• Baseline 10-year global LC map • SPOT-VGT global SR composites between 1999 and 2013 for up- and back-dating the baseline • MERIS FR global SR composites between 2003 and 2012 to delineate the identified changes at 300 m spatial resolution • PROBA-V global SR composites at 300 m for year 2013 to delineate the identified changes at 300 m spatial resolution
	2014–2019	• PROBA-V global SR composites at 1 km for years 2014 to 2019 for updating the baseline • PROBA-V time series at 300 m for 2014 to 2019 to delineate the identified changes at the LC map spatial resolution
	2020–2022	• S3 global SR composites at 1 km for years 2020 to 2022 for updating the baseline • S3 time series at 300 m for 2020 to 2022 to delineate the identified changes at the LC map spatial resolution

Source: See Table 1-3 in C3S, LC - Product User Guide and Specification (2024)

3. Breakpoint Detection#

A first step in assessing temporal behaviour is to analyse the rate of change of the urban-area time series. This provides an intuitive view of how the pace of urban expansion evolves over time and helps highlight periods where the trajectory deviates from its typical pattern.

However, while such exploratory diagnostics identify points of interest, they do not by themselves confirm whether the observed changes are structurally meaningful. Year-to-year variations may reflect noise, smoothing effects, or gradual adjustments in the underlying product. To formally assess structural changes, a statistical breakpoint detection method is applied.

The analysis therefore combines rate-of-change diagnostics with statistical segmentation, with particular attention to the monotonic nature of the data and the need to avoid over-interpreting gradual trends as discrete breakpoints.

Rate of Change Calculation

The first derivative of the urban-area time series is computed to quantify the year-to-year change in urban extent.
This highlights periods where the pace of urban expansion deviates from its typical behaviour, providing an initial indication of potential anomalies or shifts in the series.

Statistical Thresholding of Derivative Spikes

To identify unusually strong deviations, a dynamic threshold is defined as:

Threshold = mean(|rate of change|) + 1.5 × standard deviation
Years where the absolute rate of change exceeds this threshold are flagged as derivative spikes.
These spikes represent periods where the magnitude of change is unusually large relative to the typical variability of the series.
This step serves as a diagnostic indicator of anomalous growth behaviour, but does not in itself define structural breakpoints.

Breakpoint Detection Using the PELT Algorithm

The Pruned Exact Linear Time (PELT) algorithm is used to detect changes in the statistical behaviour of the growth rate, rather than in the cumulative urban-area series itself.
This is important because the urban-area time series is monotonic and non-decreasing, meaning that applying segmentation directly to the cumulative values can lead to artificial breakpoints driven by gradual growth.
The algorithm is therefore applied to the first derivative (growth rate), allowing detection of:
- transitions between periods with different rates of urban expansion, rather than absolute levels.
The implementation uses:
- the L2 cost function (model="l2") to minimise variance within segments,
- a fixed penalty parameter (pen = 2.5) applied to a standardised growth-rate series,
- and a minimum segment length (min_size = 3) to avoid short, unstable segments.
Prior to segmentation, the growth-rate series may be standardised to ensure comparability across regions with different magnitudes of urban change.

Breakpoint Selection

Breakpoints are identified directly from the segmentation of the growth-rate series.
Unlike earlier approaches, no explicit filtering based on derivative spikes is applied, as this was found to introduce additional assumptions and potentially remove meaningful structural changes.
Instead, derivative spikes and breakpoints are analysed in parallel, allowing independent but complementary perspectives on the temporal behaviour of the series.

Methodological Considerations

The urban-area time series is strictly monotonic and cumulative by construction, reflecting constraints in the land-cover product that prevent decreases in urban extent.
As a result:
- segmentation applied to the raw series tends to produce regularly spaced breakpoints, even in the absence of true structural changes.
- applying breakpoint detection to the growth rate mitigates this issue by focusing on changes in dynamics rather than accumulated values.
Even when applied to growth rates, breakpoint detection should be interpreted cautiously:
- some detected changes may reflect gradual adjustments in processing or classification, rather than abrupt real-world events.
- others may correspond to genuine shifts in the pace of urban expansion.
For this reason, breakpoint detection is treated as a diagnostic tool, not as definitive evidence of discrete events.

Interpreting Breakpoint Timing

Breakpoint positions should be interpreted as approximate indicators of when a change in growth dynamics begins, rather than exact points of maximum change.

Because segmentation identifies changes in statistical behaviour, the detected breakpoint may:

precede the most visible increase in growth rate, or
occur within a broader transition period.

As a result, breakpoint timing should be considered indicative rather than exact, and interpreted in conjunction with derivative-based diagnostics.

Show code cell source

Hide code cell source

# Calculate breakpoints and derivative spikes for each region

processing_phases = [
    {"label": "AVHRR", "start": 1992, "end": 1999},
    {"label": "Baseline MERIS", "start": 2003, "end": 2012},
    {"label": "SPOT-VGT", "start": 1999, "end": 2013},
    {"label": "PROBA-V", "start": 2013, "end": 2019},
    {"label": "Sentinel-3", "start": 2020, "end": 2022},
]

phase_colors = {
    "AVHRR": "#e6f2ff",
    "Baseline MERIS": "#a07197",
    "SPOT-VGT": "#b3d4b3",
    "PROBA-V": "#fff5e6",
    "Sentinel-3": "#f2e6ff",
}

ghsl_epochs = [
    {"label": "GHSL 1990 epoch", "start": 1992, "end": 1999},
    {"label": "GHSL 2000 transition", "start": 1999, "end": 2003},
    {"label": "GHSL 2000+ epoch", "start": 2003, "end": 2022},
]

# GHSL change years shown as hard-line transitions
ghsl_change_years = [1999, 2003]

# Midpoint used for before/after GHSL switch classification
ghsl_switch_year = 1999.5

# Number of subplot columns in the regional figure layout
n_cols = 3

# Threshold multiplier used to flag unusually large derivative spikes:
# spike if |growth rate| > mean(|growth rate|) + spike_std_factor * std(growth rate)
spike_std_factor = 1.5

# Minimum number of derivative observations allowed in each PELT segment.
pelt_min_size = 1

# Penalty term for PELT breakpoint detection.
pelt_penalty = 2.5

# Standardize the growth-rate series before applying PELT so that breakpoint
# detection is comparable across regions with different magnitudes of urban change.
standardize_growth = True


# Functions
def assign_label(x, windows, default="Unknown"):
    for w in windows:
        if w["start"] <= x < w["end"] or np.isclose(x, w["end"]):
            return w["label"]
    return default

def interval_label(y0, y1):
    return f"{int(y0)}-{int(y1)}"

def monotonicity_metrics(series):
    diffs = np.diff(series)
    if len(diffs) == 0:
        return {
            "Monotonic_NonDecreasing": True,
            "Num_Decreases": 0,
            "Positive_Ratio": np.nan,
            "Negative_Ratio": np.nan,
            "Zero_Ratio": np.nan,
        }

    return {
        "Monotonic_NonDecreasing": bool(np.all(diffs >= 0)),
        "Num_Decreases": int(np.sum(diffs < 0)),
        "Positive_Ratio": np.mean(diffs > 0),
        "Negative_Ratio": np.mean(diffs < 0),
        "Zero_Ratio": np.mean(diffs == 0),
    }

def compute_growth(years, values):
    if len(years) <= 1:
        return np.array([]), np.array([])
    growth = np.diff(values) / np.diff(years)
    growth_years = (years[:-1] + years[1:]) / 2
    return growth, growth_years

def detect_spikes(growth, spike_std_factor=1.5):
    if len(growth) == 0:
        return np.array([], dtype=int)
    threshold = np.mean(np.abs(growth)) + spike_std_factor * np.std(growth)
    return np.where(np.abs(growth) > threshold)[0]

def prepare_for_pelt(growth, standardize=True):
    if len(growth) == 0:
        return growth
    if standardize and np.std(growth) > 0:
        return (growth - np.mean(growth)) / np.std(growth)
    return growth.copy()

def detect_breakpoints(growth, min_size=3, penalty=2.5, standardize=True):
    if len(growth) < min_size + 1:
        return np.array([], dtype=int)

    growth_for_pelt = prepare_for_pelt(growth, standardize=standardize)
    algo = rpt.Pelt(model="l2", min_size=min_size).fit(growth_for_pelt.reshape(-1, 1))
    detected = np.array(algo.predict(pen=penalty)) - 1

    # remove invalid and terminal interval
    return np.array([bp for bp in detected if 0 <= bp < len(growth) - 1], dtype=int)

def make_event_record(region, years, positions, idx, extra=None):
    pos = positions[idx]
    record = {
        "Region": region,
        "Year": float(pos),
        "Interval": interval_label(years[idx], years[idx + 1]),
        "Position": int(idx),
        "Processing_Phase": assign_label(pos, processing_phases),
        "GHSL_Epoch": assign_label(pos, ghsl_epochs),
        "GHSL_Switch_Side": (
            "Before GHSL 2000 switch" if pos < ghsl_switch_year else "After GHSL 2000 switch"
        ),
    }
    if extra:
        record.update(extra)
    return record

def update_counts(record, interval_counts, phase_counts, ghsl_counts, prepost_counts):
    interval_counts[record["Interval"]] += 1
    phase_counts[record["Processing_Phase"]] += 1
    ghsl_counts[record["GHSL_Epoch"]] += 1
    prepost_counts[record["GHSL_Switch_Side"]] += 1

# ==================================================
# DATA PREP
# ==================================================

unique_regions = results_df["Region"].unique()
n_regions = len(unique_regions)

global_y_min = results_df["Urban_Area"].min()
global_y_max = results_df["Urban_Area"].max()

global_derivative_min = float("inf")
global_derivative_max = float("-inf")

for region in unique_regions:
    region_data = results_df[results_df["Region"] == region].sort_values("Year")
    years = region_data["Year"].to_numpy(dtype=float)
    urban_area = region_data["Urban_Area"].to_numpy(dtype=float)

    growth, _ = compute_growth(years, urban_area)
    if len(growth) > 0:
        global_derivative_min = min(global_derivative_min, np.min(growth))
        global_derivative_max = max(global_derivative_max, np.max(growth))

if not np.isfinite(global_derivative_min) or not np.isfinite(global_derivative_max):
    global_derivative_min, global_derivative_max = -1, 1

# ==================================================
# STORAGE
# ==================================================

breakpoint_records = []
spike_records = []
monotonicity_records = []

interval_breakpoint_counts = defaultdict(int)
interval_spike_counts = defaultdict(int)

phase_breakpoint_counts = defaultdict(int)
phase_spike_counts = defaultdict(int)

ghsl_breakpoint_counts = defaultdict(int)
ghsl_spike_counts = defaultdict(int)

prepost_breakpoint_counts = defaultdict(int)
prepost_spike_counts = defaultdict(int)

Show code cell source

Hide code cell source

# Plot Breakpoints Series
n_rows = int(np.ceil(n_regions / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 8, n_rows * 5), squeeze=False)
axes = axes.flatten()

for idx, region in enumerate(unique_regions):
    ax = axes[idx]

    region_data = results_df[results_df["Region"] == region].sort_values("Year")
    years = region_data["Year"].to_numpy(dtype=float)
    urban_area = region_data["Urban_Area"].to_numpy(dtype=float)

    mono = monotonicity_metrics(urban_area)
    monotonicity_records.append({"Region": region, **mono})

    growth, growth_years = compute_growth(years, urban_area)
    spike_idx = detect_spikes(growth, spike_std_factor=spike_std_factor)
    breakpoint_idx = detect_breakpoints(
        growth,
        min_size=pelt_min_size,
        penalty=pelt_penalty,
        standardize=standardize_growth
    )

    spike_positions = growth_years[spike_idx] if len(spike_idx) else np.array([])
    breakpoint_positions = growth_years[breakpoint_idx] if len(breakpoint_idx) else np.array([])

    # Store spikes
    for s in spike_idx:
        rec = make_event_record(
            region, years, growth_years, s,
            extra={"Derivative": growth[s]}
        )
        spike_records.append(rec)
        update_counts(
            rec,
            interval_spike_counts,
            phase_spike_counts,
            ghsl_spike_counts,
            prepost_spike_counts
        )

    # Store breakpoints
    for bp in breakpoint_idx:
        rec = make_event_record(region, years, growth_years, bp)
        breakpoint_records.append(rec)
        update_counts(
            rec,
            interval_breakpoint_counts,
            phase_breakpoint_counts,
            ghsl_breakpoint_counts,
            prepost_breakpoint_counts
        )

    # Background phases
    for phase in processing_phases:
        ax.axvspan(
            phase["start"],
            phase["end"],
            color=phase_colors.get(phase["label"], "lightgrey"),
            alpha=0.35,
            zorder=0
        )

    # Main series
    ax.plot(years, urban_area, marker="o", color="grey", zorder=2)
    ax.set_ylabel("Urban Area (km²)", color="grey")
    ax.tick_params(axis="y", colors="grey")
    ax.set_ylim(global_y_min, global_y_max)

    # GHSL change years as hard lines
    for ghsl_year in ghsl_change_years:
        ax.axvline(
            ghsl_year,
            color="dodgerblue",
            linestyle="-",
            linewidth=2.2,
            zorder=3
        )

    # Breakpoints
    for x in breakpoint_positions:
        ax.axvline(
            x,
            color="red",
            linestyle="-",
            linewidth=2.5,
            zorder=4
        )

    # Spikes
    for x in spike_positions:
        ax.axvline(
            x,
            color="gold",
            linestyle=":",
            linewidth=2.5,
            zorder=5
        )

    # Growth rate
    ax2 = ax.twinx()
    ax2.plot(
        growth_years,
        growth,
        marker="o",
        linestyle="dotted",
        color="black",
        zorder=6
    )
    ax2.set_ylabel("Rate of Change", color="black")
    ax2.tick_params(axis="y", colors="black")
    ax2.set_ylim(global_derivative_min, global_derivative_max)

    # Formatting
    ax.set_xticks(years)
    ax.set_xticklabels(years.astype(int), rotation=90)
    ax.set_title(region)
    ax.set_xlabel("Year")
    ax.grid()

# Remove empty axes
for idx in range(n_regions, len(axes)):
    fig.delaxes(axes[idx])


# Processing phases legend
phase_handles = [
    Patch(
        facecolor=phase_colors[p["label"]],
        edgecolor="black",
        alpha=0.5,
        label=p["label"]
    )
    for p in processing_phases
]

legend_phases = fig.legend(
    handles=phase_handles,
    loc="upper center",
    bbox_to_anchor=(0.5, 0.992),
    ncol=len(processing_phases),
    frameon=True,
    fontsize=9,
    title_fontsize=10
)

# Line/signal legend
line_handles = [
    Line2D([0], [0], color="grey", marker="o", linestyle="-", label="Urban Area"),
    Line2D([0], [0], color="black", marker="o", linestyle="dotted", label="Rate of Change"),
    Line2D([0], [0], color="dodgerblue", linestyle="-", linewidth=2.2, label="GHSL change year"),
    Line2D([0], [0], color="red", linestyle="-", linewidth=2.5, label="Breakpoint"),
    Line2D([0], [0], color="gold", linestyle=":", linewidth=2.5, label="Derivative spike"),
]

legend_lines = fig.legend(
    handles=line_handles,
    loc="upper center",
    bbox_to_anchor=(0.5, 0.98),
    ncol=len(line_handles),
    frameon=True,
    fontsize=9,
    title_fontsize=10
)

# Compact spacing between legends and plots
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

../../_images/cc1a90e10c3b0930d1d4e2e09b40544e8c5c2089f2d3ef19972ee2ff31b57ae1.png

To assess whether detected discontinuities in the urban-area time series may be influenced by methodological factors, a targeted alignment analysis is performed. The aim is to evaluate whether growth-rate breakpoints and derivative spikes occur shortly after known processing-chain changes.

For each detected event, the time difference relative to each processing-change year is computed. An event is considered potentially related to a processing change only if it occurs within the two years following that transition. This forward-looking window reflects the temporal characteristics of the dataset, where land-cover changes are only confirmed if they persist over two consecutive years.

In addition to temporal alignment, the analysis also considers the spatial consistency of detected signals, quantified as the number of regions affected within a given interval. Methodological effects are expected to influence the dataset in a systematic way, and therefore to appear consistently across multiple regions. In contrast, signals affecting only a small number of regions are more likely to reflect local dynamics or region-specific variations rather than processing-related artefacts.

The results are summarised in four complementary tables, distinguishing between:

Event type: breakpoints vs derivative spikes
Analytical scope: aligned vs full set of identified breakpoints and derivative spikes

The first two tables report only events that occur within two years after a processing transition, while the two additional tables report all detected events.

Show code cell source

Hide code cell source

# Plot the aligned breakpoints and derivative spikes with known methodological changes present in more than half of the regions

sensors_timeline = {
    "GHSL_switch": [2000, 2003],
    "AVHRR_finish": [1999],
    "SPOT-VGT_start": [1999],
    "SPOT-VGT_finish": [2013],
    "MERIS_baseline_start": [2003],
    "MERIS_baseline_finish": [2012],
    "PROBAV_start": [2013],
    "S3_OLCI_start": [2020],
    "S3_OLCI_SLSTR_start": [2021],
}

forward_window = 2.0
min_regions = 10  # threshold for strong/systematic signal

breakpoint_df = pd.DataFrame(breakpoint_records)
spike_df = pd.DataFrame(spike_records)

def transition_interval_label(year, forward_window=2.0):
    start = int(year)
    end = int(year + forward_window)
    return f"{start}-{end}"

def summarize_by_transition_table(df, forward_window=2.0, min_regions=10):
    if df.empty:
        return pd.DataFrame(), False

    results = []
    any_aligned = False

    for transition_name, years in sensors_timeline.items():
        for transition_year in years:

            tmp = df.copy()
            tmp["Years_After"] = tmp["Year"] - transition_year

            aligned = tmp[
                (tmp["Years_After"] > 0) &
                (tmp["Years_After"] <= forward_window)
            ].copy()

            if aligned.empty:
                continue

            any_aligned = True

            regions = sorted(aligned["Region"].unique())

            if len(regions) < min_regions:
                continue

            intervals = sorted(aligned["Interval"].unique())

            results.append({
                "Interval": transition_interval_label(transition_year, forward_window),
                "Aligned signal intervals": ", ".join(intervals),
                "Number of regions": len(regions),
                "Region names": ", ".join(regions),
                "Nearest processing changes": transition_name,
            })

    if not results:
        return pd.DataFrame(), any_aligned

    summary = pd.DataFrame(results)

    summary = summary.sort_values(
        ["Interval", "Number of regions"],
        ascending=[True, False]
    ).reset_index(drop=True)

    return summary, any_aligned

def display_html_table(df):
    html = df.to_html(index=False, escape=False)
    display(HTML(html))

# Build tables
bp_summary, bp_any_aligned = summarize_by_transition_table(
    breakpoint_df,
    forward_window=forward_window,
    min_regions=min_regions
)

spike_summary, spike_any_aligned = summarize_by_transition_table(
    spike_df,
    forward_window=forward_window,
    min_regions=min_regions
)

# Display
print("### Breakpoint alignment table ###")
if bp_summary.empty:
    if bp_any_aligned:
        print(
            f"No breakpoint signals affect at least {min_regions} regions "
            f"within the known methodological changes transition windows"
        )
    else:
        print("No breakpoint signals are temporally aligned with any known methodological changes")
else:
    display_html_table(bp_summary)

print("\n### Derivative spike alignment table ###")
if spike_summary.empty:
    if spike_any_aligned:
        print(
            f"No derivative spike signals affect at least {min_regions} regions "
            f"within the known methodological changes transition windows"
        )
    else:
        print("No derivative spike signals are temporally aligned with any known methodological changes")
else:
    display_html_table(spike_summary)

### Breakpoint alignment table ###

Interval	Aligned signal intervals	Number of regions	Region names	Nearest processing changes
2000-2002	2001-2002	12	Alentejo, Algarve, Aragón, Cantabria, Castilla y León, Castilla-La Mancha, Centro (PT), Comunidad Foral de Navarra, Extremadura, Galicia, Illes Balears, La Rioja	GHSL_switch

### Derivative spike alignment table ###
No derivative spike signals affect at least 10 regions within the known methodological changes transition windows

### Breakpoint Table ###

	intervals	number of regions	regions name	nearest processing changes
0	2016-2017	13	Cantabria, Castilla y León, Castilla-La Mancha, Cataluña, Centro (PT), Comunidad Foral de Navarra, Comunitat Valenciana, Galicia, Norte, País Vasco, Principado de Asturias, Región de Murcia, Área Metropolitana de Lisboa	None within 2 years
1	2001-2002	12	Alentejo, Algarve, Aragón, Cantabria, Castilla y León, Castilla-La Mancha, Centro (PT), Comunidad Foral de Navarra, Extremadura, Galicia, Illes Balears, La Rioja	GHSL_switch (2000)
2	2006-2007	6	Alentejo, Algarve, Andalucía, Cataluña, Illes Balears, País Vasco	None within 2 years
3	1996-1997	3	Cataluña, Comunitat Valenciana, País Vasco	None within 2 years
4	2011-2012	2	Castilla y León, Región de Murcia	None within 2 years

### Derivative Spike Table ###

	intervals	number of regions	regions name	nearest processing changes
0	2016-2017	8	Andalucía, Castilla y León, Castilla-La Mancha, Comunidad Foral de Navarra, Comunidad de Madrid, Comunitat Valenciana, La Rioja, Región de Murcia	None within 2 years
1	2013-2014	7	Andalucía, Castilla y León, Castilla-La Mancha, Centro (PT), Galicia, Principado de Asturias, Región de Murcia	SPOT-VGT_finish (2013); MERIS_baseline_finish (2012); PROBAV_start (2013)
2	2012-2013	5	Andalucía, Aragón, Centro (PT), Región de Murcia, Área Metropolitana de Lisboa	MERIS_baseline_finish (2012)
3	2001-2002	4	Algarve, Cataluña, Norte, Área Metropolitana de Lisboa	GHSL_switch (2000)
4	2002-2003	4	Alentejo, Algarve, Cataluña, Illes Balears	None within 2 years
5	2017-2018	4	Andalucía, Aragón, Extremadura, La Rioja	None within 2 years
6	2000-2001	2	Cataluña, Área Metropolitana de Lisboa	GHSL_switch (2000); AVHRR_finish (1999); SPOT-VGT_start (1999)
7	2003-2004	2	Algarve, Illes Balears	GHSL_switch (2003); MERIS_baseline_start (2003)
8	2004-2005	2	Norte, País Vasco	GHSL_switch (2003); MERIS_baseline_start (2003)
9	2006-2007	2	Comunidad Foral de Navarra, Extremadura	None within 2 years
10	2014-2015	2	Cantabria, Galicia	SPOT-VGT_finish (2013); PROBAV_start (2013)
11	2005-2006	1	País Vasco	None within 2 years

Breakpoint Analysis#

Breakpoint detection and derivative analysis capture two complementary aspects of temporal dynamics in land cover time series, a distinction widely recognised in change detection literature (Chelali et al., 2019; Chang et al., 2018):

Breakpoints identify the onset of structural changes in the time series
Derivative spikes highlight years of maximum year-to-year variation

Because of this, spikes may occur slightly after breakpoints, reflecting the difference between initiation and peak expression of change (Chelali et al., 2019).

The interpretation is guided by:

temporal alignment with processing-chain transitions, and
spatial consistency across regions

This is particularly important in satellite-derived land cover products, where methodological updates (e.g. sensor transitions, classification schemes, or reference layers) can introduce artificial discontinuities in time series.

A signal affecting many regions simultaneously is more likely to reflect a systematic driver (either methodological or large-scale real dynamics), whereas signals affecting few regions are more likely local effects (Chang et al., 2018).

Breakpoint ~2016–2017

This period exhibits the highest spatial consistency, with breakpoints detected across a large majority of regions. The signal is also supported by concurrent derivative spikes, indicating both structural change and increased short-term variability in the time series.

Importantly, this breakpoint cluster shows no temporal alignment with any identified processing-chain transition, ruling out a methodological origin.

Interpretation
This pattern provides strong evidence of a genuine structural shift in urban dynamics. The combination of high spatial coherence and lack of methodological alignment suggests a system-wide driver, consistent with the post-2008 recovery phase, during which urban growth resumed after a period of contraction (Martínez & García, 2015; Palmero-Iniesta et al., 2021).

More broadly, renewed urban expansion in the Iberian Peninsula during the 2010s has been linked to economic recovery, infrastructure investment, and renewed real estate activity (González & Leal, 2018; López et al., 2017). The breakpoint reflects a sustained change in growth trajectory, while the associated spikes indicate short-term adjustments during this transition.

Breakpoint ~2001–2002

A prominent breakpoint cluster is observed in this period, affecting a large number of regions and indicating a highly coherent spatial signal. This interval falls within the transition window associated with the GHSL reference update (1999–2000).

Interpretation
This signal is best understood as a composite effect, arising from the interaction of:

Methodological influences, particularly the constraint imposed by changes in reference datasets such as GSHL, which can introduce step-like adjustments in land cover products
Underlying urban expansion dynamics, corresponding to the early phase of the Iberian housing boom, characterised by rapid land consumption and development pressure (González & Leal, 2018)

The persistence of the breakpoint signal across regions suggests a structural component; however, its temporal proximity to a major processing transition prevents a clear separation between artefact and real change.

Breakpoint ~2006–2007

This breakpoint cluster affects a moderate number of regions and is not accompanied by strong or synchronised derivative spikes, indicating the absence of abrupt year-to-year changes.

No temporal association is found with processing-chain transitions.

Interpretation
This pattern reflects a progressive structural adjustment rather than a sudden shift. The breakpoint likely captures a change in the underlying growth regime, consistent with the late phase of the Iberian housing expansion cycle (González & Leal, 2018).

The absence of strong derivative spikes suggests that the transition occurred through gradual changes in growth rate, rather than discrete jumps, supporting its interpretation as a real, process-driven signal.

Minor breakpoint clusters

Additional breakpoint detections are observed, each affecting a limited number of regions and lacking both strong spatial coherence and consistent alignment with processing transitions.

Interpretation
These signals are interpreted as localised or context-specific dynamics, rather than manifestations of system-wide processes. Their limited spatial extent and lack of temporal consistency indicate that they do not reflect either:

large-scale methodological artefacts
coherent regional-scale urban transitions

Instead, they likely correspond to region-specific developments, such as:

tourism-driven coastal urbanisation (Rullan, 2014)
peri-urban expansion processes (Silva et al., 2017)
planning policies and land-market dynamics (Silva & Fernandes, 2019)
demographic shifts, including rural depopulation and urban migration (Sánchez et al., 2019; Gutiérrez et al., 2016)

4.Trend Assessment#

The next and final step is to determine whether the trend should be computed for the entire period (total trend) or divided into segments based on detected breakpoints.

The decision between using the total trend or segmented trends is based on Sen’s Slope (Trend Magnitude) and Mann-Kendall p-values (Trend Significance). Sen’s Slope is a non-parametric estimator that calculates the median rate of change over time, making it robust to outliers and suitable for detecting monotonic trends. First, Sen’s Slope is computed for the total trend, capturing the overall rate of urban change. Then, Sen’s Slope is calculated for each segmented trend to assess whether breakpoints introduce significant shifts in trend magnitude. The Mann-Kendall test is a non-parametric statistical test used to assess the presence of a monotonic upward or downward trend in a time series without requiring the data to follow any particular distribution. In parallel, the Mann-Kendall p-value is evaluated for both the total and segmented trends to measure trend significance - lower p-values indicate stronger evidence of a significant trend.

The final decision is based on the following approach:

If segmented trends show substantially different Sen’s Slopes compared to the total trend and exhibit stronger statistical significance (lower p-values), segmentation is preferred.
If segmented trends closely resemble the total trend or do not provide a clear statistical advantage, the total trend is used to maintain a simpler and more robust interpretation.

Show code cell source

Hide code cell source

# Calculate trends

def mann_kendall_p(x):
    """Return Mann-Kendall p-value and trend label."""
    if len(x) < 3:
        return np.nan, "no trend"
    res = mk.original_test(x)
    return res.p, res.trend

def significance_stars(p):
    if pd.isna(p):
        return ""
    if p < 0.001:
        return "***"
    elif p < 0.01:
        return "**"
    elif p < 0.05:
        return "*"
    return ""

def significance_traffic_light(p):
    if pd.isna(p):
        return "Unknown"
    if p < 0.05:
        return "Green"
    elif p < 0.10:
        return "Amber"
    return "Red"

def trend_direction(rate_per_decade, p_value, stable_threshold=0.10):
    """
    Classify overall direction.
    stable_threshold is in percentage points of area per decade.
    """
    if pd.isna(rate_per_decade):
        return "Unknown"
    if (not pd.isna(p_value) and p_value >= 0.05) or abs(rate_per_decade) < stable_threshold:
        return "Stable"
    return "Increase" if rate_per_decade > 0 else "Decrease"

def material_segmentation_change(total_direction, total_rate_decade, segment_rows):
    """
    Returns Yes if at least one significant segment changes the conclusion materially.
    Conservative logic:
    - any significant segment with opposite direction to total
    - OR strongest significant segment is >1.5x total magnitude and opposite sign
    """
    if segment_rows.empty:
        return "No"
    
    sig = segment_rows[segment_rows["MK_P_Value"] < 0.05].copy()
    if sig.empty:
        return "No"
    
    for _, r in sig.iterrows():
        seg_dir = r["Direction"]
        if total_direction in ["Increase", "Decrease"] and seg_dir in ["Increase", "Decrease"]:
            if seg_dir != total_direction:
                return "Yes"
    
    return "No"

def confidence_label(n_breakpoints, segmentation_changes, total_p, total_rate_decade):
    """
    Reviewer asks for confidence considering breakpoints and local drivers.
    Since local drivers are not encoded in this table, use a transparent rule:
    - High: significant total trend, no material segmentation change, <=1 breakpoint
    - Medium: significant total trend but segmentation change or multiple breakpoints
    - Low: non-significant total trend or unstable segmented interpretation
    """
    if pd.isna(total_p):
        return "Low"
    if total_p >= 0.05:
        return "Low"
    if segmentation_changes == "No" and n_breakpoints <= 1:
        return "High"
    return "Medium"


# Compute total + segmented trends 

trend_records = []

for region in results_df["Region"].dropna().unique():
    region_data = results_df.loc[results_df["Region"] == region].sort_values("Year").copy()
    X_total = region_data["Year"].values
    y_total = region_data["Urban_Area"].values

    if len(region_data) < 3 or np.isnan(y_total).any():
        continue

    # Total Sen slope
    total_sen_slope, _, _, _ = theilslopes(y_total, X_total)

    # Convert to % area per decade, as requested by reviewer
    mean_total_area = np.mean(y_total)
    total_rate_pct_per_year = (total_sen_slope / mean_total_area) * 100 if mean_total_area != 0 else np.nan
    total_rate_pct_per_decade = total_rate_pct_per_year * 10 if pd.notna(total_rate_pct_per_year) else np.nan

    # Mann-Kendall
    total_p_value, total_mk_trend = mann_kendall_p(y_total)
    total_direction = trend_direction(total_rate_pct_per_decade, total_p_value)

    trend_records.append({
        "Region": region,
        "Type": "Total",
        "Start_Year": int(region_data["Year"].min()),
        "End_Year": int(region_data["Year"].max()),
        "Sen_Slope": round(total_sen_slope, 6),
        "Rate_of_Change_pct_per_year": round(total_rate_pct_per_year, 6) if pd.notna(total_rate_pct_per_year) else np.nan,
        "Rate_of_Change_pct_per_decade": round(total_rate_pct_per_decade, 6) if pd.notna(total_rate_pct_per_decade) else np.nan,
        "MK_P_Value": total_p_value,
        "MK_Trend": total_mk_trend,
        "Direction": total_direction,
        "Significance_Stars": significance_stars(total_p_value),
        "Traffic_Light": significance_traffic_light(total_p_value),
        "Segment_Length": len(region_data)
    })

    # Breakpoints and segmented trends
    region_breakpoints = (
        breakpoints_df.loc[breakpoints_df["Region"] == region, "Year"]
        .dropna()
        .astype(int)
        .tolist()
    )
    bounds = sorted(set([int(region_data["Year"].min())] + region_breakpoints + [int(region_data["Year"].max())]))

    for i in range(len(bounds) - 1):
        start_year, end_year = bounds[i], bounds[i + 1]
        segment_data = region_data[(region_data["Year"] >= start_year) & (region_data["Year"] <= end_year)].copy()

        if len(segment_data) < 3:
            continue

        X_seg = segment_data["Year"].values
        y_seg = segment_data["Urban_Area"].values

        seg_sen_slope, _, _, _ = theilslopes(y_seg, X_seg)

        mean_seg_area = np.mean(y_seg)
        seg_rate_pct_per_year = (seg_sen_slope / mean_seg_area) * 100 if mean_seg_area != 0 else np.nan
        seg_rate_pct_per_decade = seg_rate_pct_per_year * 10 if pd.notna(seg_rate_pct_per_year) else np.nan

        seg_p_value, seg_mk_trend = mann_kendall_p(y_seg)
        seg_direction = trend_direction(seg_rate_pct_per_decade, seg_p_value)

        trend_records.append({
            "Region": region,
            "Type": "Segmented",
            "Start_Year": int(start_year),
            "End_Year": int(end_year),
            "Sen_Slope": round(seg_sen_slope, 6),
            "Rate_of_Change_pct_per_year": round(seg_rate_pct_per_year, 6) if pd.notna(seg_rate_pct_per_year) else np.nan,
            "Rate_of_Change_pct_per_decade": round(seg_rate_pct_per_decade, 6) if pd.notna(seg_rate_pct_per_decade) else np.nan,
            "MK_P_Value": seg_p_value,
            "MK_Trend": seg_mk_trend,
            "Direction": seg_direction,
            "Significance_Stars": significance_stars(seg_p_value),
            "Traffic_Light": significance_traffic_light(seg_p_value),
            "Segment_Length": len(segment_data)
        })

trends_df = pd.DataFrame(trend_records)

The following summary table reports:

Growth rate (%/decade) → long-term rate of urban expansion
Significance → statistical robustness of the full trend
- *** → highly significant (p < 0.001)
- ** → significant (p < 0.01)
- * → moderately significant (p < 0.05)
- no symbol → not statistically significant
Breakpoints → number of structural changes detected
Segmented trend needed? → whether segmented trends change the interpretation
Confidence → degree of agreement between full and segmented trend significance

21 regions were assessed. Segmented trends are not required in 21 of 21 regions. Confidence is High in 21 regions, Medium in 0, and Low in 0.

	Region	Total urban area (km²)	Growth rate (%/decade)	Significance	Breakpoints	Segmented trend needed?	Confidence
0	Alentejo	202	57.2	***	2	No	High
1	Algarve	128	50.8	***	2	No	High
2	Aragón	206	49.0	***	1	No	High
3	Centro (PT)	704	48.6	***	2	No	High
4	Comunidad Foral de Navarra	117	48.0	***	2	No	High
5	Región de Murcia	280	47.2	***	2	No	High
6	Castilla y León	534	45.3	***	3	No	High
7	Castilla-La Mancha	527	44.9	***	2	No	High
8	Comunitat Valenciana	1080	44.1	***	2	No	High
9	La Rioja	53	43.5	***	1	No	High
10	Extremadura	223	36.6	***	1	No	High
11	Andalucía	1374	35.4	***	1	No	High
12	Illes Balears	170	33.6	***	2	No	High
13	Comunidad de Madrid	983	33.3	***	0	No	High
14	Cataluña	1279	32.4	***	3	No	High
15	Cantabria	110	31.1	***	2	No	High
16	Galicia	479	30.8	***	2	No	High
17	Norte	1118	29.4	***	1	No	High
18	Área Metropolitana de Lisboa	571	25.5	***	1	No	High
19	Principado de Asturias	166	23.5	***	1	No	High
20	País Vasco	272	21.6	***	3	No	High

Trend Analysis#

The results demonstrates that, although breakpoints were identified, they do not substantially alter the underlying trend estimates. In all cases, the total trend was preferred, as segmented alternatives did not exhibit significantly stronger slopes or lower p-values.

This suggests that the dataset provides a coherent temporal structure for monitoring urban land cover change. While localised variations exist, the broader trend of urban expansion remains largely continuous and statistically stable.

Furthermore, the reliability of the total trend is evident even in regions with marked breakpoint clusters, such as Galicia, Andalucía, and Región de Murcia. Despite temporary changes in slope magnitude, the segmented trends did not offer a statistically significant improvement over the total series, suggesting that short-term accelerations or decelerations are well-integrated into the full-period estimate.

Geographical characteristics modulate this effect. Larger metropolitan areas or regions with long-standing urban infrastructure exhibit smoother temporal evolution, while smaller or emerging urban zones may reflect sharper shifts in land cover classification. Nevertheless, these fluctuations did not systematically translate into divergent trend recommendations.

In summary, the analysis confirms that the dataset is suitable for deriving consistent urbanisation trends across multi-decadal periods. Segmentation remains a valuable diagnostic tool for identifying local deviations, but for most spatial planning and land management applications, the total trend offers a statistically supported and methodologically stable representation of urban growth dynamics.

5. Discussion#

The results of this analysis demonstrate the suitability of the dataset for assessing long-term urbanisation trends. Despite the presence of breakpoints the overall consistency of the dataset supports its use for reliable trend estimation in urban land cover.

For spatial planning and land management applications, this temporal continuity is critical. The ability to trace urban expansion over a multi-decadal period enables stakeholders to evaluate the spatial footprint of urban growth, assess policy impacts, and inform future development strategies. The dataset captures the general trajectory of urbanisation effectively, even in the presence of short-term fluctuations or classification refinements.

Nonetheless, the analysis highlights the importance of methodological awareness. Some of the breakpoints occurre around years with known changes in the processing chain. These may introduce artificial discontinuities in urban extent, particularly in regions undergoing rapid development or where classification thresholds are more sensitive to spectral or spatial resolution. Such effects must be considered when interpreting apparent changes in urban dynamics.

Geographic and demographic context also influence data interpretation. Larger metropolitan regions, such as Madrid or Lisbon, tend to exhibit smoother trends, as broader patterns of growth absorb minor classification inconsistencies. In contrast, smaller or fast-developing regions may show more pronounced breaks, either due to real shifts in urbanisation or amplified sensitivity to methodological updates.

From a practical standpoint, the dataset supports robust urban trend detection, but effective application requires a cautious and informed approach. Users should:

Consider the potential influence of sensor transitions and processing updates on detected breakpoints
Prioritise total trends for general assessments, while using segmentation to highlight context-specific deviations
Interpret statistical outputs alongside relevant contextual knowledge, such as infrastructure investments, economic cycles, or policy milestones.

In conclusion, the dataset provides a strong foundation for monitoring urbanisation trends. Its temporal stability and spatial resolution are appropriate for regional-scale planning.

ℹ️ If you want to know more#

Key Resources#

The CDS catalogue entry for the data used was Land cover classification gridded maps from 1992 to present derived from satellite observations
Product User Guide and Specification of the dataset version 2.1 and version 2.0
Eurostat NUTS (Nomenclature of territorial units for statistics)
C3S EQC custom functions, c3s_eqc_automatic_quality_control, prepared by B-Open

References#

Vargo, J., Habeeb, D. M., & Stone, B. (2013). The importance of land cover change across urban-rural typologies for climate modeling. Journal of Environmental Management, 114, 243-252. https://doi.org/10.1016/j.jenvman.2012.10.007

Chang, Y., Hou, K., Li, X., Zhang, Y., & Chen, P. (2018). Review of land use and land cover change research progress. IOP Conference Series: Earth and Environmental Science, 113, 012087. https://doi.org/10.1088/1755-1315/113/1/012087

Chelali, A., Benediktsson, J. A., Chanussot, J., & Akbari, V. (2019). A review of artificial intelligence techniques for land cover change detection using remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 57(10), 7649–7668. https://doi.org/10.1109/JURSE.2019.8808967

Delgado, L., & Romero, J. (2021). Evaluating the impact of Madrid Central on urban mobility: A step toward sustainable cities? Cities, 113, 103120. https://doi.org/10.1016/j.cities.2021.103120

González, M. J., & Leal, J. (2018). Urban growth and real estate development in the Iberian Peninsula: The boom of 1996–2007. Land Use Policy, 76, 648–657. https://doi.org/10.1016/j.landusepol.2018.05.023

Gutiérrez, F., Rodríguez, V., & Prieto, C. (2016). Aging population and urbanization: Implications for Spain’s demographic landscape. Journal of Rural Studies, 45, 85–98. https://doi.org/10.1016/j.jrurstud.2016.03.008

López, F. A., Martín, J. C., & Albalate, D. (2017). High-speed rail and urban expansion in the Iberian Peninsula: The cases of AVE and Alfa Pendular. Journal of Transport Geography, 60, 57–68. https://doi.org/10.1016/j.jtrangeo.2017.02.005

Martínez, P., & García, R. (2015). The impact of the 2008 financial crisis on urban expansion in southern Spain. Habitat International, 47, 42–50. https://doi.org/10.1016/j.habitatint.2015.01.010

Palmero-Iniesta, H., Fernández, J. A., & Olcina, J. (2021). Land-use changes and natural land recovery after the 2008 crisis: Evidence from Spain. Environmental Science & Policy, 116, 72–83. https://api.semanticscholar.org/CorpusID:238829360

Rullan, O. (2014). The impact of tourism-driven urbanisation in Spain and Portugal: Coastal transformations and land use. Land Use Policy, 38, 139–150. https://doi.org/10.1016/j.landusepol.2014.02.009

Sánchez, M., Pérez, L., & Torres, D. (2019). Rural depopulation and urban migration trends in Spain and Portugal. Geoforum, 104, 78–91. https://doi.org/10.1016/j.geoforum.2019.06.015

Silva, E., & Fernandes, J. (2019). The influence of urban planning policies on land prices and informal settlements in Iberian cities. Land Use Policy, 89, 104200. https://doi.org/10.1016/j.landusepol.2019.04.012

Silva, V., Santos, R., & Moreira, J. (2017). The expansion of peri-urban areas in Lisbon and Porto: Drivers and consequences. Cities, 66, 101–112. https://doi.org/10.1016/j.cities.2017.04.002

Satellite Land Cover trend assessment for Spatial Planning and Land Management

Contents

1.4.4. Satellite Land Cover trend assessment for Spatial Planning and Land Management#

🌍 Use case: Using land cover products to quantify urbanisation trends#

❓ Quality assessment question#

📢 Quality assessment statement#

📋 Methodology#

📈 Analysis and results#

1. Define the AoI, search and download LC data#

Install CDS API#

Import all the libraries/packages#

Data Overview#

2. Inspect and view data for the defined AoI (Iberian Peninsula)#

Compute urban area for each NUTS 2 region#

Mask regions#

Compute cell area#

Select Urban Classes and Prepare Dataset#

Compute urban area per region and year#

Map urban percentage coverage over-time by NUTS regions in the AoI#

Map Analysis#

Potential Drivers and Methodological Considerations in Urbanisation Trends in the Iberian Peninsula (1992–2022)#

3. Breakpoint Detection#

Breakpoint Analysis#

4.Trend Assessment#

Trend Analysis#

5. Discussion#

ℹ️ If you want to know more#

Key Resources#

References#