
How we train an XGBoost Model for PoF

PoF uses XGBoost, a powerful gradient-boosted decision-tree algorithm widely applied to tabular environmental data.

The training procedure follows a simple, reproducible workflow based on a probabilistic classifier:

Prepare the training dataset

You will assemble a table where each row represents a gridcell in space and time (daily & 9km grid) and includes:


Predictors (features)
Fuel variables, meteorological variables, and ignition proxies, all of which are described in the retrieving_data documentation.

Target (label)
Binary fire occurrence within the gridcell on the given day, where one or more fire detections counts as an occurrence: 1 = fire detected, 0 = no fire.

The data generation script should have synthesised your data into a DataFrame stored in a Parquet file, which is now ready for training.
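As a minimal sketch (the file name and label column below are illustrative and will depend on your data generation script), loading the Parquet table and separating predictors from the label might look like:

```python
import pandas as pd

# Load the training table produced by the data generation script.
# "pof_training_data.parquet" and "fire_occurrence" are placeholder names.
df = pd.read_parquet("pof_training_data.parquet")

# Predictors: fuel, meteorological, and ignition-proxy columns.
X = df.drop(columns=["fire_occurrence"])

# Target: binary fire occurrence (1 = fire detected, 0 = no fire).
y = df["fire_occurrence"]
```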

Split the data


Dataset Splits
  • Training set → used to fit the model

  • Test set → final skill evaluation

We typically use a random stratified split.
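Continuing from the X and y defined above, a stratified split with scikit-learn could look like the following sketch (the 80/20 ratio and random seed are illustrative choices):

```python
from sklearn.model_selection import train_test_split

# Stratify on the label so the rare fire class keeps the same
# proportion in both the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # hold out 20% of gridcell-days for final skill evaluation
    stratify=y,
    random_state=42,    # fixed seed for reproducibility
)
```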

Define the XGBoost model

We configure the key parameters:


XGBoost Hyperparameters
  • max_depth – tree complexity

  • learning_rate – shrinkage applied at each boosting round, controlling how fast the model learns

  • n_estimators – number of boosting rounds

  • objective="binary:logistic" – required to output probabilities

You will need to adjust these parameters depending on your region and data volume.
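A minimal sketch of defining and fitting the classifier with these parameters, using the training split from the previous step (the specific values shown are placeholders, not recommendations):

```python
from xgboost import XGBClassifier

# Probabilistic classifier configured with the key hyperparameters above.
model = XGBClassifier(
    max_depth=6,                   # tree complexity
    learning_rate=0.1,             # shrinkage per boosting round
    n_estimators=500,              # number of boosting rounds
    objective="binary:logistic",   # output probabilities
)

# Fit on the training split.
model.fit(X_train, y_train)
```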

Generate PoF predictions

Once trained, the model outputs a probability between 0 and 1 representing the likelihood that at least one fire will occur under the given conditions within a gridcell on a given day.

These are the core PoF predictions you will visualise and evaluate.
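For example, with the fitted model from the previous step, the PoF values for the test set are the positive-class column of predict_proba:

```python
# predict_proba returns [P(no fire), P(fire)] per gridcell-day;
# column 1 is the probability of at least one fire.
pof = model.predict_proba(X_test)[:, 1]
```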

Evaluate the model

We can assess the model skill using tools such as:

  • ROC curve

  • AUC score

  • Reliability diagrams

  • Confusion matrix

However, we only provide examples for a subset of these metrics. Reliability methods are highly recommended given the imbalanced nature of PoF data, although it is important to note that they can be more computationally expensive than some of the other methods, as sketched below.
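The sketch below illustrates how these diagnostics might be computed with scikit-learn on the held-out test set, using the pof probabilities from the previous step (the 0.5 threshold and 10 calibration bins are arbitrary choices):

```python
from sklearn.metrics import roc_auc_score, roc_curve, confusion_matrix
from sklearn.calibration import calibration_curve

# Discrimination: ROC curve and area under it.
auc = roc_auc_score(y_test, pof)
fpr, tpr, thresholds = roc_curve(y_test, pof)

# Confusion matrix at an illustrative 0.5 probability threshold.
cm = confusion_matrix(y_test, (pof >= 0.5).astype(int))

# Reliability diagram: observed fire frequency vs predicted probability.
prob_true, prob_pred = calibration_curve(y_test, pof, n_bins=10)

print(f"ROC AUC: {auc:.3f}")
```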

Interpretation of these diagnostics in the context of fire risk needs to be carefully considered.

The script saves the trained model for reuse (POF_model.joblib). Later versions of XGBoost may save to JSON or other file formats, but the model can be saved and reused in the same or a similar way.
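Saving and reloading the trained model with joblib follows the usual pattern; a short sketch:

```python
import joblib

# Persist the trained model for reuse.
joblib.dump(model, "POF_model.joblib")

# Later, reload it without retraining.
model = joblib.load("POF_model.joblib")

# Newer XGBoost versions can also use the native JSON format, e.g.
# model.save_model("POF_model.json") and model.load_model("POF_model.json").
```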


🎯 Final result

A trained, validated XGBoost model providing daily probability-of-fire estimates from environmental predictors