This repository contains the code for building the benchmark CanadaFireSat. In this benchmark, we investigate the potential of deep learning with multiple sensors for high-resolution wildfire forecasting.
- 💿 Dataset on Hugging Face
- 📝 Paper on ArXiv
- 🤖 Model repository on GitHub & Weights on Hugging Face
Summary Representation:
In this section, we describe the different sources necessary to build the CanadaFireSat benchmark.
- 💻 National Burned Area Composite (NBAC 🇨🇦): Polygons Shapefile downloaded from CWFIS Datamart
- 📅 Filter fires since 2015 aligning with Sentinel-2 imagery availability
- 🛑 No restrictions are applied on ignition source or other metadata
- ➕ Spatial aggregation: Fires are mapped to a 2.8 km × 2.8 km grid | Temporal aggregation into 8-day windows
- 🛰️ Sentinel-2 (S2) Level-1C Satellite Imagery (2015–2023) from Google Earth Engine
- 🗺️ For each grid cell (2.8 km × 2.8 km): Collect S2 images with low cloud cover (≤ 40%) over the 64-day period before the prediction date
- ⚠️ We discard samples with: Fewer than 3 valid images | Less than 40 days of coverage
- 🌡️ Hydrometeorological Drivers: Key variables like temperature, precipitation, soil moisture, and humidity from ERA5-Land (11 km, available on Google Earth Engine) and MODIS11 (1 km, available on Google Earth Engine), aggregated over 8-day windows using mean, max, and min values.
- 🌿 Vegetation Indices (MODIS13 and MODIS15): NDVI, EVI, LAI, and FPAR (500 m) captured in 8 or 16-day composites, informing on vegetation state.
- 🔥 Fire Danger Metrics (CEMS previously on CDS): Fire Weather Index and Drought Code from the Canadian FWI system (0.25° resolution).
- 🕒 For each sample, we gather predictor data from 64 days prior, to reflect pre-fire conditions.
- ⛔️ Land cover is used exclusively for adversarial sampling and post-training analysis.
- 💾 The data is extracted from the 2020 North American Land Cover 30-meter dataset, produced as part of the North American Land Change Monitoring System (NALCMS) (available on Google Earth Engine).
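
The 8-day aggregation of the hydrometeorological drivers listed above can be illustrated with a minimal sketch. It assumes daily values stored in a plain dictionary and windows counted from an arbitrary start date; the function name `aggregate_windows` and the sample temperatures are hypothetical and not taken from the repository. With this window length, a 64-day lookback corresponds to 8 windows.

```python
from datetime import date, timedelta
from statistics import mean

WINDOW_DAYS = 8  # window length stated in the text


def aggregate_windows(daily, start, n_windows):
    """Aggregate {date: value} into consecutive 8-day windows from `start`.

    Returns one dict per window with the mean, max, and min of the available
    daily values; windows without any data get None for all three statistics.
    """
    out = []
    for w in range(n_windows):
        w_start = start + timedelta(days=w * WINDOW_DAYS)
        days = [w_start + timedelta(days=d) for d in range(WINDOW_DAYS)]
        values = [daily[d] for d in days if d in daily]
        if values:
            stats = {"mean": mean(values), "max": max(values), "min": min(values)}
        else:
            stats = {"mean": None, "max": None, "min": None}
        out.append({"start": w_start, **stats})
    return out


# Hypothetical daily temperatures (°C) for 16 days: 0.0, 1.0, ..., 15.0
start = date(2023, 6, 1)
daily_temp = {start + timedelta(days=i): float(i) for i in range(16)}
windows = aggregate_windows(daily_temp, start, n_windows=2)
print(windows[0]["mean"], windows[0]["max"], windows[0]["min"])  # 3.5 7.0 0.0
```

The real pipeline works on rasters rather than scalar series, so the same reduction would be applied per pixel.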
To run the pipeline steps below, you need a Google account and must run the cells in notebooks/ee_test.ipynb to obtain the Earth Engine token.
You also need to set up the Python virtual environment:
python -m venv data-env
source data-env/bin/activate
pip install -r requirements/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117
Create Grid of Positive Samples:
src.preprocess.create_grid
: Initialize the spatial grid over Canada | Config: None
src.preprocess.burned_area
: Preprocess the NBAC input data and aggregate the fire polygons spatially and temporally | Config:ba_preprocess.yaml
src.preprocess.temporal_freq
: Temporal aggregation of the positive samples over 8-day windows | Config:ba_preprocess.yaml
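
As a rough illustration of the spatial and temporal aggregation in this step, the sketch below maps fire records to 2.8 km grid cells and 8-day windows. It simplifies fires to points in a projected coordinate system (meters) rather than polygons, and it assumes that windows are counted from January 1 of each year; the grid origin and all function names are hypothetical and may differ from the repository's implementation.

```python
import math
from datetime import date

CELL_SIZE_M = 2800.0  # 2.8 km grid spacing from the text
WINDOW_DAYS = 8       # temporal window length from the text


def grid_cell(x_m, y_m, origin_x=0.0, origin_y=0.0):
    """Return the (column, row) of the grid cell containing a projected point."""
    col = math.floor((x_m - origin_x) / CELL_SIZE_M)
    row = math.floor((y_m - origin_y) / CELL_SIZE_M)
    return col, row


def window_index(day):
    """Return (year, index) of the 8-day window, counted from January 1 (assumption)."""
    day_of_year = day.timetuple().tm_yday  # 1-based day of year
    return day.year, (day_of_year - 1) // WINDOW_DAYS


def aggregate_fires(fires):
    """Collapse fire records (x_m, y_m, date) into unique (cell, window) positives."""
    return {(grid_cell(x, y), window_index(d)) for x, y, d in fires}


# Hypothetical fire records: two fall in the same cell and window.
fires = [
    (1000.0, 500.0, date(2023, 7, 1)),
    (2700.0, 100.0, date(2023, 7, 3)),
    (3000.0, 100.0, date(2023, 7, 3)),
]
positives = aggregate_fires(fires)
print(len(positives))  # 2
```

Using a set means that several fires in the same cell and window yield a single positive sample, which matches the idea of aggregation described above.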
Download MODIS and ERA5 Data:
src.env_download
: Download complete tiles over Canada from EE for each date to your Google Drive | Configs:era5.yaml
&modis.yaml
- Manually copy the GeoTiffs from your Google Drive to your local machine.
Download FWI Data:
src.cds_download
: Download the FWI data from CEMS (previously CDS): consolidated data for 2015 - 2022 and intermediate data for 2023 | Config:cds.yaml
Download Land Cover Data from EE:
- Manually download the 2020 GeoTiff only and merge the tiff via:
src.postprocess.env_vals
| Config: None
Postprocess MODIS, ERA5, FWI, and Land Cover Data:
src.postprocess.env_vals
: Postprocess the environmental predictors for extreme or unknown values | Config: None
Create Negative Samples:
src.sampling.negative
: Sample the negatives for Train, Validation, and Test splits | Config:sampling.yaml
src.sampling.negative_hard
: Sample the negatives for the Test Hard split | Config:samling_hard.yaml
Download S2 Data:
src.download
: Run this script separately for the positive, negative, and hard negative samples (usually per region) | Config:ba_s2.yaml
Postprocess S2 Data:
src.postprocess.s2_post
: Post-process the S2 tiles based on cloud cover and filter out time series with too few images or too few days of coverage | Config:postprocess.yaml
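
The filtering rules can be sketched as follows, using the thresholds stated in the data description (at most 40% cloud cover per image, at least 3 valid images, at least 40 days of coverage). The sketch assumes that coverage means the span between the first and last valid image, which may not match the repository's definition; `filter_series` and the sample data are hypothetical.

```python
from datetime import date

MAX_CLOUD_PCT = 40.0    # per-image cloud cover threshold from the text
MIN_IMAGES = 3          # series with fewer valid images are discarded
MIN_COVERAGE_DAYS = 40  # series covering fewer days are discarded


def filter_series(images):
    """Return the valid images of a series, or None if the series is discarded.

    `images` is a list of (acquisition_date, cloud_cover_percent) tuples.
    Coverage is the span between the first and last valid image (assumption).
    """
    valid = sorted((d, c) for d, c in images if c <= MAX_CLOUD_PCT)
    if len(valid) < MIN_IMAGES:
        return None
    coverage = (valid[-1][0] - valid[0][0]).days
    if coverage < MIN_COVERAGE_DAYS:
        return None
    return valid


# Hypothetical series: one cloudy image is dropped, three remain over 45 days.
series = [
    (date(2023, 5, 1), 10.0),
    (date(2023, 5, 11), 80.0),
    (date(2023, 5, 21), 35.0),
    (date(2023, 6, 15), 5.0),
]
kept = filter_series(series)
print(len(kept))  # 3
```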
Compute S2 Bands Statistics:
src.postprocess.band_stats
: Compute mean and std of each band of the positive and negative samples | Config:stats.yaml
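
A minimal sketch of per-band statistics is given below. It accumulates sums and sums of squares so that samples can be streamed one at a time, and it reports the population standard deviation; both choices are assumptions, as are the function name and the toy pixel values.

```python
import math


def band_statistics(samples):
    """Compute the per-band mean and population std over all pixels of all samples.

    `samples` is an iterable of dicts mapping a band name to a flat list of
    pixel values. Only running sums are kept in memory.
    """
    count, total, total_sq = {}, {}, {}
    for sample in samples:
        for band, pixels in sample.items():
            count[band] = count.get(band, 0) + len(pixels)
            total[band] = total.get(band, 0.0) + sum(pixels)
            total_sq[band] = total_sq.get(band, 0.0) + sum(p * p for p in pixels)
    stats = {}
    for band in count:
        mean = total[band] / count[band]
        # Clamp at zero to guard against small negative values from rounding.
        variance = max(total_sq[band] / count[band] - mean * mean, 0.0)
        stats[band] = (mean, math.sqrt(variance))
    return stats


# Hypothetical two-sample dataset with two bands.
samples = [
    {"B04": [1.0, 2.0], "B08": [10.0, 10.0]},
    {"B04": [3.0, 4.0], "B08": [10.0, 10.0]},
]
stats = band_statistics(samples)
print(stats["B04"][0])  # 2.5
```

The sum-of-squares form is simple but can lose precision for large values; a Welford-style update is a common alternative when that matters.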
Rasterize Label Polygons:
src.postprocess.rasterize
: Rasterize the fire polygons into binary arrays using the S2 GeoTiffs as reference | Config:rasterize.yaml
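
To show what rasterizing a polygon against a reference grid involves, here is a dependency-free sketch that tests each pixel center with ray casting. It is only an illustration: the grid parameters and function names are hypothetical, and in practice a library routine such as `rasterio.features.rasterize` is usually preferable. The 100 m pixel size follows the target resolution given in the dataset statistics.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_max, pixel_size, width, height):
    """Burn a polygon into a height x width binary mask using pixel centers.

    (x_min, y_max) is the top-left corner of the reference grid, as in a
    north-up GeoTIFF, so row 0 is the northernmost row.
    """
    mask = []
    for row in range(height):
        yc = y_max - (row + 0.5) * pixel_size
        mask.append([
            1 if point_in_polygon(x_min + (col + 0.5) * pixel_size, yc, polygon) else 0
            for col in range(width)
        ])
    return mask


# Hypothetical 4 x 4 grid of 100 m pixels with a fire in the top-left corner.
fire = [(0.0, 200.0), (200.0, 200.0), (200.0, 400.0), (0.0, 400.0)]
mask = rasterize(fire, x_min=0.0, y_max=400.0, pixel_size=100.0, width=4, height=4)
for line in mask:
    print(line)
```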
Align Environmental Variables with S2 Tiles:
src.postprocess.spatial_alignment
: Spatially align the environmental predictors by extracting windows centered on the S2 tiles | Config:spatial_alignment.yaml
&spatial_alignment_lc.yaml
src.postprocess.alignment
: Compute the weighted average of the environmental predictors over the S2 tiles | Config:alignment.yaml
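
One plausible reading of the weighted average is an area-weighted mean, where each coarse predictor cell contributes in proportion to its overlap with the S2 tile. The sketch below implements that reading; the weighting scheme, the function names, and the example boxes are assumptions and are not confirmed by the source. The 2.64 km tile size follows the sample area given in the dataset statistics.

```python
def overlap_area(a, b):
    """Intersection area of two boxes given as (x_min, y_min, x_max, y_max)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def weighted_mean_over_tile(tile, cells):
    """Area-weighted mean of coarse cell values over a tile footprint.

    `cells` is a list of (box, value) pairs. Cells that do not intersect the
    tile get zero weight; None is returned if nothing overlaps.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for box, value in cells:
        weight = overlap_area(tile, box)
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


# Hypothetical tile (2.64 km side) straddling two coarse cells: 25% and 75% overlap.
tile = (0.0, 0.0, 2640.0, 2640.0)
cells = [
    ((-10000.0, -10000.0, 660.0, 10000.0), 10.0),
    ((660.0, -10000.0, 11660.0, 10000.0), 20.0),
]
value = weighted_mean_over_tile(tile, cells)
print(value)  # 17.5
```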
Compute Environmental Variable Statistics:
src.postprocess.env_stats
: Compute the mean and std of each environmental variable over the positive and negative samples | Config:env_stats.yaml
Create the split file:
src.postprocess.split
: Create the main split file for model training and evaluation | Config:split.yaml
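
The benchmark splits are temporal: according to the dataset statistics, training covers 2016–2021, validation 2022, and test 2023. A minimal sketch of such an assignment is shown below; the function name, the split labels, and the sample identifiers are hypothetical, and the actual split file may carry more information.

```python
def assign_split(year):
    """Map a sample's year to its split, following the dataset statistics."""
    if 2016 <= year <= 2021:
        return "train"
    if year == 2022:
        return "val"
    if year == 2023:
        return "test"
    return None  # outside the benchmark's temporal coverage


# Hypothetical sample identifiers mapped to the year of their prediction date.
samples = {"a": 2019, "b": 2022, "c": 2023, "d": 2015}
splits = {sample_id: assign_split(year) for sample_id, year in samples.items()}
print(splits)
```

Splitting by year keeps every test sample strictly later than the training data, which avoids temporal leakage in a forecasting setting.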
Transform SITS GeoTiffs to npy files:
src.postprocess.transform
: Extract all Sentinel-2 band GeoTiffs and concatenate them into npy files grouped by resolution | Config:transform.yaml
Upload to Hugging Face 🤗
src.huggingface.upload
: Upload CanadaFireSat and its metadata files to Hugging Face | Config:upload.yaml
&manual-upload.yaml
📊 CanadaFireSat Dataset Statistics (without Test Hard):
Statistic | Value |
---|---|
Total Samples | 177,801 |
Target Spatial Resolution | 100 m |
Region Coverage | Canada |
Temporal Coverage | 2016 - 2023 |
Sample Area Size | 2.64 km × 2.64 km |
Fire Occurrence Rate | 39% of samples |
Total Fire Patches | 16% of patches |
Training Set (2016–2021) | 78,030 samples |
Validation Set (2022) | 14,329 samples |
Test Set (2023) | 85,442 samples |
Sentinel-2 Temporal Median Coverage | 55 days (8 images) |
Number of Environmental Predictors | 58 |
Data Sources | ERA5, MODIS, CEMS |
📍 Samples Localisation:
Figure 1: Spatial distribution of positive (left) and negative (right) wildfire samples.
🛰️ Example of S2 time series:
Figure 2: Rows 1-3: Samples of Sentinel-2 input time series for 4 locations in Canada, showing only the RGB bands with rescaled intensity. Row 4: Sentinel-2 images acquired after the fire occurred. Row 5: Fire polygons used as labels, shown with the post-fire Sentinel-2 images.
@article{porta2025canadafiresat,
title={CanadaFireSat: Toward high-resolution wildfire forecasting with multiple modalities},
author={Porta, Hugo and Dalsasso, Emanuele and McCarty, Jessica L and Tuia, Devis},
journal={arXiv preprint arXiv:2506.08690},
year={2025}
}