🔥🛰️ CanadaFireSat Data

This repository contains the code for building the CanadaFireSat benchmark. In this benchmark, we investigate the potential of multi-sensor deep learning for high-resolution wildfire forecasting.

💿 Dataset on Hugging Face
📝 Paper on ArXiv
🤖 Model repository on GitHub & Weights on Hugging Face

Summary Representation:

Sources

In this section, we describe the different sources necessary to build the CanadaFireSat benchmark.

🔥📍 Fire Polygons Source

💻 National Burned Area Composite (NBAC 🇨🇦): Polygons Shapefile downloaded from CWFIS Datamart
📅 Filter fires since 2015 aligning with Sentinel-2 imagery availability
🛑 No restrictions are applied on ignition source or other metadata
➕ Spatial aggregation: Fires are mapped to a 2.8 km × 2.8 km grid | Temporal aggregation into 8-day windows
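
The two aggregation steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes fire locations are given in a projected coordinate system measured in metres, that the grid origin is (0, 0), and that 8-day windows restart on 1 January of each year. The names `grid_cell` and `window_index` are hypothetical.

```python
import math
from datetime import date

CELL_SIZE_M = 2_800  # 2.8 km grid cell, as stated above
WINDOW_DAYS = 8      # 8-day temporal windows, as stated above


def grid_cell(x_m: float, y_m: float, cell_size_m: float = CELL_SIZE_M) -> tuple[int, int]:
    """Return the (column, row) of the grid cell containing a projected point.

    Assumes coordinates in metres and a grid origin at (0, 0).
    """
    return math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m)


def window_index(day: date, window_days: int = WINDOW_DAYS) -> tuple[int, int]:
    """Return (year, index of the 8-day window within that year).

    Assumes windows restart on 1 January of each year.
    """
    day_of_year = day.timetuple().tm_yday  # 1-based day of the year
    return day.year, (day_of_year - 1) // window_days


# Two synthetic fires in the same cell and the same window
# collapse into a single positive sample.
fires = [(5_700.0, 1_000.0, date(2023, 6, 2)), (8_000.0, 2_500.0, date(2023, 6, 3))]
samples = {(grid_cell(x, y), window_index(d)) for x, y, d in fires}
```

Keying samples by (cell, window) makes the aggregation idempotent: adding the same fire twice does not create a second sample.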

🛰️🗺️ Satellite Image Time Series Source

🛰️ Sentinel-2 (S2) Level-1C Satellite Imagery (2015–2023) from Google Earth Engine
🗺️ For each grid cell (2.8 km × 2.8 km): Collect S2 images with at most 40% cloud cover over a 64-day period before prediction
⚠️ We discard samples with: Fewer than 3 valid images | Less than 40 days of coverage
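
The discard rule can be expressed as a small predicate. This is a sketch under one assumption: "coverage" is taken to be the number of days between the first and last valid image, which the text above does not define precisely. The function name `keep_sample` is hypothetical.

```python
from datetime import date

MIN_VALID_IMAGES = 3    # samples with fewer valid images are discarded
MIN_COVERAGE_DAYS = 40  # samples covering fewer days are discarded


def keep_sample(acquisition_dates: list[date]) -> bool:
    """Return True if a time series passes both filters.

    Coverage is measured as the span between the first and last
    valid acquisition (an assumption, see the text above).
    """
    if len(acquisition_dates) < MIN_VALID_IMAGES:
        return False
    coverage_days = (max(acquisition_dates) - min(acquisition_dates)).days
    return coverage_days >= MIN_COVERAGE_DAYS


# Three images spanning 45 days: the sample is kept.
kept = keep_sample([date(2023, 5, 1), date(2023, 5, 20), date(2023, 6, 15)])
```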

🌦️🌲 Environmental Predictors

🌡️ Hydrometeorological Drivers: Key variables like temperature, precipitation, soil moisture, and humidity from ERA5-Land (11 km, available on Google Earth Engine) and MODIS11 (1 km, available on Google Earth Engine), aggregated over 8-day windows using mean, max, and min values.
🌿 Vegetation Indices (MODIS13 and MODIS15): NDVI, EVI, LAI, and FPAR (500 m) captured in 8 or 16-day composites, informing on vegetation state.
🔥 Fire Danger Metrics (CEMS previously on CDS): Fire Weather Index and Drought Code from the Canadian FWI system (0.25° resolution).
🕒 For each sample, we gather predictor data from 64 days prior, to reflect pre-fire conditions.
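
To make the 8-day aggregation concrete, here is a minimal NumPy sketch that reduces a daily series to mean, max, and min per window. It assumes a one-dimensional daily series and drops trailing days that do not fill a whole window; both choices, and the name `aggregate_windows`, are illustrative rather than taken from the repository.

```python
import numpy as np


def aggregate_windows(daily: np.ndarray, window_days: int = 8) -> dict[str, np.ndarray]:
    """Aggregate a 1-D daily series into mean, max, and min per window.

    Trailing days that do not fill a whole window are dropped (assumption).
    """
    n_windows = len(daily) // window_days
    windows = daily[: n_windows * window_days].reshape(n_windows, window_days)
    return {
        "mean": windows.mean(axis=1),
        "max": windows.max(axis=1),
        "min": windows.min(axis=1),
    }


# 64 days of synthetic daily values give 8 windows of 8 days each.
daily_temperature = np.arange(64, dtype=float)
stats = aggregate_windows(daily_temperature)
```

With a 64-day look-back and 8-day windows, each predictor contributes 8 time steps per statistic.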

🏞️ Land Cover

⛔️ Exclusively used for adversarial sampling and post-training analysis.
💾 Data extracted is the 2020 North American Land Cover 30-meter dataset, produced as part of the North American Land Change Monitoring System (NALCMS) (available on Google Earth Engine)

🛠️ Set-Up

To run the pipeline steps below, you need a Google Account. Run the cells in notebooks/ee_test.ipynb to obtain the Earth Engine token.

Then create the Python virtual environment and install the dependencies:

python -m venv data-env
source data-env/bin/activate
pip install -r requirements/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

🪜 Pipeline Steps

Create Grid of Positive Samples:

src.preprocess.create_grid: Initialize the spatial grid over Canada | Config: None
src.preprocess.burned_area: Preprocess the NBAC input data and aggregate the fire polygons spatially and temporally | Config: ba_preprocess.yaml
src.preprocess.temporal_freq: Temporal aggregation over 8-day window of the positive samples | Config: ba_preprocess.yaml

Download MODIS and ERA5 Data:

src.env_download: Download complete tiles over Canada for each date from EE to your Google Drive | Configs: era5.yaml & modis.yaml
Manually copy the GeoTiffs from your Google Drive to your local machine.

Download FWI Data:

src.cds_download: Download the consolidated data for 2015-2022 and the intermediate data for 2023 from CEMS (previously CDS) | Config: cds.yaml

Download Land Cover Data from EE:

Manually download only the 2020 GeoTiff and merge the tiff via src.postprocess.env_vals | Config: None

Postprocess MODIS, ERA5, FWI, and Land Cover Data:

src.postprocess.env_vals: Postprocess the environmental predictors for extreme or unknown values | Config: None

Create Negative Samples:

src.sampling.negative: Sample the negatives for Train, Validation, and Test splits | Config: sampling.yaml
src.sampling.negative_hard: Sample the negatives for the Test Hard split | Config: samling_hard.yaml

Download S2 Data:

src.download: Run this script for the positive, negative, and hard negative samples (usually run per region) | Config: ba_s2.yaml

Postprocess S2 Data:

src.postprocess.s2_post: Post-process the S2 tiles based on cloud cover and filter out time series with too few images or too few days of coverage | Config: postprocess.yaml

Compute S2 Bands Statistics:

src.postprocess.band_stats: Compute mean and std of each band of the positive and negative samples | Config: stats.yaml
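
Per-band statistics over a large set of tiles are usually computed in a streaming fashion. The sketch below accumulates sums and sums of squares so that tiles never need to be held in memory together. It is an illustration, not the logic of src.postprocess.band_stats: the name `band_statistics` and the (bands, height, width) array layout are assumptions.

```python
import numpy as np


def band_statistics(tiles):
    """Compute per-band mean and std over an iterable of (bands, H, W) arrays.

    Expects at least one tile; all tiles must have the same number of bands.
    """
    count = 0
    total = None
    total_sq = None
    for tile in tiles:
        # Flatten the spatial dimensions: one row of pixels per band.
        pixels = tile.reshape(tile.shape[0], -1).astype(np.float64)
        if total is None:
            total = np.zeros(pixels.shape[0])
            total_sq = np.zeros(pixels.shape[0])
        total += pixels.sum(axis=1)
        total_sq += (pixels ** 2).sum(axis=1)
        count += pixels.shape[1]
    mean = total / count
    # Clip tiny negative values caused by floating-point rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std


# Two synthetic 2-band tiles with constant values 1 and 3.
tiles = [np.ones((2, 4, 4)), 3 * np.ones((2, 4, 4))]
mean, std = band_statistics(tiles)
```

The sum-of-squares formula is simple but can lose precision for values with a large mean relative to their spread; a Welford-style update is a common alternative in that case.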

Rasterize Label Polygons:

src.postprocess.rasterize: Rasterize the fire polygons into binary arrays using the S2 GeoTiffs as reference | Config: rasterize.yaml

Align Environmental Variables with S2 Tiles:

src.postprocess.spatial_alignment: Spatially align the environmental predictors by extracting windows centered around S2 tiles | Config: spatial_alignment.yaml & spatial_alignment_lc.yaml
src.postprocess.alignment: Weighted average of the environmental predictors over the S2 tiles | Config: alignment.yaml
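
A weighted average over a tile can be sketched as below. The weighting scheme is an assumption: each coarse predictor pixel is weighted by the fraction of the S2 tile it overlaps, and pixels without data (NaN) are ignored. The step description does not specify the weights, so the repository may do this differently. The name `weighted_mean` is hypothetical.

```python
import numpy as np


def weighted_mean(values: np.ndarray, weights: np.ndarray) -> float:
    """Weighted average of coarse-pixel values, ignoring NaN pixels."""
    valid = ~np.isnan(values)
    return float(np.sum(values[valid] * weights[valid]) / np.sum(weights[valid]))


# A synthetic 2x2 window of a coarse predictor and the fraction of the
# S2 tile falling in each coarse pixel (assumed overlap weights).
window = np.array([[10.0, 20.0], [30.0, np.nan]])
overlap = np.array([[0.5, 0.25], [0.25, 0.0]])
tile_value = weighted_mean(window, overlap)
```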

Compute Environment Variables Statistics:

src.postprocess.env_stats: Compute the mean and std of each environment variable on the positive and negative sample population | Config: env_stats.yaml

Create the split file:

src.postprocess.split: Create the main split file for model training and evaluation | Config: split.yaml
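
The dataset statistics reported in the Outputs section indicate a split by year: 2016 to 2021 for training, 2022 for validation, and 2023 for testing. A minimal sketch of that rule follows; the function name `assign_split` and the split labels are hypothetical, and the actual split file may carry more information.

```python
def assign_split(year: int) -> str:
    """Map a sample's year to its split, following the reported year ranges."""
    if 2016 <= year <= 2021:
        return "train"
    if year == 2022:
        return "val"
    if year == 2023:
        return "test"
    raise ValueError(f"year {year} is outside the 2016-2023 coverage")


# Example: label a few sample years.
splits = {year: assign_split(year) for year in (2016, 2021, 2022, 2023)}
```

Splitting by year rather than at random keeps the evaluation years fully unseen during training.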

Transform SITS GeoTiffs to npy files:

src.postprocess.transform: Extract all the Sentinel-2 band GeoTiffs and concatenate them into npy files grouped by resolution | Config: transform.yaml
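
Grouping by resolution is needed because Sentinel-2 Level-1C bands come at 10 m, 20 m, and 60 m, so they cannot be stacked into a single array without resampling. The sketch below stacks already-loaded bands into one array per resolution. The function name `group_by_resolution`, the in-memory dictionary input, and the synthetic tile sizes (derived from the 2.64 km sample size) are assumptions for illustration.

```python
import numpy as np

# Native resolution (in metres) of the Sentinel-2 Level-1C bands.
BANDS_BY_RESOLUTION = {
    10: ["B2", "B3", "B4", "B8"],
    20: ["B5", "B6", "B7", "B8A", "B11", "B12"],
    60: ["B1", "B9", "B10"],
}


def group_by_resolution(bands: dict[str, np.ndarray]) -> dict[int, np.ndarray]:
    """Stack single-band (H, W) arrays into one (C, H, W) array per resolution."""
    return {
        res: np.stack([bands[name] for name in names])
        for res, names in BANDS_BY_RESOLUTION.items()
    }


# Synthetic 2.64 km tile: 264, 132, and 44 pixels per side at 10, 20, and 60 m.
sizes = {10: 264, 20: 132, 60: 44}
bands = {
    name: np.zeros((sizes[res], sizes[res]), dtype=np.uint16)
    for res, names in BANDS_BY_RESOLUTION.items()
    for name in names
}
groups = group_by_resolution(bands)
# Each group could then be written to disk with np.save, one file per resolution.
```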

Upload to Hugging Face 🤗

src.huggingface.upload: Upload CanadaFireSat and its metadata files to Hugging Face | Config: upload.yaml & manual-upload.yaml

📷 Outputs

📊 CanadaFireSat Dataset Statistics (without Test Hard):

Statistic	Value
Total Samples	177,801
Target Spatial Resolution	100 m
Region Coverage	Canada
Temporal Coverage	2016 - 2023
Sample Area Size	2.64 km × 2.64 km
Fire Occurrence Rate	39% of samples
Total Fire Patches	16% of patches
Training Set (2016–2021)	78,030 samples
Validation Set (2022)	14,329 samples
Test Set (2023)	85,442 samples
Sentinel-2 Temporal Median Coverage	55 days (8 images)
Number of Environmental Predictors	58
Data Sources	ERA5, MODIS, CEMS

📍 Samples Localisation:

Figure 1: Spatial distribution of positive (left) and negative (right) wildfire samples.

🛰️ Example of S2 time series:

Figure 2: Rows 1-3: Samples of Sentinel-2 input time series for 4 locations in Canada, showing only the RGB bands with rescaled intensity. Row 4: Sentinel-2 images after the fire occurred. Row 5: Fire polygons used as labels, shown with the post-fire Sentinel-2 images.

🖋️ Citation

@article{porta2025canadafiresat,
  title={CanadaFireSat: Toward high-resolution wildfire forecasting with multiple modalities},
  author={Porta, Hugo and Dalsasso, Emanuele and McCarty, Jessica L and Tuia, Devis},
  journal={arXiv preprint arXiv:2506.08690},
  year={2025}
}
