
eceo-epfl/CanadaFireSat-Data


🔥🛰️ CanadaFireSat Data


This repository contains the code used to build the CanadaFireSat benchmark. With this benchmark, we investigate the potential of deep learning with multiple sensors for high-resolution wildfire forecasting.

Summary Representation:

Sources

In this section, we describe the different sources necessary to build the CanadaFireSat benchmark.

🔥📍 Fire Polygons Source

  • 💻 National Burned Area Composite (NBAC 🇨🇦): polygon shapefile downloaded from the CWFIS Datamart
  • 📅 Fires are filtered to 2015 onward, in line with Sentinel-2 imagery availability
  • 🛑 No restrictions are applied on ignition source or other metadata
  • ➕ Spatial aggregation: fires are mapped to a 2.8 km × 2.8 km grid | Temporal aggregation: 8-day windows
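As a rough illustration of the spatial and temporal aggregation, the sketch below maps fire observations to 2.8 km grid cells and 8-day windows and deduplicates them into positive (cell, window) keys. It is a simplification under stated assumptions: each fire is reduced to a point (for example a polygon centroid) in a projected metric CRS, and the grid origin and window start date (`ORIGIN_X`, `ORIGIN_Y`, `START_DATE`) are hypothetical. The actual pipeline works on NBAC polygons, which can span several cells.

```python
from datetime import date

# Grid and window sizes from the description above.
CELL_SIZE_M = 2800   # 2.8 km x 2.8 km cells
WINDOW_DAYS = 8      # 8-day temporal windows

# Hypothetical grid origin (projected metres) and window start date.
ORIGIN_X, ORIGIN_Y = 0.0, 0.0
START_DATE = date(2015, 1, 1)


def cell_index(x: float, y: float) -> tuple:
    """Map projected coordinates in metres to a (column, row) grid cell."""
    return int((x - ORIGIN_X) // CELL_SIZE_M), int((y - ORIGIN_Y) // CELL_SIZE_M)


def window_index(d: date) -> int:
    """Map a date to the index of its 8-day window counted from START_DATE."""
    return (d - START_DATE).days // WINDOW_DAYS


def aggregate_fires(fires):
    """Collapse (x, y, date) fire observations into unique (cell, window) positive keys."""
    return {(cell_index(x, y), window_index(d)) for x, y, d in fires}


fires = [
    (1000.0, 500.0, date(2015, 1, 3)),    # cell (0, 0), window 0
    (2000.0, 2700.0, date(2015, 1, 7)),   # same cell and window: merged
    (3000.0, 500.0, date(2015, 1, 10)),   # cell (1, 0), window 1
]
positives = aggregate_fires(fires)  # {((0, 0), 0), ((1, 0), 1)}
```

Using a set makes repeated detections of the same fire in one cell and window count as a single positive sample.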

🛰️🗺️ Satellite Image Time Series Source

  • 🛰️ Sentinel-2 (S2) Level-1C Satellite Imagery (2015–2023) from Google Earth Engine
  • 🗺️ For each grid cell (2.8 km × 2.8 km): collect S2 images with ≤ 40% cloud cover over the 64-day period before the prediction date
  • ⚠️ We discard samples with fewer than 3 valid images or less than 40 days of coverage
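The sample-selection rules above can be sketched as a small filter. The `S2Image` record and `select_time_series` function are hypothetical, and reading "40 days of coverage" as the span between the first and last valid acquisition is an assumption, not something this repository states.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_CLOUD_PCT = 40.0  # keep images with at most 40% cloud cover
LOOKBACK_DAYS = 64    # observation period before the prediction date
MIN_IMAGES = 3        # discard samples with fewer valid images
MIN_SPAN_DAYS = 40    # discard samples covering fewer days (assumed: first-to-last span)


@dataclass
class S2Image:
    acquired: date
    cloud_pct: float


def select_time_series(images, prediction_date):
    """Return the valid images for one sample, sorted by date, or None if it is discarded."""
    start = prediction_date - timedelta(days=LOOKBACK_DAYS)
    valid = sorted(
        (img for img in images
         if start <= img.acquired < prediction_date and img.cloud_pct <= MAX_CLOUD_PCT),
        key=lambda img: img.acquired,
    )
    if len(valid) < MIN_IMAGES:
        return None
    if (valid[-1].acquired - valid[0].acquired).days < MIN_SPAN_DAYS:
        return None
    return valid


images = [
    S2Image(date(2021, 5, 1), 10.0),
    S2Image(date(2021, 5, 20), 55.0),  # too cloudy, dropped
    S2Image(date(2021, 6, 1), 20.0),
    S2Image(date(2021, 6, 20), 5.0),
]
series = select_time_series(images, prediction_date=date(2021, 7, 1))  # 3 images spanning 50 days
```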

🌦️🌲 Environmental Predictors

  • 🌡️ Hydrometeorological Drivers: key variables such as temperature, precipitation, soil moisture, and humidity from ERA5-Land (11 km) and MODIS11 (1 km), both available on Google Earth Engine, aggregated over 8-day windows using mean, max, and min values.
  • 🌿 Vegetation Indices (MODIS13 and MODIS15): NDVI, EVI, LAI, and FPAR (500 m) from 8- or 16-day composites, describing the vegetation state.
  • 🔥 Fire Danger Metrics (CEMS, previously on CDS): Fire Weather Index and Drought Code from the Canadian FWI system (0.25° resolution).
  • 🕒 For each sample, we gather predictor data over the 64 days preceding the prediction to reflect pre-fire conditions.
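To make the 8-day aggregation concrete, here is a minimal sketch that turns a 64-day series of a single predictor into eight (mean, max, min) triples. It assumes one value per day; the real inputs arrive at different native cadences (for example MODIS 8- or 16-day composites), so the actual pipeline necessarily does more than this.

```python
from statistics import mean

WINDOW_DAYS = 8
LOOKBACK_DAYS = 64  # 8 windows of 8 days


def aggregate_windows(daily_values):
    """Summarise a 64-day daily series into eight (mean, max, min) triples, oldest window first."""
    if len(daily_values) != LOOKBACK_DAYS:
        raise ValueError(f"expected {LOOKBACK_DAYS} daily values, got {len(daily_values)}")
    windows = []
    for start in range(0, LOOKBACK_DAYS, WINDOW_DAYS):
        chunk = daily_values[start:start + WINDOW_DAYS]
        windows.append((mean(chunk), max(chunk), min(chunk)))
    return windows


# Hypothetical daily 2 m temperatures (degrees C), rising by one degree per day.
temps = [float(day) for day in range(LOOKBACK_DAYS)]
features = aggregate_windows(temps)
# features[0] == (3.5, 7.0, 0.0); features[-1] == (59.5, 63.0, 56.0)
```

Each variable thus yields 8 windows × 3 statistics = 24 values per sample under this scheme.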

🏞️ Land Cover

  • ⛔️ Exclusively used for adversarial sampling and post-training analysis.
  • 💾 The data used is the 2020 North American Land Cover 30-meter dataset, produced as part of the North American Land Change Monitoring System (NALCMS) (available on Google Earth Engine)

🛠️ Set-Up

To run the pipeline steps below, you need a Google account; run the cells in notebooks/ee_test.ipynb to obtain the Earth Engine token.

Then, create the Python virtual environment and install the dependencies:

python -m venv data-env
source data-env/bin/activate
pip install -r requirements/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

🪜 Pipeline Steps

Create Grid of Positive Samples:

  • src.preprocess.create_grid: Initialize the spatial grid over Canada | Config: None
  • src.preprocess.burned_area: Preprocess the NBAC input data and aggregate the fire polygons spatially and temporally | Config: ba_preprocess.yaml
  • src.preprocess.temporal_freq: Aggregate the positive samples temporally over 8-day windows | Config: ba_preprocess.yaml

Download MODIS and ERA5 Data:

  • src.env_download: Download complete tiles over Canada from Earth Engine (EE) to your Google Drive for each date | Configs: era5.yaml & modis.yaml
  • Manually copy the GeoTiffs from your Google Drive to your local machine.

Download FWI Data:

  • src.cds_download: Download from CEMS (previously CDS) the consolidated data for 2015–2022 and the intermediate data for 2023 | Config: cds.yaml

Download Land Cover Data from EE:

  • Manually download only the 2020 GeoTiff, then merge the tiff via src.postprocess.env_vals | Config: None

Postprocess MODIS, ERA5, FWI, and Land Cover Data:

  • src.postprocess.env_vals: Postprocess the environmental predictors for extreme or unknown values | Config: None
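This README does not say exactly how extreme or unknown values are handled. A minimal sketch of one common approach, masking fill values and physically implausible values as NaN, could look like this; the fill values, variable names, and valid ranges are illustrative assumptions, not the pipeline's settings.

```python
import math

# Illustrative fill values and plausible ranges; the real ones depend on each product.
FILL_VALUES = {-9999.0, 32767.0}
VALID_RANGES = {
    "t2m_kelvin": (180.0, 340.0),
    "ndvi": (-1.0, 1.0),
}


def clean_values(variable, values):
    """Replace fill values, NaNs, and out-of-range values with NaN."""
    low, high = VALID_RANGES[variable]
    cleaned = []
    for v in values:
        if v in FILL_VALUES or math.isnan(v) or not (low <= v <= high):
            cleaned.append(math.nan)
        else:
            cleaned.append(v)
    return cleaned


ndvi = clean_values("ndvi", [0.4, -9999.0, 1.7, 0.1])  # [0.4, nan, nan, 0.1]
```

Masking rather than clipping keeps implausible values from silently biasing later statistics; the alternative (clipping or imputing) is equally possible in the real script.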

Create Negative Samples:

  • src.sampling.negative: Sample the negatives for Train, Validation, and Test splits | Config: sampling.yaml
  • src.sampling.negative_hard: Sample the negatives for the Test Hard split | Config: sampling_hard.yaml
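A hedged sketch of the two sampling modes: plain negatives are drawn uniformly from non-fire (cell, window) keys, and, as one plausible reading of the land-cover-based adversarial sampling, hard negatives are drawn so that their land-cover classes follow the distribution observed among the positives. The functions, the `land_cover` lookup, and this matching criterion are assumptions; the real criteria are defined in the sampling configs.

```python
import random
from collections import Counter


def sample_negatives(candidates, positives, n, seed=0):
    """Draw up to n (cell, window) keys uniformly from candidates that are not positives."""
    pool = sorted(set(candidates) - set(positives))
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


def sample_hard_negatives(candidates, positives, land_cover, n, seed=0):
    """Draw negatives whose land-cover mix follows that of the positives.

    land_cover is a hypothetical mapping from grid cell to land-cover class.
    The total may differ slightly from n because of per-class rounding.
    """
    rng = random.Random(seed)
    by_class = {}
    for cell, window in sorted(set(candidates) - set(positives)):
        by_class.setdefault(land_cover[cell], []).append((cell, window))

    class_counts = Counter(land_cover[cell] for cell, _ in positives)
    total = sum(class_counts.values())
    chosen = []
    for cls, count in sorted(class_counts.items()):
        quota = round(n * count / total)  # per-class share of n
        options = by_class.get(cls, [])
        chosen.extend(rng.sample(options, min(quota, len(options))))
    return chosen


candidates = [((col, 0), w) for col in range(4) for w in range(3)]
positives = {((0, 0), 0), ((1, 0), 1)}
land_cover = {(0, 0): "forest", (1, 0): "forest", (2, 0): "grass", (3, 0): "forest"}
negatives = sample_negatives(candidates, positives, n=5)
hard = sample_hard_negatives(candidates, positives, land_cover, n=4)
```

Fixing the random seed keeps the sampled splits reproducible across runs.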

Download S2 Data:

  • src.download: Run this script for the positive, negative, and hard negative samples (usually per region) | Config: ba_s2.yaml

Postprocess S2 Data:

  • src.postprocess.s2_post: Post-process the S2 tiles based on cloud cover and filter out time series with too few images or too short a temporal coverage | Config: postprocess.yaml

Compute S2 Bands Statistics:

  • src.postprocess.band_stats: Compute the mean and std of each band over the positive and negative samples | Config: stats.yaml
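Band statistics over a large dataset are usually accumulated in a streaming fashion rather than by loading every tile at once. The sketch below uses Welford's algorithm to keep a running per-band mean and (population) standard deviation; whether `src.postprocess.band_stats` works this way, or uses sample rather than population variance, is an assumption.

```python
import math


class RunningStats:
    """Streaming per-band mean and population standard deviation (Welford's algorithm)."""

    def __init__(self, n_bands):
        self.count = 0
        self.mean = [0.0] * n_bands
        self.m2 = [0.0] * n_bands  # running sum of squared deviations per band

    def update(self, pixel):
        """Add one pixel, given as a sequence with one value per band."""
        self.count += 1
        for b, x in enumerate(pixel):
            delta = x - self.mean[b]
            self.mean[b] += delta / self.count
            self.m2[b] += delta * (x - self.mean[b])

    def std(self):
        """Population standard deviation per band."""
        if self.count == 0:
            raise ValueError("no pixels accumulated")
        return [math.sqrt(m2 / self.count) for m2 in self.m2]


stats = RunningStats(n_bands=2)
for pixel in [(1.0, 10.0), (3.0, 10.0), (5.0, 40.0)]:
    stats.update(pixel)
# stats.mean == [3.0, 20.0]; stats.std() is about [1.633, 14.142]
```

Welford's update avoids the catastrophic cancellation that the naive "sum of squares minus squared sum" formula can suffer on large reflectance values.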

Rasterize Label Polygons:

  • src.postprocess.rasterize: Rasterize the fire polygons into binary arrays using the S2 GeoTiffs as reference | Config: rasterize.yaml
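To show what burning a polygon into a binary label array means, here is a dependency-free sketch that marks every pixel whose centre falls inside the polygon (ray casting) on a north-up reference grid. A library routine such as `rasterio.features.rasterize` is the usual tool for this; which routine the script actually uses is not stated, and the single-ring polygon input is a simplification (no holes or multipolygons).

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon described by ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, origin_x, origin_y, pixel_size, width, height):
    """Burn one polygon into a height x width binary mask (1 = burned).

    (origin_x, origin_y) is the top-left corner of the north-up reference raster,
    so row indices grow southwards (decreasing y).
    """
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        cy = origin_y - (row + 0.5) * pixel_size
        for col in range(width):
            cx = origin_x + (col + 0.5) * pixel_size
            if point_in_polygon(cx, cy, ring):
                mask[row][col] = 1
    return mask


# A 200 m x 200 m burned square in the top-left corner of a 4 x 4 grid of 100 m pixels.
burn = [(0.0, 0.0), (200.0, 0.0), (200.0, -200.0), (0.0, -200.0)]
mask = rasterize(burn, origin_x=0.0, origin_y=0.0, pixel_size=100.0, width=4, height=4)
```

Testing pixel centres (rather than any overlap) is the same default convention `rasterio.features.rasterize` follows when `all_touched=False`.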

Align Environmental Variables with S2 Tiles:

  • src.postprocess.spatial_alignment: Spatially align the environmental predictors by extracting windows centered on the S2 tiles | Config: spatial_alignment.yaml & spatial_alignment_lc.yaml
  • src.postprocess.alignment: Compute the weighted average of the environmental predictors over the S2 tiles | Config: alignment.yaml
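The weighted average can be read as an area-weighted mean: each coarse predictor pixel contributes in proportion to how much of the S2 tile it covers. The sketch below assumes axis-aligned pixel boxes in the same projected CRS as the tile and skips NaN (masked) values; the helper names are hypothetical, and the actual weighting used by `src.postprocess.alignment` may differ.

```python
import math


def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def weighted_tile_mean(tile_box, pixels):
    """Area-weighted mean of coarse predictor pixels over one S2 tile.

    pixels is an iterable of (box, value) pairs; NaN values are treated as masked.
    """
    weight_sum = 0.0
    acc = 0.0
    for box, value in pixels:
        w = overlap_area(tile_box, box)
        if w > 0 and not math.isnan(value):
            weight_sum += w
            acc += w * value
    if weight_sum == 0:
        raise ValueError("tile overlaps no valid predictor pixel")
    return acc / weight_sum


# A 1 km-wide tile straddling two 1 km predictor pixels, 750 m over the first.
tile = (250.0, 0.0, 1250.0, 1000.0)
pixels = [((0.0, 0.0, 1000.0, 1000.0), 10.0), ((1000.0, 0.0, 2000.0, 1000.0), 20.0)]
value = weighted_tile_mean(tile, pixels)  # 12.5
```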

Compute Environmental Variable Statistics:

  • src.postprocess.env_stats: Compute the mean and std of each environmental variable over the positive and negative samples | Config: env_stats.yaml

Create the split file:

  • src.postprocess.split: Create the main split file for model training and evaluation | Config: split.yaml
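Following the year-based splits listed in the statistics table (training 2016–2021, validation 2022, test 2023), a split file could be produced as follows. The CSV layout and column names (`sample_id`, `split`) are hypothetical; the actual split file format is not described here.

```python
import csv
import io
from datetime import date


def split_for(prediction_date):
    """Assign a sample to a split by year: train 2016-2021, val 2022, test 2023."""
    year = prediction_date.year
    if 2016 <= year <= 2021:
        return "train"
    if year == 2022:
        return "val"
    if year == 2023:
        return "test"
    raise ValueError(f"year {year} is outside the benchmark period")


def write_split_file(samples, handle):
    """Write a CSV with one (sample_id, split) row per sample (hypothetical layout)."""
    writer = csv.writer(handle)
    writer.writerow(["sample_id", "split"])
    for sample_id, prediction_date in samples:
        writer.writerow([sample_id, split_for(prediction_date)])


buffer = io.StringIO()  # in practice, a file opened with newline=""
write_split_file([("a", date(2018, 5, 1)), ("b", date(2022, 8, 9)), ("c", date(2023, 6, 2))], buffer)
```

Splitting by year rather than at random keeps fire events from the same season out of both training and evaluation.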

Transform SITS GeoTiffs to npy Files:

  • src.postprocess.transform: Extract all Sentinel-2 band GeoTiffs and concatenate them into npy files grouped by resolution | Config: transform.yaml
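Sentinel-2 bands come at three native resolutions (10 m: B2, B3, B4, B8; 20 m: B5, B6, B7, B8A, B11, B12; 60 m: B1, B9, B10), which is why the arrays are grouped per resolution. A minimal NumPy sketch, assuming each band is already loaded as a `(T, H, W)` array and that each group is stacked along a channel axis to give `(T, C, H, W)`; the function names and axis order are assumptions.

```python
import io

import numpy as np

# Native Sentinel-2 band resolutions in metres (Earth Engine band names).
S2_BAND_RESOLUTION = {
    "B2": 10, "B3": 10, "B4": 10, "B8": 10,
    "B5": 20, "B6": 20, "B7": 20, "B8A": 20, "B11": 20, "B12": 20,
    "B1": 60, "B9": 60, "B10": 60,
}


def group_bands_by_resolution(bands):
    """Group band names by native resolution, keeping the input order within each group."""
    groups = {}
    for band in bands:
        groups.setdefault(S2_BAND_RESOLUTION[band], []).append(band)
    return dict(sorted(groups.items()))


def stack_groups(band_arrays):
    """Stack (T, H, W) band arrays into one (T, C, H, W) array per resolution (assumed layout)."""
    return {
        res: np.stack([band_arrays[b] for b in bands], axis=1)
        for res, bands in group_bands_by_resolution(band_arrays).items()
    }


# Hypothetical sample: 3 dates, 10 m bands on a 4 x 4 grid, a 20 m band on 2 x 2.
bands = {
    "B2": np.zeros((3, 4, 4)), "B4": np.ones((3, 4, 4)),
    "B11": np.zeros((3, 2, 2)),
}
stacked = stack_groups(bands)  # {10: shape (3, 2, 4, 4), 20: shape (3, 1, 2, 2)}

buffer = io.BytesIO()  # stands in for a per-sample .npy file path
np.save(buffer, stacked[10])
buffer.seek(0)
restored = np.load(buffer)
```

Keeping one array per resolution avoids resampling the 20 m and 60 m bands to 10 m at storage time.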

Upload to Hugging Face 🤗

  • src.huggingface.upload: Upload CanadaFireSat and its metadata files to Hugging Face | Config: upload.yaml & manual-upload.yaml

📷 Outputs

📊 CanadaFireSat Dataset Statistics (excluding the Test Hard split):

| Statistic | Value |
|---|---|
| Total Samples | 177,801 |
| Target Spatial Resolution | 100 m |
| Region Coverage | Canada |
| Temporal Coverage | 2016–2023 |
| Sample Area Size | 2.64 km × 2.64 km |
| Fire Occurrence Rate | 39% of samples |
| Fire Patches | 16% of all patches |
| Training Set (2016–2021) | 78,030 samples |
| Validation Set (2022) | 14,329 samples |
| Test Set (2023) | 85,442 samples |
| Sentinel-2 Median Temporal Coverage | 55 days (8 images) |
| Number of Environmental Predictors | 58 |
| Data Sources | ERA5, MODIS, CEMS |
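As a quick consistency check on the statistics table, the three splits add up to the reported total, with the 2023 test year alone accounting for roughly half of the samples:

```python
# Split sizes copied from the statistics table.
splits = {"train": 78_030, "val": 14_329, "test": 85_442}

total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 177801
print(shares)  # {'train': 43.9, 'val': 8.1, 'test': 48.1}
```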

📍 Sample Locations:

Figure 1: Spatial distribution of positive (left) and negative (right) wildfire samples.

🛰️ Example of S2 Time Series:

Figure 2: Rows 1–3: Sentinel-2 input time series for four locations in Canada, showing only the RGB bands with rescaled intensity. Row 4: Sentinel-2 images after the fire occurred. Row 5: fire polygons used as labels, shown with the post-fire Sentinel-2 images.

🖋️ Citation

@article{porta2025canadafiresat,
  title={CanadaFireSat: Toward high-resolution wildfire forecasting with multiple modalities},
  author={Porta, Hugo and Dalsasso, Emanuele and McCarty, Jessica L and Tuia, Devis},
  journal={arXiv preprint arXiv:2506.08690},
  year={2025}
}
