Commit: progress on docs -- specifically input datasets (#219)

* progress on docs -- specifically input datasets

* updated downscaling methods and inputs datasets

* nit: spellcheck

* edits to running-flows
norlandrhagen authored Jun 30, 2022
1 parent 1170236 commit 32bf701
Showing 4 changed files with 96 additions and 16 deletions.
20 changes: 17 additions & 3 deletions docs/pages/cmip6-downscaling/downscaling-methods.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,21 @@
import Section from '../../components/section'

# Downscaling Methods

We implemented four downscaling methods globally. Descriptions of these implementations are below, along with references to further information. Our [explainer article](https://carbonplan.org/research/cmip6-downscaling-explainer) discusses the importance of downscaling, and describes some of the key methodological differences, in more detail.

export default ({ children }) => <Section name='Downscaling Methods'>{children}</Section>
### MACA

The Multivariate Adaptive Constructed Analogs method [(Abatzoglou and Brown, 2012)](https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/joc.2312) finds common spatial patterns among GCM and reference datasets to construct downscaled future projections from actual weather patterns from the past. The method involves a combination of coarse and fine-scale bias-correction, detrending of GCM data, and analog selection, steps which are detailed thoroughly in the [MACA Datasets documentation](https://climate.northwestknowledge.net/MACA/MACAmethod.php). MACA is designed to operate at the regional scale. As a result, we split the global domain into smaller regions using the AR6 delineations from the `regionmask` [package](https://regionmask.readthedocs.io/en/stable/) and downscaled each region independently. We then stitched the regions back together to create a seamless global product. Of the methods we have implemented, MACA is the most established.

### GARD-SV

The Generalized Analog Regression Downscaling (GARD) approach (Gutmann et al., in review) is a downscaling sandbox that allows scientists to create custom downscaling implementations, supporting single or multiple predictor variables, pure regression and pure analog approaches, and different bias-correction routines. At its core, GARD builds a linear model for every pixel relating the reference dataset at the fine scale to the same data coarsened to the scale of the GCM. The downscaled projections are then further perturbed by spatially-correlated random fields to reflect the error in the regression models. Our GARD-SV (single-variate) implementation uses the same variable for training and prediction (e.g. precipitation is the only predictor for downscaling precipitation). For regression, we used the PureRegression method, building a single model for each pixel from the entire time series of training data. The precipitation model included a logistic regression component, with a threshold of 0 mm/day defining a precipitation event.
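The regression-plus-occurrence idea can be sketched for a single pixel as follows. This is an illustrative sketch on synthetic data, not the GARD codebase itself: a logistic model decides wet vs. dry days at the 0 mm/day threshold, and a linear model predicts amounts on wet days.

```python
# Illustrative per-pixel regression with a logistic occurrence component
# (synthetic data; not the actual GARD implementation)
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
coarse_pr = rng.gamma(2.0, 2.0, size=365) * (rng.random(365) > 0.5)  # coarse predictor
fine_pr = np.clip(coarse_pr * 1.2 + rng.normal(0, 0.1, size=365), 0, None)  # fine target

X = coarse_pr.reshape(-1, 1)
wet = (fine_pr > 0).astype(int)  # occurrence: above the 0 mm/day threshold

occ_model = LogisticRegression().fit(X, wet)                        # wet-day probability
amt_model = LinearRegression().fit(X[wet == 1], fine_pr[wet == 1])  # amounts on wet days

prob_wet = occ_model.predict_proba(X)[:, 1]
downscaled = np.where(prob_wet > 0.5, np.clip(amt_model.predict(X), 0, None), 0.0)
```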

### GARD-MV

The GARD-MV (multi-variate) implementation follows the same process as the GARD-SV method but uses multiple predictor variables for model training and inference. Specifically, we used three predictors for each downscaling model, adding the two directions of 500mb winds to each model. Thus, the predictors for precipitation in this model are precipitation, longitudinal wind, and latitudinal wind.

### DeepSD

DeepSD uses a computer vision approach to learn spatial patterns at multiple resolutions ([Vandal et al., 2017](https://dl.acm.org/doi/10.1145/3097983.3098004)). Specifically, DeepSD is a stacked super-resolution convolutional neural network. We adapted the [open-source DeepSD implementation](https://github.com/tjvandal/deepsd) for downscaling global ensembles by updating the source code for Python 3 and TensorFlow 2, removing the batch normalization layer, normalizing based on historical observations, training models for temperature and precipitation, and training on a global reanalysis product (ERA5). In addition, we trained the model for fewer iterations than in Vandal et al., 2017 and clipped aphysical precipitation values at 0. Our dataset includes an additional bias-corrected product (DeepSD-BC). Given its origin in deep learning, this method is the most distinct from those included here, and is an experimental contribution to our dataset.
37 changes: 36 additions & 1 deletion docs/pages/cmip6-downscaling/input-datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,41 @@ import Section from '../../components/section'

# Input Datasets


## Reference dataset

### ERA5 Reanalysis

All downscaled datasets here were trained on the [ERA5](https://doi.org/10.1002/qj.3803) global reanalysis product at the 0.25 degree (~25 km) spatial resolution. All downscaling methods used daily temperature maxima and minima and precipitation for the period 1981-2010. One algorithm (GARD-MV) also used wind.

This dataset can be accessed and explored via an [intake-esm](https://intake-esm.readthedocs.io/en/stable/) catalog.

```python
# !pip install intake-esm

import intake

# Open the ERA5 training-data catalog hosted on Azure
cat = intake.open_esm_datastore(
    "https://cmip6downscaling.blob.core.windows.net/training/ERA5-azure.json"
)

# The catalog contents can be inspected as a pandas DataFrame
cat.df.head()
```

The ERA5 data [transferring](https://github.com/carbonplan/cmip6-downscaling/blob/4bf65c61f7192908cca81fe94cda3b94931586f0/flows/ERA5/ERA5_transfer.py) and [processing](https://github.com/carbonplan/cmip6-downscaling/blob/4bf65c61f7192908cca81fe94cda3b94931586f0/flows/ERA5/ERA5_resample.py) scripts can be found on [GitHub](https://github.com/carbonplan/cmip6-downscaling).

The ERA5 dataset was produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0); see the [ECMWF licence terms](https://apps.ecmwf.int/datasets/licences/general/).

## CMIP6 raw datasets

The downscaled datasets shown here are derived from results from the [Coupled Model Intercomparison Project Phase 6](https://doi.org/10.5194/gmd-9-1937-2016). Raw datasets are also available in the web catalog (labeled “Raw”). GCMs are run at different spatial resolutions, and the data presented here are displayed in their original spatial resolution. The raw CMIP6 datasets were accessed via the Pangeo data catalog.

The CMIP6 transfer script is available [on GitHub](https://github.com/carbonplan/cmip6-downscaling/blob/main/flows/cmip6_transfer.py).

export default ({ children }) => <Section name='Input Datasets'>{children}</Section>
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,4 @@ import Section from '../../components/section'

# Intro to Statistical Downscaling


export default ({ children }) => <Section name='Intro to Statistical Downscaling'>{children}</Section>
53 changes: 43 additions & 10 deletions docs/pages/cmip6-downscaling/running-flows.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
import Section from '../../components/section'

# Running Flows

In this project, each downscaling method (BCSD, GARD, MACA) has its own workflow for generating results. These data production workflows are handled by the Python library Prefect, which encapsulates the data processing steps into individual tasks organized into a `Flow`.

Expand All @@ -12,23 +10,21 @@ Prefect allows us to run these downscaling flows with many different parameter c

Prefect has the ability to run flows with different `runtimes`. Choosing the correct runtime is important when scaling out many flows or debugging a single issue.

Pre-configured runtimes are stored in [`cmip6_downscaling/runtimes.py`](https://github.com/carbonplan/cmip6-downscaling/blob/main/cmip6_downscaling/runtimes.py).

The current runtime options are:

[`cloud`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L57) `executor: dask-distributed` - Runtime for queuing multiple flows on Prefect Cloud.

[`local`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L113) `executor: local` - Runtime for developing on a local machine.

[`CI`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L130) `executor: local` - Runtime used for Continuous Integration.

[`pangeo`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L140) `executor: dask-distributed` - Runtime for processing on JupyterHub.

[`gateway`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L165) `executor: dask-distributed` - Runtime used for scaling with a dask-gateway cluster.

## Modifying Flow Config

Project-level configuration settings live in [`cmip6_downscaling/config.py`](https://github.com/carbonplan/cmip6-downscaling/blob/main/cmip6_downscaling/config.py) and are managed with the Python package [`donfig`](https://donfig.readthedocs.io/en/latest/). Default configuration options can be overridden in multiple ways with donfig. Below are two options for specifying use of the cloud runtime. Note: connection strings and other sensitive information are best stored in a local `.yaml` file or as environment variables.

#### Python Context

Expand Down Expand Up @@ -58,6 +54,43 @@ Config options can also be set with specifically formatted environment variables
[environment variables](https://donfig.readthedocs.io/en/latest/configuration.html#environment-variables)

## Parameter Files

All downscaling flows require run parameters to be passed in as a `.json` file. These parameter files contain arguments to the flows, specifying the downscaling method, variable, and so on. Example config files can be found in `cmip6_downscaling.configs.generate_valid_configs.<method>`. Future configs can be generated manually or using the notebook template [generate_valid_json_parameters.ipynb](https://github.com/carbonplan/cmip6-downscaling/blob/main/configs/generate_valid_configs/generate_valid_json_parameters.ipynb).

Example config file:

```json
{
"method": "gard",
"obs": "ERA5",
"model": "BCC-CSM2-MR",
"member": "r1i1p1f1",
"grid_label": "gn",
"table_id": "day",
"scenario": "historical",
"features": ["pr"],
"variable": "pr",
"latmin": "-90",
"latmax": "90",
"lonmin": "-180",
"lonmax": "180",
"bias_correction_method": "quantile_mapper",
"bias_correction_kwargs": {
"pr": { "detrend": false },
"tasmin": { "detrend": true },
"tasmax": { "detrend": true },
"psl": { "detrend": false },
"ua": { "detrend": false },
"va": { "detrend": false }
},
"model_type": "PureRegression",
"model_params": { "thresh": 0 },
"train_dates": ["1981", "2010"],
"predict_dates": ["1950", "2014"]
}
```
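A parameter file like the one above can also be written programmatically; a minimal sketch using only the standard library (the output filename is hypothetical, and only a subset of fields is shown):

```python
# Sketch: write a partial flow parameter file (filename is hypothetical)
import json

params = {
    "method": "gard",
    "obs": "ERA5",
    "model": "BCC-CSM2-MR",
    "scenario": "historical",
    "variable": "pr",
    "train_dates": ["1981", "2010"],
    "predict_dates": ["1950", "2014"],
}

with open("gard_pr_historical.json", "w") as f:
    json.dump(params, f, indent=2)
```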

## Runtimes

### Cloud
Expand All @@ -72,7 +105,7 @@ While this runtime excels at resource scaling and parallel runs, debugging with

#### Registering a Flow

With the prefect cloud runtime selected, flows can be registered and run with the Prefect [CLI](https://docs.prefect.io/orchestration/concepts/cli.html).

To register a flow:

Expand Down
