Commit: progress on docs -- specifically input datasets (#219)

* progress on docs -- specifically input datasets

* updated downscaling methods and inputs datasets

* nit: spellcheck

* edits to running-flows
norlandrhagen authored Jun 30, 2022
1 parent 1170236 commit 32bf701
Showing 4 changed files with 96 additions and 16 deletions.
20 changes: 17 additions & 3 deletions docs/pages/cmip6-downscaling/downscaling-methods.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,21 @@
import Section from '../../components/section'

# Downscaling Methods

We implemented four downscaling methods globally. Descriptions of these implementations are below, along with references to further information. Our [explainer article](https://carbonplan.org/research/cmip6-downscaling-explainer) discusses the importance of downscaling, and describes some of the key methodological differences, in more detail.

export default ({ children }) => <Section name='Downscaling Methods'>{children}</Section>
### MACA

The Multivariate Adaptive Constructed Analogs method [(Abatzoglou and Brown, 2012)](https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/joc.2312) finds common spatial patterns among GCM and reference datasets to construct downscaled future projections from actual weather patterns from the past. The method involves a combination of coarse and fine-scale bias-correction, detrending of GCM data, and analog selection, steps which are detailed thoroughly in the [MACA Datasets documentation](https://climate.northwestknowledge.net/MACA/MACAmethod.php). MACA is designed to operate at the regional scale. As a result, we split the global domain into smaller regions using the AR6 delineations from the `regionmask` [package](https://regionmask.readthedocs.io/en/stable/) and downscaled each region independently. We then stitched the regions back together to create a seamless global product. Of the methods we have implemented, MACA is the most established.

### GARD-SV

The Generalized Analog Regression Downscaling (GARD) approach (Gutmann et al., in review) is a downscaling sandbox that allows scientists to create custom downscaling implementations, supporting single or multiple predictor variables, pure regression and pure analog approaches, and different bias-correction routines. At its core, GARD builds a linear model for every pixel relating the reference dataset at the fine scale to the same data coarsened to the scale of the GCM. The downscaled projections are then further perturbed by spatially-correlated random fields to reflect the error in the regression models. Our GARD-SV (single-variate) implementation uses the same variable for training and prediction (e.g. precipitation is the only predictor for downscaling precipitation). For regression, we used the PureRegression method, building a single model for each pixel from the entire time series of training data. The precipitation model included a logistic regression component, with a threshold of 0 mm/day defining a precipitation event.
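The regression-plus-occurrence idea can be sketched for a single pixel as follows. This is an illustrative sketch on synthetic data, not the GARD codebase itself: a logistic model decides wet vs. dry days at the 0 mm/day threshold, and a linear model predicts amounts on wet days.

```python
# Illustrative per-pixel regression with a logistic occurrence component
# (synthetic data; not the actual GARD implementation)
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
coarse_pr = rng.gamma(2.0, 2.0, size=365) * (rng.random(365) > 0.5)  # coarse predictor
fine_pr = np.clip(coarse_pr * 1.2 + rng.normal(0, 0.1, size=365), 0, None)  # fine target

X = coarse_pr.reshape(-1, 1)
wet = (fine_pr > 0).astype(int)  # occurrence: above the 0 mm/day threshold

occ_model = LogisticRegression().fit(X, wet)                        # wet-day probability
amt_model = LinearRegression().fit(X[wet == 1], fine_pr[wet == 1])  # amounts on wet days

prob_wet = occ_model.predict_proba(X)[:, 1]
downscaled = np.where(prob_wet > 0.5, np.clip(amt_model.predict(X), 0, None), 0.0)
```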

### GARD-MV

The GARD-MV (multi-variate) implementation follows the same process as the GARD-SV method but uses multiple predictor variables for model training and inference. Specifically, we used three predictors for each downscaling model, adding the two directions of 500mb winds to each model. Thus, the predictors for precipitation in this model are precipitation, longitudinal wind, and latitudinal wind.

### DeepSD

DeepSD uses a computer vision approach to learn spatial patterns at multiple resolutions ([Vandal et al., 2017](https://dl.acm.org/doi/10.1145/3097983.3098004)). Specifically, DeepSD is a stacked super-resolution convolutional neural network. We adapted the [open-source DeepSD implementation](https://github.com/tjvandal/deepsd) for downscaling global ensembles by updating the source code for Python 3 and TensorFlow 2, removing the batch normalization layer, normalizing based on historical observations, training models for temperature and precipitation, and training on a global reanalysis product (ERA5). In addition, we trained the model for fewer iterations than in Vandal et al., 2017 and clipped aphysical precipitation values at 0. Our dataset includes an additional bias-corrected product (DeepSD-BC). Given its origin in deep learning, this method is the most distinct from those included here, and is an experimental contribution to our dataset.
37 changes: 36 additions & 1 deletion docs/pages/cmip6-downscaling/input-datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,41 @@ import Section from '../../components/section'

# Input Datasets


## Reference dataset

### ERA5 Reanalysis

All downscaled datasets here were trained on the [ERA5](https://doi.org/10.1002/qj.3803) global reanalysis product at the 0.25 degree (~25 km) spatial resolution. All downscaling methods used daily temperature maxima and minima and precipitation for the period 1981-2010. One algorithm (GARD-MV) also used wind.

This dataset can be accessed and explored via an [intake-esm](https://intake-esm.readthedocs.io/en/stable/) catalog.

```python
# !pip install intake-esm

import intake

# Open the ERA5 training-data catalog hosted on Azure
cat = intake.open_esm_datastore(
    "https://cmip6downscaling.blob.core.windows.net/training/ERA5-azure.json"
)

# The catalog contents can be inspected as a pandas DataFrame
cat.df.head()
```

The ERA5 data [transferring](https://github.com/carbonplan/cmip6-downscaling/blob/4bf65c61f7192908cca81fe94cda3b94931586f0/flows/ERA5/ERA5_transfer.py) and [processing](https://github.com/carbonplan/cmip6-downscaling/blob/4bf65c61f7192908cca81fe94cda3b94931586f0/flows/ERA5/ERA5_resample.py) scripts can be found on [GitHub](https://github.com/carbonplan/cmip6-downscaling).

The ERA5 dataset was produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0); see the [ECMWF licence terms](https://apps.ecmwf.int/datasets/licences/general/).

## CMIP6 raw datasets

The downscaled datasets shown here are derived from results from the [Coupled Model Intercomparison Project Phase 6](https://doi.org/10.5194/gmd-9-1937-2016). Raw datasets are also available in the web catalog (labeled “Raw”). GCMs are run at different spatial resolutions, and the data presented here are displayed in their original spatial resolution. The raw CMIP6 datasets were accessed via the Pangeo data catalog.

The CMIP6 transfer script is available [on GitHub](https://github.com/carbonplan/cmip6-downscaling/blob/main/flows/cmip6_transfer.py).

export default ({ children }) => <Section name='Input Datasets'>{children}</Section>
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,4 @@ import Section from '../../components/section'

# Intro to Statistical Downscaling


export default ({ children }) => <Section name='Intro to Statistical Downscaling'>{children}</Section>
53 changes: 43 additions & 10 deletions docs/pages/cmip6-downscaling/running-flows.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
import Section from '../../components/section'

# Running Flows

In this project, each downscaling method (BCSD, GARD, MACA) has its own workflow for generating results. These data production workflows are handled by the Python library Prefect, which encapsulates the data processing steps into individual tasks organized into a `Flow`.

Expand All @@ -12,23 +10,21 @@ Prefect allows us to run these downscaling flows with many different parameter c

Prefect has the ability to run flows with different `runtimes`. Choosing the correct runtime is important when scaling out many flows or debugging a single issue.

Pre-configured runtimes are stored in [`cmip6_downscaling/runtimes.py`](https://github.com/carbonplan/cmip6-downscaling/blob/main/cmip6_downscaling/runtimes.py).

The current runtime options are:

[`cloud`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L57) `executor: dask-distributed` - Runtime for queuing multiple flows on Prefect Cloud.

[`local`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L113) `executor: local` - Runtime for developing on a local machine.

[`CI`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L130) `executor: local` - Runtime used for Continuous Integration.

[`pangeo`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L140) `executor: dask-distributed` - Runtime for processing on JupyterHub.

[`gateway`](https://github.com/carbonplan/cmip6-downscaling/blob/a0379110c33b557f959a1d6fa53e9f93891a45b3/cmip6_downscaling/runtimes.py#L165) `executor: dask-distributed` - Runtime used for scaling with a dask-gateway cluster.

## Modifying Flow Config

Project-level configuration settings live in [`cmip6_downscaling/config.py`](https://github.com/carbonplan/cmip6-downscaling/blob/main/cmip6_downscaling/config.py) and are managed with the Python package [`donfig`](https://donfig.readthedocs.io/en/latest/). Default configuration options can be overridden in multiple ways with donfig. Below are two options for specifying use of the cloud runtime. Note: connection strings and other sensitive information are best stored in a local `.yaml` file or as environment variables.

#### Python Context

Expand Down Expand Up @@ -58,6 +54,43 @@ Config options can also be set with specifically formatted environment variables
[environment variables](https://donfig.readthedocs.io/en/latest/configuration.html#environment-variables)

## Parameter Files

All downscaling flows require run parameters to be passed in as a `.json` file. These parameter files contain arguments to the flows, specifying the downscaling method, variable, and so on. Example config files can be found in `cmip6_downscaling.configs.generate_valid_configs.<method>`. Future configs can be generated manually or using the notebook template [generate_valid_json_parameters.ipynb](https://github.com/carbonplan/cmip6-downscaling/blob/main/configs/generate_valid_configs/generate_valid_json_parameters.ipynb).

Example config file:

```json
{
"method": "gard",
"obs": "ERA5",
"model": "BCC-CSM2-MR",
"member": "r1i1p1f1",
"grid_label": "gn",
"table_id": "day",
"scenario": "historical",
"features": ["pr"],
"variable": "pr",
"latmin": "-90",
"latmax": "90",
"lonmin": "-180",
"lonmax": "180",
"bias_correction_method": "quantile_mapper",
"bias_correction_kwargs": {
"pr": { "detrend": false },
"tasmin": { "detrend": true },
"tasmax": { "detrend": true },
"psl": { "detrend": false },
"ua": { "detrend": false },
"va": { "detrend": false }
},
"model_type": "PureRegression",
"model_params": { "thresh": 0 },
"train_dates": ["1981", "2010"],
"predict_dates": ["1950", "2014"]
}
```
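A parameter file like the one above can also be written programmatically; a minimal sketch using only the standard library (the output filename is hypothetical, and only a subset of fields is shown):

```python
# Sketch: write a partial flow parameter file (filename is hypothetical)
import json

params = {
    "method": "gard",
    "obs": "ERA5",
    "model": "BCC-CSM2-MR",
    "scenario": "historical",
    "variable": "pr",
    "train_dates": ["1981", "2010"],
    "predict_dates": ["1950", "2014"],
}

with open("gard_pr_historical.json", "w") as f:
    json.dump(params, f, indent=2)
```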

## Runtimes

### Cloud
Expand All @@ -72,7 +105,7 @@ While this runtime excels at resource scaling and parallel runs, debugging with

#### Registering a Flow

With the prefect cloud runtime selected, flows can be registered and run with the Prefect [CLI](https://docs.prefect.io/orchestration/concepts/cli.html).

To register a flow:

Expand Down
