10 changes: 3 additions & 7 deletions docs/dev/dace.md
@@ -1,9 +1,5 @@
# DaCe

[DaCe](https://spcldace.readthedocs.io/en/latest/index.htm) is the full-program optimization framework used in NDSL. DaCe is short for Data-Centric Parallel Programming and was developed at ETH's Scalable Parallel Computing Laboratory (SPCL). It is a compiler framework that parses a subset of Python/NumPy semantics into its intermediate representation, the SDFG, which can then be optimized by passes and transformations.

SDFGs are a transformable, interactive representation of code based on data movement. Because the input code and the SDFG are separate, a program can be optimized without changing its source, so the source stays readable. In addition, the optimizations are customizable and user-extensible, so they can be written once and reused in many applications. Data-centric parallel programming thus enables direct transfer of performance-optimization knowledge, regardless of the application or the target processor.

In NDSL, DaCe powers the [performance backends](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/) of [GT4Py](./gt4py.md). In particular, NDSL's orchestration feature will use DaCe to encode [macro-level optimizations](https://geos-esm.github.io/SMT-Nebulae/technical/backend/ADRs/stree/) such as loop re-ordering and stencil fusing.
5 changes: 5 additions & 0 deletions docs/dev/gt4py.md
@@ -0,0 +1,5 @@
# GT4Py

!!! warning

    TODO: Add some docs on GT4Py here
24 changes: 12 additions & 12 deletions docs/dev/index.md


6 files renamed without changes
10 changes: 7 additions & 3 deletions docs/includes/glossary.md
@@ -1,16 +1,20 @@
<!-- institutions / groups / teams -->

*[CSCS]: Swiss National Supercomputing Center
*[ETH]: Swiss Federal Institute of Technology
*[GFDL]: Geophysical Fluid Dynamics Laboratory
*[NASA]: National Aeronautics and Space Administration
*[NOAA]: National Oceanic and Atmospheric Administration
*[SPCL]: Scalable Parallel Computing Laboratory (ETH Zurich)


<!-- technology -->

*[DSL]: Domain-specific language
*[FORTRAN]: Fortran, a compiled programming language widely used in scientific and numerical computing
*[IR]: Intermediate Representation: an abstraction between source code and machine code, designed to simplify analysis and optimization during program compilation.
*[NDSL]: NOAA/NASA Domain Specific Language middleware
*[SDFG]: Stateful Dataflow multiGraphs - the IR of DaCe

<!-- Modeling -->
*[FMS]: Flexible Modeling System - see https://github.com/NOAA-GFDL/FMS
37 changes: 17 additions & 20 deletions docs/index.md
@@ -2,8 +2,7 @@

NDSL allows atmospheric scientists to focus on what matters in model development and hides away the complexities of coding for a supercomputer.


## Quick Start

Python `3.11.x` is required to install NDSL and all its third-party dependencies.

@@ -19,8 +18,7 @@ NDSL uses pytest for its unit tests, the tests are available via:
- `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
- `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)


## Requirements & supported compilers

For CPU backends:

@@ -38,13 +36,14 @@ For GPU backends (the above plus):
- Libraries:
- MPI compiled with CUDA support


## NDSL installation and testing

NDSL is not available on PyPI; instead, use

```bash
pip install NDSL
```

to install NDSL locally.

NDSL has a few options:
@@ -57,42 +56,40 @@ Tests are available via:
- `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
- `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)


## Configurations for Pace

Configurations for Pace to use NDSL with different backends:

- `FV3_DACEMODE` (`Python`, `Build`, `BuildAndRun` or `Run`) controls the behavior of the full-program optimizer:

  - Python: default, stencils only, no full-program optimization

  - Build: builds the program, then exits. It _builds no matter what_. (backend must be `dace:gpu` or `dace:cpu`)

  - BuildAndRun: same as above, but after the build the program keeps executing (backend must be `dace:gpu` or `dace:cpu`)

  - Run: loads the pre-compiled program and executes it; fails if the `.so` is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`)

- `PACE_FLOAT_PRECISION=64` controls the floating-point precision throughout the program.
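The mode rules above can be summarized as a small validation helper. This is a minimal sketch under assumptions: `check_dace_mode` is a hypothetical function, not part of NDSL or Pace, and it only encodes the constraints stated in this list (`Python` is the default, and the other three modes require a `dace:cpu` or `dace:gpu` backend).

```python
import os
from typing import Mapping, Optional

# Values accepted by FV3_DACEMODE, as listed above.
DACE_MODES = ("Python", "Build", "BuildAndRun", "Run")
# Backends required by the Build, BuildAndRun and Run modes.
DACE_BACKENDS = ("dace:cpu", "dace:gpu")


def check_dace_mode(backend: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read FV3_DACEMODE and validate it against `backend`."""
    env = os.environ if environ is None else environ
    mode = env.get("FV3_DACEMODE", "Python")  # Python is the default mode
    if mode not in DACE_MODES:
        raise ValueError(f"Unknown FV3_DACEMODE {mode!r}; expected one of {DACE_MODES}")
    if mode != "Python" and backend not in DACE_BACKENDS:
        raise ValueError(
            f"FV3_DACEMODE={mode} requires a backend in {DACE_BACKENDS}, got {backend!r}"
        )
    return mode
```

For example, `check_dace_mode("dace:gpu", {"FV3_DACEMODE": "BuildAndRun"})` returns `"BuildAndRun"`, while the same mode combined with a non-DaCe backend raises `ValueError`. Failing early like this mirrors the "backend must be `dace:gpu` or `dace:cpu`" constraints instead of letting a misconfigured run proceed.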


Install Pace with different NDSL backends:

- Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
- When cloning Pace you will need to update the repository's submodules as well:

```bash
git clone --recursive https://github.com/ai2cm/pace.git
```

or if you have already cloned the repository:

```bash
git submodule update --init --recursive
```

- Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
You will also need the Boost headers in your `$PATH` (Boost itself does not need to be installed).
If they are installed outside the standard header locations, GT4Py requires `$BOOST_ROOT` to be set:

```bash
cd BOOST/ROOT
@@ -103,17 +100,17 @@ mv boost_1_79_0/boost boost_1_79_0/include/
export BOOST_ROOT=BOOST/ROOT/boost_1_79_0
```

- We recommend creating a Python `venv` or conda environment specifically for Pace.

```bash
python3 -m venv venv_name
source venv_name/bin/activate
```

- Inside your Pace `venv` or conda environment, pip install the Python requirements, GT4Py, and Pace:

```bash
pip3 install -r requirements_dev.txt -c constraints.txt
```

- There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`).
27 changes: 15 additions & 12 deletions docs/porting/index.md
@@ -2,24 +2,24 @@

This part of the documentation includes notes about porting FORTRAN code to NDSL.


## General Concepts

Since we are not developing a new model but replicating an existing one, the main philosophy is to replicate model behavior as precisely as possible.
Since weather and climate models can take diverging paths based on very small input differences, as described in [\[1\]][1], bitwise-reproducible code is impossible to achieve.
There have been attempts at solving this problem, such as those shown in [\[2\]][2] or [\[3\]][3], but all of them require heavy modification of the original code.
In our case, the switch from the original FORTRAN environment to a C++ environment can already introduce such small differences, so a 1:1 validation at large scale is impossible.
Computation on GPUs amplifies this effect further.
Lastly, the mixing of precisions found in various models is often done somewhat unmethodically, which further complicates understanding which precision is required where.

Since large scale validation is therefore close to impossible, we are trying to get reproducible results (within a margin) on smaller sub-components of the model.
When porting code, we therefore try to break down larger components into logical, numerically coherent substructures that can be tested and validated individually.
This breakdown serves two main purposes:

1. Give us confidence that the ported code behaves as intended.
2. Allow us to monitor if or how performance optimization down the road changes the numerical results of our model components.
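Validating "within a margin" usually comes down to comparing the ported output against reference data with a tolerance. The sketch below is an illustration under assumptions, not NDSL's actual validation code: `max_relative_error` and `within_margin` are hypothetical names, and the default threshold is arbitrary.

```python
def max_relative_error(computed, reference, eps=1e-20):
    """Largest relative difference between two equally long sequences of floats."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must have the same length")
    worst = 0.0
    for c, r in zip(computed, reference):
        # Relative to the reference magnitude; eps guards against division by zero.
        err = abs(c - r) / max(abs(r), eps)
        worst = max(worst, err)
    return worst


def within_margin(computed, reference, threshold=1e-10):
    """True if every value matches the reference to within `threshold` relative error."""
    return max_relative_error(computed, reference) <= threshold
```

For example, `within_margin([1.0, 2.0 + 1e-12], [1.0, 2.0])` is `True`, while `within_margin([1.0, 2.1], [1.0, 2.0])` is `False`. Running such a check per sub-component, rather than on the whole model, is what keeps the comparison meaningful despite the divergence described above.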


## Porting Guidelines

Since GT4Py has certain restrictions on what can be in the same stencil and what needs to be in separate stencils, there is no absolute 1:1 mapping that can or should be applied.

The best practices we found are:
@@ -28,9 +28,11 @@ The best practices we found are:
2. If possible, try to isolate individual numerical motifs into functions.

### Example

To illustrate best practices, we show a stripped-down version of the nonhydrostatic vertical solver on the C-grid (also known as the Riemann solver):

#### Main definition

```python
class NonhydrostaticVerticalSolverCGrid:
    def __init__(self, ...):
@@ -53,14 +55,16 @@ class NonhydrostaticVerticalSolverCGrid:
        self._precompute_stencil(cappa, _pfac)
        self._compute_sim1_solve(_pfac, delpc)
```

#### Stencil Definitions

```python
# constants definition
c1 = Float(-2.0) / Float(14.0)
c2 = Float(11.0) / Float(14.0)
c3 = Float(5.0) / Float(14.0)

# function for a standalone numerical motif
@gtscript.function
def vol_conserv_cubic_interp_func_y(v):
    return c1 * v[0, -2, 0] + c2 * v[0, -1, 0] + c3 * v
@@ -78,7 +82,6 @@ def sim1_solver(cappa: FloatField, _pfac: FloatFieldIJ):
        cappa = vol_conserv_cubic_interp_func_y(cappa) + _pfac
```
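Because `vol_conserv_cubic_interp_func_y` is isolated in its own function, it can also be checked outside of GT4Py. The following pure-Python sketch is an illustration, not code from NDSL: it applies the same weights along a single `j` column, and the name `cubic_interp_y_reference` is hypothetical. The weights sum to one, so a constant field is preserved, and for a linear profile `v[j] = j` the result at `j` is `j - 0.5`, i.e. the value halfway between `j - 1` and `j`.

```python
# Same constants as in the stencil definitions above (plain floats here).
C1 = -2.0 / 14.0
C2 = 11.0 / 14.0
C3 = 5.0 / 14.0


def cubic_interp_y_reference(column):
    """Apply c1*v[j-2] + c2*v[j-1] + c3*v[j] for every j that has two lower neighbors.

    Returns a list aligned with `column`; the first (up to) two entries are None
    because the j-2 and j-1 offsets fall outside the column there.
    """
    result = [None] * min(2, len(column))
    for j in range(2, len(column)):
        result.append(C1 * column[j - 2] + C2 * column[j - 1] + C3 * column[j])
    return result
```

For example, `cubic_interp_y_reference([0.0, 1.0, 2.0, 3.0])` gives approximately `[None, None, 1.5, 2.5]`. A reference like this can serve as an independent check of the motif when debugging a translate test.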


[1]: <https://www.climate.gov/news-features/blogs/enso/butterflies-rounding-errors-and-chaos-climate-models> "Chaos in climate models"
[2]: <https://pasc17.org/fileadmin/user_upload/pasc17/program/post125s2.pdf> "Reproducible Climate Simulations"
[3]: <http://htor.inf.ethz.ch/sec/bitrep-ipdps.pdf> "Bit reproducible HPC applications"
34 changes: 20 additions & 14 deletions docs/porting/translate/index.md
@@ -1,8 +1,9 @@
# Translate tests

We call tests that validate subsets of computation against serialized data "translate tests". These should provide a baseline with which we can validate ported code and ensure the pipeline generates expected results.

## The Translate infrastructure

The infrastructure is set up so that, for basic cases, the default implementations are sufficient:

The `TranslateFortranData2Py` base class will be evaluated through the function `test_sequential_savepoint`.
@@ -20,35 +21,40 @@ The general structure is:
For these steps to work, the name of the translate test needs to match the name of the data.
If special handling is required, almost everything can be overridden:

### Overwriting thresholds

You can create an overwrite file in your data directory to set the threshold manually:

![image1.png](../../images/translate/image1.png)

### Overwriting Arguments to your compute function

The `compute_func` will be called automatically in the test. If the variable names in the netCDF file match the `kwargs` of your function directly, no further action is required:

![image2.png](../../images/translate/image2.png)

If you need to rename a variable coming from the netCDF file, use `["serialname"]`:

![image3.png](../../images/translate/image3.png)

The same applies to scalar inputs passed as parameters:

![image4.png](../../images/translate/image4.png)
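Conceptually, this mapping turns the serialized variables into the keyword arguments of `compute_func`. The sketch below mimics that idea with plain dictionaries; `build_kwargs` and the exact layout of `data_vars` and `parameters` are assumptions for illustration, not the `TranslateFortranData2Py` implementation.

```python
def build_kwargs(serialized, data_vars, parameters=()):
    """Map serialized data to the keyword arguments of a compute function.

    serialized: dict of name -> value, as read from the netCDF data.
    data_vars:  dict of kwarg name -> options; an optional "serialname"
                entry gives the name under which the data was serialized.
    parameters: names of scalar inputs that are passed through unchanged.
    """
    kwargs = {}
    for kwarg_name, options in data_vars.items():
        # Fall back to the kwarg name when no "serialname" override is given.
        serial_name = options.get("serialname", kwarg_name)
        kwargs[kwarg_name] = serialized[serial_name]
    for name in parameters:
        kwargs[name] = serialized[name]
    return kwargs


# Usage: "_pfac" is stored as "pfac_ser" in the (made-up) serialized data.
serialized = {"delp": [1.0], "pfac_ser": [2.0], "dt": 0.5}
kwargs = build_kwargs(serialized, {"delp": {}, "_pfac": {"serialname": "pfac_ser"}}, ["dt"])
# kwargs == {"delp": [1.0], "_pfac": [2.0], "dt": 0.5}
```

Keeping the default "names match" path free of configuration is what makes the basic case work without any overrides; only mismatching names need an explicit `serialname`.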

### Modifying output variables

This can be required if not all output is serialized, if the naming differs, or if the output needs the same data as the input:

![image4.png](../../images/translate/image4.png)

### Modifying the `compute` function

Normally, compute has the three steps:

1. setup input
2. call `compute_func`
3. slice outputs

Slight adaptations to every step are possible:

![image5.png](../../images/translate/image5.png)
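The three default steps, and how a subclass might adapt one of them, can be sketched with a toy class. Everything here is hypothetical (`ToyTranslate`, `setup_inputs`, `slice_outputs`, `ToyTranslateWithHalo`, `double_in_place`) and only mirrors the structure described above; the real base class has its own method names and signatures.

```python
class ToyTranslate:
    """Toy version of the setup -> compute_func -> slice pattern."""

    def __init__(self, compute_func, out_names):
        self.compute_func = compute_func
        self.out_names = out_names

    def setup_inputs(self, inputs):
        # 1. setup input: copy the lists so the reference data is not mutated.
        return {name: list(value) for name, value in inputs.items()}

    def slice_outputs(self, kwargs):
        # 3. slice outputs: keep only the variables compared to the reference.
        return {name: kwargs[name] for name in self.out_names}

    def compute(self, inputs):
        kwargs = self.setup_inputs(inputs)
        self.compute_func(**kwargs)  # 2. call compute_func (updates lists in place)
        return self.slice_outputs(kwargs)


class ToyTranslateWithHalo(ToyTranslate):
    """Adapts step 3 only: drop one halo point at each end of every output."""

    def slice_outputs(self, kwargs):
        full = super().slice_outputs(kwargs)
        return {name: value[1:-1] for name, value in full.items()}


def double_in_place(q):
    """Hypothetical compute function that doubles every entry of q in place."""
    for i in range(len(q)):
        q[i] *= 2.0
```

With these definitions, `ToyTranslate(double_in_place, ["q"]).compute({"q": [1.0, 2.0]})` returns `{"q": [2.0, 4.0]}`, and `ToyTranslateWithHalo(double_in_place, ["q"]).compute({"q": [1.0, 2.0, 3.0, 4.0]})` returns `{"q": [4.0, 6.0]}`. Overriding a single step keeps the other two default behaviors intact.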
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -16,6 +16,7 @@ nav:
  - Under the hood:
      - Technical Documentation: dev/index.md
      - DaCe: dev/dace.md
      - GT4Py: dev/gt4py.md


markdown_extensions: