twicki · Mar 28, 2025
diff --git a/docs/dev/dace.md b/docs/dev/dace.md
@@ -1,9 +1,5 @@
-DaCe
-============
+# DaCe
 
-DaCe is a parallel programming framework developed at SPCL. DaCe is a compiler framework that parses a subset of the Python/NumPy semantics. The intermediate representation that DaCe uses, the SDFG, can be optimizedby passes/transformations.
+[DaCe](https://spcldace.readthedocs.io/en/latest/index.htm) is the full-program optimization framework used in NDSL. DaCe is short for Data-Centric Parallel Programming and is developed at ETH's Scalable Parallel Computing Laboratory (SPCL).
 
-SDFGs are a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. On the other hand, the used optimizations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.
-
-For more detailed document about DaCe, please refer to the following link:
-https://spcldace.readthedocs.io/en/latest/index.htm
+In NDSL, DaCe powers the [performance backends](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/) of [GT4Py](./gt4py.md). In particular, NDSL's orchestration feature will use DaCe to encode [macro-level optimizations](https://geos-esm.github.io/SMT-Nebulae/technical/backend/ADRs/stree/) such as loop reordering and stencil fusion.
diff --git a/docs/dev/gt4py.md b/docs/dev/gt4py.md
@@ -0,0 +1,5 @@
+# GT4Py
+
+!!! warning
+
+    TODO: Add some docs on GT4Py here
diff --git a/docs/dev/index.md b/docs/dev/index.md
diff --git a/docs/dev/images/ndsl_flow.png → docs/images/dev/ndsl_flow.png b/docs/dev/images/ndsl_flow.png → docs/images/dev/ndsl_flow.png
diff --git a/docs/dev/images/ndsl_orchestration.png → docs/images/dev/ndsl_orchestration.png b/docs/dev/images/ndsl_orchestration.png → docs/images/dev/ndsl_orchestration.png
diff --git a/docs/porting/translate/image1.png → docs/images/translate/image1.png b/docs/porting/translate/image1.png → docs/images/translate/image1.png
diff --git a/docs/porting/translate/image2.png → docs/images/translate/image2.png b/docs/porting/translate/image2.png → docs/images/translate/image2.png
diff --git a/docs/porting/translate/image3.png → docs/images/translate/image3.png b/docs/porting/translate/image3.png → docs/images/translate/image3.png
diff --git a/docs/porting/translate/image4.png → docs/images/translate/image4.png b/docs/porting/translate/image4.png → docs/images/translate/image4.png
diff --git a/docs/porting/translate/image5.png → docs/images/translate/image5.png b/docs/porting/translate/image5.png → docs/images/translate/image5.png
diff --git a/docs/includes/glossary.md b/docs/includes/glossary.md
@@ -1,16 +1,20 @@
 <!-- institutions / groups / teams -->
 
+*[CSCS]: Swiss National Supercomputing Center
+*[ETH]: Swiss Federal Institute of Technology
+*[GFDL]: Geophysical Fluid Dynamics Laboratory
 *[NASA]: National Aeronautics and Space Administration
 *[NOAA]: National Oceanic and Atmospheric Administration
-*[GFDL]: Geophysical Fluid Dynamics Laboratory
 *[SPCL]: Scalable Parallel Computing Laboratory (ETH Zurich)
 
+
 <!-- technology -->
 
-*[NDSL]: NOAA/NASA Domain Specific Language middleware
 *[DSL]: Domain specific language
-*[SDFG]: Stateful Dataflow multiGraphs - the IR of DaCe
+*[FORTRAN]: Formula Translation, a compiled programming language long used in scientific and numerical computing
 *[IR]: Intermedite Representation: An abstraction between source code and machine code, designed to simplify analysis and optimization during program compilation.
+*[NDSL]: NOAA/NASA Domain Specific Language middleware
+*[SDFG]: Stateful Dataflow multiGraphs - the IR of DaCe
 
 <!-- Modeling -->
 *[FMS]: Flexible Modeling System - see https://github.com/NOAA-GFDL/FMS

diff --git a/docs/index.md b/docs/index.md
@@ -2,8 +2,7 @@
 
 NDSL allows atmospheric scientists to write focus on what matters in model development and hides away the complexities of coding for a super computer.
 
-
-#### Quick Start
+## Quick Start
 
 Python `3.11.x` is required for NDSL and all its third party dependencies for installation.
 
@@ -19,8 +18,7 @@ NDSL uses pytest for its unit tests, the tests are available via:
 - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
 - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)
 
-
-#### Requirements & supported compilers
+## Requirements & supported compilers
 
 For CPU backends:
 
@@ -38,13 +36,14 @@ For GPU backends (the above plus):
 - Libraries:
   - MPI compiled with cuda support
 
-
-####  NDSL installation and testing
+## NDSL installation and testing
 
 NDSL is not available at `pypi`, it uses
+
 ```bash
 pip install NDSL
 ```
+
 to install NDSL locally.
 
 NDSL has a few options:
@@ -57,42 +56,40 @@ Tests are available via:
 - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
 - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)
 
-
-####  Configurations for Pace
+## Configurations for Pace
 
 Configurations for Pace to use NDSL with different backend:
 
 - FV3_DACEMODE=Python[Build|BuildAndRun|Run] controls the full program optimizer behavior
 
-  - Python: default, use stencil only, no full program optmization
+  - Python: default, use stencil only, no full program optimization
 
   - Build: will build the program then exit. This _build no matter what_. (backend must be `dace:gpu` or `dace:cpu`)
 
   - BuildAndRun: same as above but after build the program will keep executing (backend must be `dace:gpu` or `dace:cpu`)
 
-  - Run: load pre-compiled program and execute, fail if the .so is not present (_no hashs check!_) (backend must be `dace:gpu` or `dace:cpu`)
+  - Run: load pre-compiled program and execute, fail if the .so is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`)
 
 - PACE_FLOAT_PRECISION=64 control the floating point precision throughout the program.
 
-
 Install Pace with different NDSL backend:
 
-  - Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
-
-  - When cloning Pace you will need to update the repository's submodules as well:
+- Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
+- When cloning Pace you will need to update the repository's submodules as well:
 
 ```bash
 git clone --recursive https://github.com/ai2cm/pace.git
 ```
+
   or if you have already cloned the repository:
 
 ```bash
 git submodule update --init --recursive
 ```
 
-  - Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
-  You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed).
-  If installed outside the standard header locations, gt4py requires that `$BOOST_ROOT` be set:
+- Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
+You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed).
+If installed outside the standard header locations, gt4py requires that `$BOOST_ROOT` be set:
 
 ```bash
 cd BOOST/ROOT
@@ -103,17 +100,17 @@ mv boost_1_79_0/boost boost_1_79_0/include/
 export BOOST_ROOT=BOOST/ROOT/boost_1_79_0
 ```
 
-  - We recommend creating a python `venv` or conda environment specifically for Pace.
+- We recommend creating a python `venv` or conda environment specifically for Pace.
 
 ```bash
 python3 -m venv venv_name
 source venv_name/bin/activate
 ```
 
-  - Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace:
+- Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace:
 
 ```bash
 pip3 install -r requirements_dev.txt -c constraints.txt
 ```
 
-  - There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation   (`requirements_docs.txt`).
+- There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`).
diff --git a/docs/porting/index.md b/docs/porting/index.md
@@ -2,24 +2,24 @@
 
 This part of the documentation includes notes about porting FORTRAN code to NDSL.
 
-
 ## General Concepts
+
 Since we are not trying to do model developing but rather replicate an existing model, the main philosophy is to replicate model behavior as precisely as possible.
-Since weather and climate models can take diverging paths based on very small input differences, as described in [\[1\]][1], a bitwise reproducible code is impossible to achive.
+Since weather and climate models can take diverging paths based on very small input differences, as described in [\[1\]][1], a bitwise reproducible code is impossible to achieve.
 There were attempts at solving this problem like shown in [\[2\]][2] or [\[3\]][3] but all of those require heavy modification to the original code.
-In our case, the switch from the original FORTRAN environment to a C++ environment can already contribute to these small errors shwoing up and therefore a 1:1 validation on a large scale is impossible.
+In our case, the switch from the original FORTRAN environment to a C++ environment can already contribute to these small errors showing up and therefore a 1:1 validation on a large scale is impossible.
 This effect gets further enhanced by computation on GPUs.
-Lastly the mixing of percisions found in various models is often done slightly unmethodical and can further complicate the understand of what precision is required where.
+Lastly, the mixing of precisions found in various models is often done somewhat unmethodically, which can further complicate the understanding of what precision is required where.
 
-Since large scale validation is therefore close to impossible, we are trying to get repoducible results (within a margin) on smaller subcomponents of the model.
-When portring code, we therefore try to break down larger components into logical, numerically coherent substructures that can be tested and validated individually.
+Since large scale validation is therefore close to impossible, we are trying to get reproducible results (within a margin) on smaller sub-components of the model.
+When porting code, we therefore try to break down larger components into logical, numerically coherent substructures that can be tested and validated individually.
 This breakdown serves two main purposes:
 
 1. Give us confidence, that the ported code behaves as intended.
 2. Allow us to monitor if or how performance optimization down the road changes the numerical results of our model components.
 
-
 ## Porting Guidelines
+
 Since GT4Py has certain restrictions on what can be in the same stencil and what needs to be in separate stencils, there is no absolute 1:1 mapping that can or should be applied.
 
 The best practices we found are:
@@ -28,9 +28,11 @@ The best practices we found are:
 2. If possible, try to isolate individual numerical motifs into functions.
 
 ### Example
-To illustrate best practices, we show a stripped version of the the nonhydrostatic vertical solver on the C-grid (Also know as the Rieman Solver):
 
-**Main definition**
+To illustrate best practices, we show a stripped-down version of the nonhydrostatic vertical solver on the C-grid (also known as the Riemann solver):
+
+#### Main definition
+
 ```python
 class NonhydrostaticVerticalSolverCGrid:
     def __init__(self, ...):
@@ -53,14 +55,16 @@ class NonhydrostaticVerticalSolverCGrid:
         self._precompute_stencil(cappa, _pfac)
         self._compute_sim1_solve(_pfac, delpc)
 ```
-**Stencil Definitions**
+
+#### Stencil Definitions
+
 ```python
 #constants definition
 c1 = Float(-2.0) / Float(14.0)
 c2 = Float(11.0) / Float(14.0)
 c3 = Float(5.0) / Float(14.0)
 
-#function for numerical stanadlone motif
+#function for numerical standalone motif
 @gtscript.function
 def vol_conserv_cubic_interp_func_y(v):
     return c1 * v[0, -2, 0] + c2 * v[0, -1, 0] + c3 * v
@@ -78,7 +82,6 @@ def sim1_solver(cappa: FloatField, _pfac: FloatFieldIJ):
         cappa = vol_conserv_cubic_interp_func_y(cappa) + _pfac
 ```
 
-
 [1]: <https://www.climate.gov/news-features/blogs/enso/butterflies-rounding-errors-and-chaos-climate-models> "Chaos in climate models"
 [2]: <https://pasc17.org/fileadmin/user_upload/pasc17/program/post125s2.pdf> "Reproducible Climate Simulations"
 [3]: <http://htor.inf.ethz.ch/sec/bitrep-ipdps.pdf> "Bit reproducible HPC applications"
diff --git a/docs/porting/translate/index.md b/docs/porting/translate/index.md
@@ -1,8 +1,9 @@
-## What are translate tests
+# Translate tests
 
-We call tests that validate subsets of computation against serialized data translate tests. These should provide a baseline with wich we can validate ported code and ensure the pipline generates expected results.
+We call tests that validate subsets of computation against serialized data "translate tests". These should provide a baseline with which we can validate ported code and ensure the pipeline generates expected results.
 
 ## The Translate infrastructure
+
 The infrastructure is set up in a way that for basic cases, all the default implementations are enough:
 
 The `TranslateFortranData2Py` base class will be evaluated through the function `test_sequential_savepoint`.
@@ -20,35 +21,40 @@ The general structure is:
 For these steps to work, the name of the translate test needs to match the name of the data.
 In case of special handling required, almost everything can be overwritten:
 
-**Overwriting thresholds:**
+### Overwriting thresholds
 
 You can create an overwrite file to manually set the threshold in you data directory:
-![image1.png](image1.png)
 
+![image1.png](../../images/translate/image1.png)
+
+### Overwriting arguments to your compute function
 
-**Overwriting Arguments to your compute function**
+The `compute_func` will be called automatically in the test. If the variable names in the NetCDF file match the `kwargs` of your function directly, no further action is required:
 
-The compute_func will be called automatically in the test. If your names in the netcdf are matching the kwargs of your function directly, no further action required:
-![image2.png](image2.png)
+![image2.png](../../images/translate/image2.png)
 
 If you need to rename it from the netcdf, you can use ["serialname"]:
-![image3.png](image3.png)
+
+![image3.png](../../images/translate/image3.png)
 
 The same applies for scalar inputs with parameters:
-![image4.png](image4.png)
 
+![image4.png](../../images/translate/image4.png)
 
-**Modifying output variables**
+### Modifying output variables
 
 This can be required either if not all output is serialized, the naming is different or we need the same data as the input:
-![image4.png](image4.png)
 
-**Modifying the `compute` function**
-Normally, cumpute has the three steps:
+![image4.png](../../images/translate/image4.png)
+
+### Modifying the `compute` function
+
+Normally, `compute` has three steps:
 
 1. setup input
 2. call `compute_func`
 3. slice outputs
 
 Slight adaptations to every step are possible:
-![image5.png](image5.png)
+
+![image5.png](../../images/translate/image5.png)
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -16,6 +16,7 @@ nav:
   - Under the hood:
     - Technical Documentation: dev/index.md
     - DaCe: dev/dace.md
+    - GT4Py: dev/gt4py.md
 
 
 markdown_extensions: