diff --git a/docs/dev/dace.md b/docs/dev/dace.md index 13feddcb..6c44cad2 100644 --- a/docs/dev/dace.md +++ b/docs/dev/dace.md @@ -1,9 +1,5 @@ -DaCe -============ +# DaCe -DaCe is a parallel programming framework developed at SPCL. DaCe is a compiler framework that parses a subset of the Python/NumPy semantics. The intermediate representation that DaCe uses, the SDFG, can be optimizedby passes/transformations. +[DaCe](https://spcldace.readthedocs.io/en/latest/index.htm) is the full-program optimization framework used in NDSL. DaCe is short for Data-Centric Parallel Programming and is developed at ETH's Scalable Parallel Computing Laboratory (SPCL). -SDFGs are a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. On the other hand, the used optimizations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor. - -For more detailed document about DaCe, please refer to the following link: -https://spcldace.readthedocs.io/en/latest/index.htm +In NDSL, DaCe powers the [performance backends](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/) of [GT4Py](./gt4py.md). In particular, in NDSL's orchestration feature we will encode [macro-level optimizations](https://geos-esm.github.io/SMT-Nebulae/technical/backend/ADRs/stree/) like loop re-ordering and stencil fusing using DaCe. diff --git a/docs/dev/gt4py.md b/docs/dev/gt4py.md new file mode 100644 index 00000000..b334831f --- /dev/null +++ b/docs/dev/gt4py.md @@ -0,0 +1,5 @@ +# GT4Py + +!!!
warning + + TODO: Add some docs on GT4Py here diff --git a/docs/dev/index.md b/docs/dev/index.md index b882c8bc..91b7ca70 100644 --- a/docs/dev/index.md +++ b/docs/dev/index.md @@ -6,17 +6,17 @@ This is the technical part of the documentation, geared towards developers contr Recently, Python has became the dominant programming language in the machine learning and data sciences communities since it is easy to learn and program. However, the performance of Python is still a major concern in scientific computing and HPC community. In the scientific computing and HPC community, the most widely used programming languages are C/C++ and Fortran, Python is often used as script language for pre- and post-processing. -The major performance issue in Python programming language, especially in computation-intensive applications, are loops, which are often the performance bottlenecks of an application in other programming languages too, such as C++ and Fortran. However, Python programs are often observed to be 10x to 100x slower than C, C++ and Fortran programs. In order to achieve peak hardware performance, the scientific computing communities have tried different programming models, such as OpenMP, Cilk+, and Thread Building Blocks (TBB), as well as Linux p-threads for multi/many-core processors and GPUs, Kokkos, RAJA, OpenMP offload, and OpenACC for highest performance on CPU/GPUs heterogeneous system. All of these programming models are only available for C, C++ and Fortran. Only a few work that target to high perfromance for Python programming language. +The major performance issue in Python programming language, especially in computation-intensive applications, are loops, which are often the performance bottlenecks of an application in other programming languages too, such as C++ and Fortran. However, Python programs are often observed to be 10x to 100x slower than C, C++ and Fortran programs. 
In order to achieve peak hardware performance, the scientific computing communities have tried different programming models, such as OpenMP, Cilk+, and Thread Building Blocks (TBB), as well as Linux p-threads for multi/many-core processors and GPUs, Kokkos, RAJA, OpenMP offload, and OpenACC for highest performance on CPU/GPUs heterogeneous system. All of these programming models are only available for C, C++ and Fortran. Only a few work that target to high performance for Python programming language. The Python based NDSL programming model described in this developer’s guide provides an alternative solution to reach peak hardware performance with relatively little programming effort by using the stencil semantics. A stencil is similar to parallel for kernels that are used in Kokkos and RAJA, to update array elements according to a fixed access pattern. With the stencil semantics in mind, NDSL, for example, can be used to write matrix multiplication kernels that match the performance of cuBLAS/hipBLAS that many GPU programmers can’t do in Cuda/HiP using only about 30 lines of code. It greatly reduces the programmer’s effort, and NDSL has already been successfully used in the Pace global climate model, which achieves up to 4x speedup, more efficient than the original Fortran implementations. ## Programming model -The programming model of NDSL is composed of backend execution spaces, performance optimization pass and transformations, and memory spaces, memory layout. These abstraction semantics allow the formulation of generic algorithms and data structures which can then be mapped to different types of hardware architectures. Effectively, they allow for compile time transformation of algorithms to allow for adaptions of varying degrees of hardware parallelism as well as of the memory hierarchy. Figure 1 shows the high level architecture of NDSL (without orchestration option), From Fig. 
1, it is shown that NDSL uses hierarchy levels intermediate representation (IR) to abstract the structure of computational program, whcih reduces the complexity of application code, and maintenance cost, while the code portability and scalability are increased. This method also avoids raising the information from lower level representations by means of static analysis, and memory leaking, where feasible, and performaing optimizations at the high possible level of abstraction. The methods primarily leverages structural information readily available in the source code, it enables to apply the optimization, such as loop fusion, tiling and vectorization without the need for complicated analysis and heuristics. +The programming model of NDSL is composed of backend execution spaces, performance optimization pass and transformations, and memory spaces, memory layout. These abstraction semantics allow the formulation of generic algorithms and data structures which can then be mapped to different types of hardware architectures. Effectively, they allow for compile time transformation of algorithms to allow for adaptions of varying degrees of hardware parallelism as well as of the memory hierarchy. Figure 1 shows the high level architecture of NDSL (without orchestration option), From Fig. 1, it is shown that NDSL uses hierarchy levels intermediate representation (IR) to abstract the structure of computational program, which reduces the complexity of application code, and maintenance cost, while the code portability and scalability are increased. This method also avoids raising the information from lower level representations by means of static analysis, and memory leaking, where feasible, and performing optimizations at the high possible level of abstraction. 
The methods primarily leverages structural information readily available in the source code, it enables to apply the optimization, such as loop fusion, tiling and vectorization without the need for complicated analysis and heuristics. -![NDSL flow](./images/ndsl_flow.png) +![NDSL flow](../images/dev/ndsl_flow.png) -In NDSL, the python frontend code takes the user defined stencils to python AST using builtin ast module. In an AST, each node is an object defined in python AST grammar class (for more details, please refer: https://docs.python.org/3/library/ast.html). the AST node visitor (the NDSL/external/gt4py/src/gt4py/cartesian/frontend/gtscript_frontend.py) IRMaker class traverses the AST of a python function decorated by @gtscript.function and/or stencil objects, the Python AST of the program is then lowing to the Definition IR. The definition IR is high level IR, and is composed of high level program, domain-specific information, and the structure of computational operations which are independent of low level hardware platform. The definition of high level IR allows transformation of the IRs without lossing the performance of numerical libraries. However, the high level IR doesn’t contains detailed information that required for performance on specific low level runtime hardware. Specificially, the definition IR only preserves the necessary information to lower operations to runtime platform hardware instructions implementing coarse-grained vector operations, or to numerical libraries — such as cuBLAS/hipBLAS and Intel MKL. +In NDSL, the python frontend code takes the user defined stencils to python AST using builtin ast module. In an AST, each node is an object defined in python AST grammar class (for more details, please refer: https://docs.python.org/3/library/ast.html). 
The AST node visitor (the IRMaker class in NDSL/external/gt4py/src/gt4py/cartesian/frontend/gtscript_frontend.py) traverses the AST of a python function decorated by @gtscript.function and/or stencil objects; the Python AST of the program is then lowered to the Definition IR. The definition IR is a high level IR, composed of the high level program, domain-specific information, and the structure of computational operations, which are independent of the low level hardware platform. The definition of a high level IR allows transformation of the IRs without losing the performance of numerical libraries. However, the high level IR doesn’t contain the detailed information required for performance on specific low level runtime hardware. Specifically, the definition IR only preserves the information necessary to lower operations to runtime platform hardware instructions implementing coarse-grained vector operations, or to numerical libraries such as cuBLAS/hipBLAS and Intel MKL. The definition IR is then transformed to GTIR (gt4py/src/gt4py/cartesian/frontend/defir_to_gtir.py), the GTIR stencils is defined as in NDSL @@ -37,7 +37,7 @@ class Stencil(LocNode, eve.ValidatedSymbolTableTrait): _validate_lvalue_dims = common.validate_lvalue_dims(VerticalLoop, FieldDecl) ``` -GTIR is also a high level IR, it contains vertical_loops loop statement, in the climate applications, the vertical loops usually need special treatment as the numerical unstability is arison. The vertical_loops in GTIR as separate code block and help the following performance pass and transofrmation implementation. The program analysis pass/transformation is applied on the GTIR to remove the redunant nodes, and prunning the unused parameters, and data type and shape propogations of the symbols, and loop extensions. +GTIR is also a high level IR; it contains a vertical_loops loop statement. In climate applications, the vertical loops usually need special treatment because numerical instability can arise.
The vertical_loops in GTIR as separate code block and help the following performance pass and transformation implementation. The program analysis pass/transformation is applied on the GTIR to remove the redundant nodes, and pruning the unused parameters, and data type and shape propagations of the symbols, and loop extensions. The GTIR is then further lowered to optimization IR (OIR), which is defined as @@ -53,13 +53,13 @@ class Stencil(LocNode, eve.ValidatedSymbolTableTrait): _validate_lvalue_dims = common.validate_lvalue_dims(VerticalLoop, FieldDecl) ``` -The OIR is particularly designed for performance optimization, the performation optimization algorithm are carried out on OIR by developing pass/transorformations. Currently, the vertical loop merging, and horizonal execution loop merging, and loop unrolling and vectorization, statement fusion and pruning optimizations are available and activated by the environmental variable in the oir_pipeline module. +The OIR is particularly designed for performance optimization, the performance optimization algorithm are carried out on OIR by developing pass/transformations. Currently, the vertical loop merging, and horizontal execution loop merging, and loop unrolling and vectorization, statement fusion and pruning optimizations are available and activated by the environmental variable in the oir_pipeline module. -After the optimization pipeline finished, the OIR is then converted to different backend IR, for example, DACE IR (SDFG). The DACE SDFG can be further optimizated by its embeded pass/transormations algorithm, but in PACE application, we didn’t activate this optimization step. It should be pointed out that, during the OIR to SDFG process, the horizontal execution node is serialized to SDFG library node, within which the loop expansion information is encrypted. +After the optimization pipeline finished, the OIR is then converted to different backend IR, for example, DACE IR (SDFG). 
The DACE SDFG can be further optimized by its embedded pass/transformations algorithm, but in PACE application, we didn’t activate this optimization step. It should be pointed out that, during the OIR to SDFG process, the horizontal execution node is serialized to SDFG library node, within which the loop expansion information is encrypted. -When using GT backend, the OIR is then directly used by the gt4py code generator to generate the C++ gridtool stencils (computation code), and the python binding code. In this backend, each horizontal execution node will be passed to and generate a seperate gridtool stencil. +When using GT backend, the OIR is then directly used by the gt4py code generator to generate the C++ GridTools stencils (computation code), and the python binding code. In this backend, each horizontal execution node will be passed to and generate a separate GridTools stencil. -NDSL also supports the whole program optimization model, this is called orchestration model in NDSL, currently it only supports DaCe backend. Whole program optimziation with DaCe is the process of turning all Python and GT4Py code in generated C++. Only _orchestrate_ the runtime code of the model is applied, e.g. everything in the __call__ method of the module and all code in __init__ is executed like a normal GT backend. +NDSL also supports the whole program optimization model, this is called orchestration model in NDSL, currently it only supports DaCe backend. Whole program optimization with DaCe is the process of turning all Python and GT4Py code in generated C++. Only _orchestrate_ the runtime code of the model is applied, e.g. everything in the __call__ method of the module and all code in __init__ is executed like a normal GT backend. At the highest level in Pace, to turn on orchestration you need to flip the FV3_DACEMODE to an orchestrated options _and_ run a dace:* backend (it will error out if run anything else). 
Option for FV3_DACEMODE are: @@ -81,15 +81,15 @@ DaCe needs to be described all memory so it can interface it in the C code that Figure 2 shows the hierarchy levels of intermediate representations (IR) and the lowing process when orchestration option is activated. -![NDSL orchestration](images/ndsl_orchestration.png) +![NDSL orchestration](../images/dev/ndsl_orchestration.png) -When the orchestrated option is turned on, the call method object is patched in place, replacing the orignal Callable with a wrapper that will trigger orchestration at call time. If the model configuration doesn’t demand orchestration, this won’t do anything. The orchestrated call methods and the computational stencils (lazy computational stencils) which are cached in a container, will be parsed to python AST by the frontend code during the runtime, then the python AST code will be converted to DaCe SDFG. The analysis and optimization will be applied before the C++ code is generated by the codegen, this process is called Just In Time (JIT) build, compared with the non-orchestration model, which is eagerly compiled and build. The JIT build caches the build information of computational stencils, and orchestrated methods, and it is more convenient to apply the analysis and optimization pass to the overall code, such as the merging of neighbor stencils made easy. Therefore, more optimized code can be generated, and better performance can be achieved during runtime. +When the orchestrated option is turned on, the call method object is patched in place, replacing the original Callable with a wrapper that will trigger orchestration at call time. If the model configuration doesn’t demand orchestration, this won’t do anything. The orchestrated call methods and the computational stencils (lazy computational stencils) which are cached in a container, will be parsed to python AST by the frontend code during the runtime, then the python AST code will be converted to DaCe SDFG. 
The analysis and optimization will be applied before the C++ code is generated by the codegen, this process is called Just In Time (JIT) build, compared with the non-orchestration model, which is eagerly compiled and build. The JIT build caches the build information of computational stencils, and orchestrated methods, and it is more convenient to apply the analysis and optimization pass to the overall code, such as the merging of neighbor stencils made easy. Therefore, more optimized code can be generated, and better performance can be achieved during runtime. ## Analysis and Optimization One of the major features of NDSL is that users can develop a new pass/transformation for the backend with new hardware, the passes and/or transformations are the key integrates in order to have good performance on the new hardware. In different abstract level, the passes and/or transformations perform different levels of optimization. For example, the loop level of optimization is independent of hardware, and can be applied to any backend, while the optimization of device placement, and memory and caches optimizations are dependent on different backend and hardware. In this section, we only focused on the optimizations that are independent of the backend hardware. -The general procedure of code optimization has two steps, in the first step, a filter function is called to find the pattern that need to apply the pass and/or transformation, then apply the pass and/or transoformation to the filtered pattern to insert or delte or replace the existing node with the optimizated node. In NDSL, the following passes and/transorformations are provided. +The general procedure of code optimization has two steps, in the first step, a filter function is called to find the pattern that need to apply the pass and/or transformation, then apply the pass and/or transformation to the filtered pattern to insert or delete or replace the existing node with the optimized node. 
In NDSL, the following passes and/transformations are provided. ```python def prune_unused_parameters(node: gtir.Stencil) -> gtir.Stencil: diff --git a/docs/dev/images/ndsl_flow.png b/docs/images/dev/ndsl_flow.png similarity index 100% rename from docs/dev/images/ndsl_flow.png rename to docs/images/dev/ndsl_flow.png diff --git a/docs/dev/images/ndsl_orchestration.png b/docs/images/dev/ndsl_orchestration.png similarity index 100% rename from docs/dev/images/ndsl_orchestration.png rename to docs/images/dev/ndsl_orchestration.png diff --git a/docs/porting/translate/image1.png b/docs/images/translate/image1.png similarity index 100% rename from docs/porting/translate/image1.png rename to docs/images/translate/image1.png diff --git a/docs/porting/translate/image2.png b/docs/images/translate/image2.png similarity index 100% rename from docs/porting/translate/image2.png rename to docs/images/translate/image2.png diff --git a/docs/porting/translate/image3.png b/docs/images/translate/image3.png similarity index 100% rename from docs/porting/translate/image3.png rename to docs/images/translate/image3.png diff --git a/docs/porting/translate/image4.png b/docs/images/translate/image4.png similarity index 100% rename from docs/porting/translate/image4.png rename to docs/images/translate/image4.png diff --git a/docs/porting/translate/image5.png b/docs/images/translate/image5.png similarity index 100% rename from docs/porting/translate/image5.png rename to docs/images/translate/image5.png diff --git a/docs/includes/glossary.md b/docs/includes/glossary.md index d4eb040c..c40936cc 100644 --- a/docs/includes/glossary.md +++ b/docs/includes/glossary.md @@ -1,16 +1,20 @@ +*[CSCS]: Swiss National Supercomputing Center +*[ETH]: Swiss Federal Institute of Technology +*[GFDL]: Geophysical Fluid Dynamics Laboratory *[NASA]: National Aeronautics and Space Administration *[NOAA]: National Oceanic and Atmospheric Administration -*[GFDL]: Geophysical Fluid Dynamics Laboratory *[SPCL]: Scalable 
Parallel Computing Laboratory (ETH Zurich) + -*[NDSL]: NOAA/NASA Domain Specific Language middleware *[DSL]: Domain specific language -*[SDFG]: Stateful Dataflow multiGraphs - the IR of DaCe +*[FORTRAN]: Old programming language *[IR]: Intermedite Representation: An abstraction between source code and machine code, designed to simplify analysis and optimization during program compilation. +*[NDSL]: NOAA/NASA Domain Specific Language middleware +*[SDFG]: Stateful Dataflow multiGraphs - the IR of DaCe *[FMS]: Flexible Modeling System - see https://github.com/NOAA-GFDL/FMS diff --git a/docs/index.md b/docs/index.md index 5c9bff38..24053f27 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,8 +2,7 @@ NDSL allows atmospheric scientists to write focus on what matters in model development and hides away the complexities of coding for a super computer. - -#### Quick Start +## Quick Start Python `3.11.x` is required for NDSL and all its third party dependencies for installation. @@ -19,8 +18,7 @@ NDSL uses pytest for its unit tests, the tests are available via: - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed) - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed) - -#### Requirements & supported compilers +## Requirements & supported compilers For CPU backends: @@ -38,13 +36,14 @@ For GPU backends (the above plus): - Libraries: - MPI compiled with cuda support - -#### NDSL installation and testing +## NDSL installation and testing NDSL is not available at `pypi`, it uses + ```bash pip install NDSL ``` + to install NDSL locally. 
NDSL has a few options: @@ -57,42 +56,40 @@ Tests are available via: - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed) - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed) - -#### Configurations for Pace +## Configurations for Pace Configurations for Pace to use NDSL with different backend: - FV3_DACEMODE=Python[Build|BuildAndRun|Run] controls the full program optimizer behavior - - Python: default, use stencil only, no full program optmization + - Python: default, use stencil only, no full program optimization - Build: will build the program then exit. This _build no matter what_. (backend must be `dace:gpu` or `dace:cpu`) - BuildAndRun: same as above but after build the program will keep executing (backend must be `dace:gpu` or `dace:cpu`) - - Run: load pre-compiled program and execute, fail if the .so is not present (_no hashs check!_) (backend must be `dace:gpu` or `dace:cpu`) + - Run: load pre-compiled program and execute, fail if the .so is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`) - PACE_FLOAT_PRECISION=64 control the floating point precision throughout the program. - Install Pace with different NDSL backend: - - Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`. - - - When cloning Pace you will need to update the repository's submodules as well: +- Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`. +- When cloning Pace you will need to update the repository's submodules as well: ```bash git clone --recursive https://github.com/ai2cm/pace.git ``` + or if you have already cloned the repository: ```bash git submodule update --init --recursive ``` - - Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend. 
- You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed). - If installed outside the standard header locations, gt4py requires that `$BOOST_ROOT` be set: +- Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend. +You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed). +If installed outside the standard header locations, gt4py requires that `$BOOST_ROOT` be set: ```bash cd BOOST/ROOT @@ -103,17 +100,17 @@ mv boost_1_79_0/boost boost_1_79_0/include/ export BOOST_ROOT=BOOST/ROOT/boost_1_79_0 ``` - - We recommend creating a python `venv` or conda environment specifically for Pace. +- We recommend creating a python `venv` or conda environment specifically for Pace. ```bash python3 -m venv venv_name source venv_name/bin/activate ``` - - Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace: +- Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace: ```bash pip3 install -r requirements_dev.txt -c constraints.txt ``` - - There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`). +- There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`). diff --git a/docs/porting/index.md b/docs/porting/index.md index d8ac002a..51459663 100644 --- a/docs/porting/index.md +++ b/docs/porting/index.md @@ -2,24 +2,24 @@ This part of the documentation includes notes about porting FORTRAN code to NDSL. - ## General Concepts + Since we are not trying to do model developing but rather replicate an existing model, the main philosophy is to replicate model behavior as precisely as possible. 
-Since weather and climate models can take diverging paths based on very small input differences, as described in [\[1\]][1], a bitwise reproducible code is impossible to achive. +Since weather and climate models can take diverging paths based on very small input differences, as described in [\[1\]][1], a bitwise reproducible code is impossible to achieve. There were attempts at solving this problem like shown in [\[2\]][2] or [\[3\]][3] but all of those require heavy modification to the original code. -In our case, the switch from the original FORTRAN environment to a C++ environment can already contribute to these small errors shwoing up and therefore a 1:1 validation on a large scale is impossible. +In our case, the switch from the original FORTRAN environment to a C++ environment can already contribute to these small errors showing up and therefore a 1:1 validation on a large scale is impossible. This effect gets further enhanced by computation on GPUs. -Lastly the mixing of percisions found in various models is often done slightly unmethodical and can further complicate the understand of what precision is required where. +Lastly, the mixing of precisions found in various models is often done somewhat unmethodically and can further complicate the understanding of what precision is required where. -Since large scale validation is therefore close to impossible, we are trying to get repoducible results (within a margin) on smaller subcomponents of the model. -When portring code, we therefore try to break down larger components into logical, numerically coherent substructures that can be tested and validated individually. +Since large scale validation is therefore close to impossible, we are trying to get reproducible results (within a margin) on smaller sub-components of the model. +When porting code, we therefore try to break down larger components into logical, numerically coherent substructures that can be tested and validated individually.
This breakdown serves two main purposes: 1. Give us confidence, that the ported code behaves as intended. 2. Allow us to monitor if or how performance optimization down the road changes the numerical results of our model components. - ## Porting Guidelines + Since GT4Py has certain restrictions on what can be in the same stencil and what needs to be in separate stencils, there is no absolute 1:1 mapping that can or should be applied. The best practices we found are: @@ -28,9 +28,11 @@ The best practices we found are: 2. If possible, try to isolate individual numerical motifs into functions. ### Example -To illustrate best practices, we show a stripped version of the the nonhydrostatic vertical solver on the C-grid (Also know as the Rieman Solver): -**Main definition** +To illustrate best practices, we show a stripped version of the nonhydrostatic vertical solver on the C-grid (also known as the Riemann Solver): + +#### Main definition + ```python class NonhydrostaticVerticalSolverCGrid: def __init__(self, ...): @@ -53,14 +55,16 @@ class NonhydrostaticVerticalSolverCGrid: self._precompute_stencil(cappa, _pfac) self._compute_sim1_solve(_pfac, delpc) ``` -**Stencil Definitions** + +#### Stencil Definitions + ```python #constants definition c1 = Float(-2.0) / Float(14.0) c2 = Float(11.0) / Float(14.0) c3 = Float(5.0) / Float(14.0) -#function for numerical stanadlone motif +#function for numerical standalone motif @gtscript.function def vol_conserv_cubic_interp_func_y(v): return c1 * v[0, -2, 0] + c2 * v[0, -1, 0] + c3 * v @@ -78,7 +82,6 @@ def sim1_solver(cappa: FloatField, _pfac: FloatFieldIJ): cappa = vol_conserv_cubic_interp_func_y(cappa) + _pfac ``` - [1]: "Chaos in climate models" [2]: "Reproducible Climate Simulations" [3]: "Bit reproducible HPC applications" diff --git a/docs/porting/translate/index.md b/docs/porting/translate/index.md index 0fb199cd..1aa083bd 100644 --- a/docs/porting/translate/index.md +++ b/docs/porting/translate/index.md @@ -1,8 +1,9 @@
-## What are translate tests +# Translate tests -We call tests that validate subsets of computation against serialized data translate tests. These should provide a baseline with wich we can validate ported code and ensure the pipline generates expected results. +We call tests that validate subsets of computation against serialized data "translate tests". These should provide a baseline with which we can validate ported code and ensure the pipeline generates expected results. ## The Translate infrastructure + The infrastructure is set up in a way that for basic cases, all the default implementations are enough: The `TranslateFortranData2Py` base class will be evaluated through the function `test_sequential_savepoint`. @@ -20,35 +21,40 @@ The general structure is: For these steps to work, the name of the translate test needs to match the name of the data. In case of special handling required, almost everything can be overwritten: -**Overwriting thresholds:** +### Overwriting thresholds You can create an overwrite file to manually set the threshold in you data directory: -![image1.png](image1.png) +![image1.png](../../images/translate/image1.png) + +### Overwriting Arguments to your compute function -**Overwriting Arguments to your compute function** +The compute_func will be called automatically in the test. If your names in the netcdf are matching the `kwargs` of your function directly, no further action required: -The compute_func will be called automatically in the test. 
If your names in the netcdf are matching the kwargs of your function directly, no further action required: -![image2.png](image2.png) +![image2.png](../../images/translate/image2.png) If you need to rename it from the netcdf, you can use ["serialname"]: -![image3.png](image3.png) + +![image3.png](../../images/translate/image3.png) The same applies for scalar inputs with parameters: -![image4.png](image4.png) +![image4.png](../../images/translate/image4.png) -**Modifying output variables** +### Modifying output variables This can be required either if not all output is serialized, the naming is different or we need the same data as the input: -![image4.png](image4.png) -**Modifying the `compute` function** -Normally, cumpute has the three steps: +![image4.png](../../images/translate/image4.png) + +### Modifying the `compute` function + +Normally, compute has the three steps: 1. setup input 2. call `compute_func` 3. slice outputs Slight adaptations to every step are possible: -![image5.png](image5.png) + +![image5.png](../../images/translate/image5.png) diff --git a/mkdocs.yml b/mkdocs.yml index c47123a1..09916f21 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -16,6 +16,7 @@ nav: - Under the hood: - Technical Documentation: dev/index.md - DaCe: dev/dace.md + - GT4Py: dev/gt4py.md markdown_extensions: