diff --git a/README.md b/README.md
index 3554660c..3ea17add 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,9 @@ NDSL submodules `gt4py` and `dace` to point to vetted versions, use `git clone -
 NDSL is __NOT__ available on `pypi`. Installation of the package has to be local, via `pip install ./NDSL` (`-e` supported). The packages has a few options:
 
 - `ndsl[test]`: installs the test packages (based on `pytest`)
-- `ndsl[develop]`: installs tools for development and tests.
+- `ndsl[demos]`: installs extra requirements to run [NDSL examples](./examples/NDSL/)
+- `ndsl[docs]`: installs extra requirements to build the docs
+- `ndsl[develop]`: installs tools for development, docs, and tests.
 
 Tests are available via:
 
@@ -45,11 +47,28 @@ For GPU backends (the above plus):
 
 ## Development
 
-TBD: Code/contribution guideline
+### Code/contribution guidelines
 
-TBD: Documentation
+TBD
 
-Point of Contacts:
+### Documentation
+
+We are using [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/), which allows us to write the docs in Markdown files and optionally serve them as a static site.
+
+To view the documentation, install NDSL with the `docs` or `develop` extras. Then run:
+
+```bash
+mkdocs serve
+```
+
+Contributing to the documentation is straightforward:
+
+1. Add and/or change files in the [docs/](./docs/) folder as necessary.
+2. [Optional] If you have changes to the navigation, modify [mkdocs.yml](mkdocs.yml).
+3. [Optional] Start the development server and check how your changes are rendered.
+4. Submit a pull request with your changes.
+
+## Points of Contact
 
 - NOAA: Rusty Benson: rusty.benson -at- noaa.gov
 - NASA: Florian Deconinck florian.g.deconinck -at- nasa.gov
diff --git a/docs/builddocs.sh b/docs/builddocs.sh
deleted file mode 100644
index 3abe626a..00000000
--- a/docs/builddocs.sh
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/bin/bash
-
-# exit immediately on error
-set -e
-
-# To avoid issues when calling the script from different directories
-# sets the directory to the location of the script
-cd $(dirname $0)
-
-# This short script builds both the doxygen and sphinx documentation
-
-# Define pretty colors
-YEL='\033[0;33m'
-GRN='\033[1;32m'
-NC='\033[0m'
-
-# Build sphinx documents
-cd sphinx_doc/
-make clean          # fixes occasional unexpected behavior
-make html
-
-echo ""
-echo -e "-- ${GRN}Building Docs Complete${NC} --"
-echo -e "See ${YEL}sphinx_doc/_build/html/${NC} for sphinx html files"
diff --git a/docs/dev/dace.md b/docs/dev/dace.md
new file mode 100644
index 00000000..6c44cad2
--- /dev/null
+++ b/docs/dev/dace.md
@@ -0,0 +1,5 @@
+# DaCe
+
+[DaCe](https://spcldace.readthedocs.io/en/latest/index.htm) is the full-program optimization framework used in NDSL. DaCe is short for Data-Centric Parallel Programming and is developed at ETH's Scalable Parallel Computing Lab (SPCL).
+
+In NDSL, DaCe powers the [performance backends](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/) of [GT4Py](./gt4py.md). In particular, in NDSL's orchestration feature we will encode [macro-level optimizations](https://geos-esm.github.io/SMT-Nebulae/technical/backend/ADRs/stree/) like loop re-ordering and stencil fusing using DaCe.
diff --git a/docs/dev/gt4py.md b/docs/dev/gt4py.md
new file mode 100644
index 00000000..b334831f
--- /dev/null
+++ b/docs/dev/gt4py.md
@@ -0,0 +1,5 @@
+# GT4Py
+
+!!! warning
+
+    TODO: Add some docs on GT4Py here
diff --git a/docs/dev/index.md b/docs/dev/index.md
new file mode 100644
index 00000000..91b7ca70
--- /dev/null
+++ b/docs/dev/index.md
@@ -0,0 +1,107 @@
+# Under the hood
+
+This is the technical part of the documentation, geared towards developers contributing to NDSL.
+
+## Introduction
+
+Python has recently become the dominant programming language in the machine learning and data science communities because it is easy to learn and program. However, its performance remains a major concern in the scientific computing and HPC communities, where the most widely used programming languages are C/C++ and Fortran and Python is often used only as a scripting language for pre- and post-processing.
+
+The major performance issue in Python, especially in computation-intensive applications, is loops, which are often the performance bottleneck in other programming languages such as C++ and Fortran too. However, Python programs are often observed to be 10x to 100x slower than C, C++ and Fortran programs. To achieve peak hardware performance, the scientific computing communities have tried different programming models: OpenMP, Cilk+, Threading Building Blocks (TBB), and Linux pthreads for multi/many-core processors, and Kokkos, RAJA, OpenMP offload, and OpenACC for the highest performance on heterogeneous CPU/GPU systems. All of these programming models are only available for C, C++ and Fortran. Only a few projects target high performance for Python.
+
+The Python-based NDSL programming model described in this developer’s guide provides an alternative way to reach peak hardware performance with relatively little programming effort by using stencil semantics. A stencil is similar to the parallel-for kernels used in Kokkos and RAJA: it updates array elements according to a fixed access pattern. With stencil semantics, NDSL can, for example, be used to write matrix multiplication kernels in only about 30 lines of code that match the performance of cuBLAS/hipBLAS, something many GPU programmers cannot achieve in CUDA/HIP. This greatly reduces the programmer’s effort. NDSL has already been used successfully in the Pace global climate model, where it achieves up to a 4x speedup over the original Fortran implementation.
+
+## Programming model
+
+The programming model of NDSL is composed of backend execution spaces, performance optimization passes and transformations, memory spaces, and memory layouts. These abstractions allow the formulation of generic algorithms and data structures, which can then be mapped to different types of hardware architectures. Effectively, they allow compile-time transformation of algorithms to adapt to varying degrees of hardware parallelism and to the memory hierarchy. Figure 1 shows the high-level architecture of NDSL (without the orchestration option). As Figure 1 shows, NDSL uses a hierarchy of intermediate representations (IRs) to abstract the structure of the computational program. This reduces the complexity and maintenance cost of application code while increasing its portability and scalability. Where feasible, this approach also avoids having to raise information from lower-level representations by means of static analysis, as well as memory leaks, and it performs optimizations at the highest possible level of abstraction. Because the approach primarily leverages structural information readily available in the source code, optimizations such as loop fusion, tiling, and vectorization can be applied without complicated analysis and heuristics.
+
+![NDSL flow](../images/dev/ndsl_flow.png)
+
+In NDSL, the Python frontend converts user-defined stencils to a Python AST using the built-in `ast` module. In an AST, each node is an object of a class defined in the Python AST grammar (for more details, please refer to https://docs.python.org/3/library/ast.html). The AST node visitor, the `IRMaker` class (in NDSL/external/gt4py/src/gt4py/cartesian/frontend/gtscript_frontend.py), traverses the AST of a Python function decorated with `@gtscript.function` and/or of stencil objects, and the Python AST of the program is then lowered to the definition IR. The definition IR is a high-level IR composed of the high-level program, domain-specific information, and the structure of the computational operations, all of which are independent of the low-level hardware platform. Defining a high-level IR allows the IRs to be transformed without losing the performance of numerical libraries. However, the high-level IR doesn’t contain the detailed information required for performance on specific low-level runtime hardware. Specifically, the definition IR only preserves the information necessary to lower operations to hardware instructions implementing coarse-grained vector operations, or to numerical libraries such as cuBLAS/hipBLAS and Intel MKL.
+
+The definition IR is then transformed to GTIR (gt4py/src/gt4py/cartesian/frontend/defir_to_gtir.py). A GTIR stencil is defined in NDSL as follows:
+
+```python
+class Stencil(LocNode, eve.ValidatedSymbolTableTrait):
+    name: str
+    api_signature: List[Argument]
+    params: List[Decl]
+    vertical_loops: List[VerticalLoop]
+    externals: Dict[str, Literal]
+    sources: Dict[str, str]
+    docstring: str
+
+    @property
+    def param_names(self) -> List[str]:
+        return [p.name for p in self.params]
+
+    _validate_lvalue_dims = common.validate_lvalue_dims(VerticalLoop, FieldDecl)
+```
+
+GTIR is also a high-level IR. It contains the `vertical_loops` loop statement: in climate applications, vertical loops usually need special treatment because of numerical instability. Keeping `vertical_loops` as a separate code block in GTIR helps with implementing the subsequent performance passes and transformations. Program analysis passes/transformations are applied to the GTIR to remove redundant nodes, prune unused parameters, propagate the data types and shapes of symbols, and compute loop extensions.
+
+The GTIR is then further lowered to optimization IR (OIR), which is defined as
+
+```python
+class Stencil(LocNode, eve.ValidatedSymbolTableTrait):
+    name: str
+    # TODO: fix to be List[Union[ScalarDecl, FieldDecl]]
+    params: List[Decl]
+    vertical_loops: List[VerticalLoop]
+    declarations: List[Temporary]
+
+    _validate_dtype_is_set = common.validate_dtype_is_set()
+    _validate_lvalue_dims = common.validate_lvalue_dims(VerticalLoop, FieldDecl)
+```
+
+The OIR is designed specifically for performance optimization; optimization algorithms are implemented as passes/transformations on the OIR. Currently, vertical loop merging, horizontal execution loop merging, loop unrolling and vectorization, statement fusion, and pruning optimizations are available and are activated by an environment variable in the `oir_pipeline` module.
+
+After the optimization pipeline has finished, the OIR is converted to a backend-specific IR, for example the DaCe IR (SDFG). The DaCe SDFG can be further optimized by DaCe's built-in passes/transformations, but we didn’t activate this optimization step in the Pace application. Note that during the OIR-to-SDFG conversion, each horizontal execution node is serialized to an SDFG library node, within which the loop expansion information is encoded.
+
+When using the GT backend, the OIR is used directly by the gt4py code generator to generate the C++ GridTools stencils (computation code) and the Python binding code. In this backend, each horizontal execution node is passed to the code generator and produces a separate GridTools stencil.
+
+NDSL also supports a whole-program optimization model, called the orchestration model in NDSL, which currently only supports the DaCe backend. Whole-program optimization with DaCe is the process of turning all Python and GT4Py code into generated C++. We only _orchestrate_ the runtime code of the model, i.e. everything in the `__call__` method of the module; all code in `__init__` is executed as with a normal GT backend.
+
+At the highest level in Pace, to turn on orchestration you need to set `FV3_DACEMODE` to an orchestrated option _and_ run a `dace:*` backend (it will error out if you run anything else). The options for `FV3_DACEMODE` are:
+
+- _Python_: default, turns orchestration off.
+- _Build_: build the SDFG then exit without running. See Build for limitation of build strategy.
+- _BuildAndRun_: as above, but distribute the build and run.
+- _Run_: tries to execute, errors out if the cache doesn’t exist.
+
+Code is orchestrated in two ways:
+
+- functions are orchestrated via the `orchestrate_function` decorator,
+- methods are orchestrated via the `orchestrate` function (e.g. `pace.driver.Driver._critical_path_step_all`)
+
+The latter is the way we orchestrate in our model. `orchestrate` is often called as the first function in `__init__`. It patches the methods _in place_ and replaces them with a wrapper that will deal with turning it all into an executable SDFG when call time comes.
+
+The orchestration has two parameters: `config` (will expand later) and `dace_compiletime_args`.
+
+DaCe needs a description of all memory so that it can interface with it in the C code that will be executed. Some memory is parsed automatically (e.g. numpy, cupy, scalars), while other memory needs a description. In our case, `Quantity` and others need to be flagged as `dace.compiletime`, which tells DaCe not to try to AOT the memory and to wait for JIT time instead. `dace_compiletime_args` helps with tagging those without having to change the type hint.
+
+Figure 2 shows the hierarchy of intermediate representations (IRs) and the lowering process when the orchestration option is activated.
+
+![NDSL orchestration](../images/dev/ndsl_orchestration.png)
+
+When the orchestration option is turned on, the call method object is patched in place, replacing the original Callable with a wrapper that triggers orchestration at call time. If the model configuration doesn’t demand orchestration, this won’t do anything. The orchestrated call methods and the computational stencils (lazy computational stencils), which are cached in a container, are parsed to a Python AST by the frontend at runtime; the Python AST is then converted to a DaCe SDFG. Analysis and optimization are applied before the code generator produces the C++ code. This process is called a just-in-time (JIT) build, in contrast to the non-orchestrated model, which is compiled and built eagerly. The JIT build caches the build information of the computational stencils and orchestrated methods, and it makes it more convenient to apply analysis and optimization passes to the code as a whole, for example merging neighboring stencils. As a result, more optimized code can be generated and better runtime performance can be achieved.
+
+## Analysis and Optimization
+
+One of the major features of NDSL is that users can develop new passes/transformations for a backend targeting new hardware; passes and/or transformations are the key ingredients for good performance on new hardware. Passes and/or transformations perform different levels of optimization at different levels of abstraction. For example, loop-level optimizations are independent of the hardware and can be applied to any backend, while device placement, memory, and cache optimizations depend on the backend and hardware. In this section, we focus only on optimizations that are independent of the backend hardware.
+
+Code optimization generally proceeds in two steps. First, a filter function is called to find the patterns to which a pass and/or transformation should be applied. Then the pass and/or transformation is applied to each matched pattern to insert, delete, or replace existing nodes with optimized nodes. In NDSL, the following passes and/or transformations are provided.
+
+```python
+def prune_unused_parameters(node: gtir.Stencil) -> gtir.Stencil:
+    assert isinstance(node, gtir.Stencil)
+    used_variables = (
+        node.walk_values()
+        .if_isinstance(gtir.FieldAccess, gtir.ScalarAccess)
+        .getattr("name")
+        .to_list()
+    )
+    used_params = list(filter(lambda param: param.name in used_variables, node.params))
+    return node.copy(update={"params": used_params})
+```
+
+## Code generators
diff --git a/docs/sphinx_doc/static/ndsl_flow.png b/docs/images/dev/ndsl_flow.png
similarity index 100%
rename from docs/sphinx_doc/static/ndsl_flow.png
rename to docs/images/dev/ndsl_flow.png
diff --git a/docs/sphinx_doc/static/ndsl_orchestration.png b/docs/images/dev/ndsl_orchestration.png
similarity index 100%
rename from docs/sphinx_doc/static/ndsl_orchestration.png
rename to docs/images/dev/ndsl_orchestration.png
diff --git a/docs/images/translate/image1.png b/docs/images/translate/image1.png
new file mode 100644
index 00000000..529d55ea
Binary files /dev/null and b/docs/images/translate/image1.png differ
diff --git a/docs/images/translate/image2.png b/docs/images/translate/image2.png
new file mode 100644
index 00000000..b73ea3f0
Binary files /dev/null and b/docs/images/translate/image2.png differ
diff --git a/docs/images/translate/image3.png b/docs/images/translate/image3.png
new file mode 100644
index 00000000..784aa36d
Binary files /dev/null and b/docs/images/translate/image3.png differ
diff --git a/docs/images/translate/image4.png b/docs/images/translate/image4.png
new file mode 100644
index 00000000..3ec73a86
Binary files /dev/null and b/docs/images/translate/image4.png differ
diff --git a/docs/images/translate/image5.png b/docs/images/translate/image5.png
new file mode 100644
index 00000000..0b4b90d9
Binary files /dev/null and b/docs/images/translate/image5.png differ
diff --git a/docs/includes/glossary.md b/docs/includes/glossary.md
new file mode 100644
index 00000000..c40936cc
--- /dev/null
+++ b/docs/includes/glossary.md
@@ -0,0 +1,24 @@
+<!-- institutions / groups / teams -->
+
+*[CSCS]: Swiss National Supercomputing Center
+*[ETH]: Swiss Federal Institute of Technology
+*[GFDL]: Geophysical Fluid Dynamics Laboratory
+*[NASA]: National Aeronautics and Space Administration
+*[NOAA]: National Oceanic and Atmospheric Administration
+*[SPCL]: Scalable Parallel Computing Laboratory (ETH Zurich)
+
+
+<!-- technology -->
+
+*[DSL]: Domain specific language
+*[FORTRAN]: Formula Translation - a compiled programming language widely used in scientific computing
+*[IR]: Intermediate Representation: An abstraction between source code and machine code, designed to simplify analysis and optimization during program compilation.
+*[NDSL]: NOAA/NASA Domain Specific Language middleware
+*[SDFG]: Stateful Dataflow multiGraphs - the IR of DaCe
+
+<!-- Modeling -->
+*[FMS]: Flexible Modeling System - see https://github.com/NOAA-GFDL/FMS
+*[FV3]: GFDL Finite-Volume Cubed-Sphere Dynamical Core
+
+<!-- other -->
+*[ULP]: Unit in the last place: The spacing between two adjacent floating-point numbers.
diff --git a/docs/sphinx_doc/overview.rst b/docs/index.md
similarity index 50%
rename from docs/sphinx_doc/overview.rst
rename to docs/index.md
index 930ffd72..24053f27 100644
--- a/docs/sphinx_doc/overview.rst
+++ b/docs/index.md
@@ -1,14 +1,14 @@
-========
-Overview
-========
+# NDSL Documentation
+
+NDSL allows atmospheric scientists to focus on what matters in model development and hides away the complexities of coding for a supercomputer.
+
+## Quick Start
 
-Quick Start
-------------
 Python `3.11.x` is required for NDSL and all its third party dependencies for installation.
 
 NDSL submodules `gt4py` and `dace` to point to vetted versions, use `git clone --recurse-submodule` to update the git submodules.
 
-NDSL is __NOT__ available on `pypi`. Installation of the package has to be local, via `pip install ./NDSL` (`-e` supported). The packages have a few options:
+NDSL is **NOT** available on `pypi`. Installation of the package has to be local, via `pip install ./NDSL` (`-e` supported). The packages have a few options:
 
 - `ndsl[test]`: installs the test packages (based on `pytest`)
 - `ndsl[develop]`: installs tools for development and tests.
@@ -18,9 +18,7 @@ NDSL uses pytest for its unit tests, the tests are available via:
 - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
 - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)
 
-
-Requirements & supported compilers
--------------------------------------
+## Requirements & supported compilers
 
 For CPU backends:
 
@@ -38,15 +36,13 @@ For GPU backends (the above plus):
 - Libraries:
   - MPI compiled with cuda support
 
-
-NDSL installation and testing
--------------------------------------
+## NDSL installation and testing
 
 NDSL is not available at `pypi`, it uses
 
-  .. code-block:: console
-
-      pip install NDSL
+```bash
+pip install NDSL
+```
 
 to install NDSL locally.
 
@@ -60,67 +56,61 @@ Tests are available via:
 - `pytest -x test`: running CPU serial tests (GPU as well if `cupy` is installed)
 - `mpirun -np 6 pytest -x test/mpi`: running CPU parallel tests (GPU as well if `cupy` is installed)
 
-
-Configurations for Pace
-----------------------------
+## Configurations for Pace
 
 Configurations for Pace to use NDSL with different backend:
 
 - FV3_DACEMODE=Python[Build|BuildAndRun|Run] controls the full program optimizer behavior
 
-  - Python: default, use stencil only, no full program optmization
+  - Python: default, use stencil only, no full program optimization
 
   - Build: will build the program then exit. This _build no matter what_. (backend must be `dace:gpu` or `dace:cpu`)
 
   - BuildAndRun: same as above but after build the program will keep executing (backend must be `dace:gpu` or `dace:cpu`)
 
-  - Run: load pre-compiled program and execute, fail if the .so is not present (_no hashs check!_) (backend must be `dace:gpu` or `dace:cpu`)
+  - Run: load pre-compiled program and execute, fail if the .so is not present (_no hash check!_) (backend must be `dace:gpu` or `dace:cpu`)
 
 - PACE_FLOAT_PRECISION=64 control the floating point precision throughout the program.
 
-
 Install Pace with different NDSL backend:
 
-  - Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
-
-  - When cloning Pace you will need to update the repository's submodules as well:
+- Shell scripts to install Pace using NDSL backend on specific machines such as Gaea can be found in `examples/build_scripts/`.
+- When cloning Pace you will need to update the repository's submodules as well:
 
-  .. code-block:: console
-
-      $ git clone --recursive https://github.com/ai2cm/pace.git
+```bash
+git clone --recursive https://github.com/ai2cm/pace.git
+```
 
   or if you have already cloned the repository:
 
-  .. code-block:: console
-
-      $ git submodule update --init --recursive
-
-
-  - Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
-  You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed).
-  If installed outside the standard header locations, gt4py requires that `$BOOST_ROOT` be set:
-
-  .. code-block:: console
-
-      $ cd BOOST/ROOT
-      $ wget https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz
-      $ tar -xzf boost_1_79_0.tar.gz
-      $ mkdir -p boost_1_79_0/include
-      $ mv boost_1_79_0/boost boost_1_79_0/include/
-      $ export BOOST_ROOT=BOOST/ROOT/boost_1_79_0
-
+```bash
+git submodule update --init --recursive
+```
 
-  - We recommend creating a python `venv` or conda environment specifically for Pace.
+- Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend.
+You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed).
+If installed outside the standard header locations, gt4py requires that `$BOOST_ROOT` be set:
 
-  .. code-block:: console
+```bash
+cd BOOST/ROOT
+wget https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz
+tar -xzf boost_1_79_0.tar.gz
+mkdir -p boost_1_79_0/include
+mv boost_1_79_0/boost boost_1_79_0/include/
+export BOOST_ROOT=BOOST/ROOT/boost_1_79_0
+```
 
-      $ python3 -m venv venv_name
-      $ source venv_name/bin/activate
+- We recommend creating a python `venv` or conda environment specifically for Pace.
 
-  - Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace:
+```bash
+python3 -m venv venv_name
+source venv_name/bin/activate
+```
 
-  .. code-block:: console
+- Inside of your pace `venv` or conda environment pip install the Python requirements, GT4Py, and Pace:
 
-      $ pip3 install -r requirements_dev.txt -c constraints.txt
+```bash
+pip3 install -r requirements_dev.txt -c constraints.txt
+```
 
-  - There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation   (`requirements_docs.txt`).
+- There are also separate requirements files which can be installed for linting (`requirements_lint.txt`) and building documentation (`requirements_docs.txt`).
diff --git a/docs/porting/index.md b/docs/porting/index.md
new file mode 100644
index 00000000..51459663
--- /dev/null
+++ b/docs/porting/index.md
@@ -0,0 +1,87 @@
+# Notes on porting FORTRAN code
+
+This part of the documentation includes notes about porting FORTRAN code to NDSL.
+
+## General Concepts
+
+Since we are not developing a new model but rather replicating an existing one, the main philosophy is to replicate model behavior as precisely as possible.
+Since weather and climate models can take diverging paths based on very small input differences, as described in [\[1\]][1], bitwise-reproducible code is impossible to achieve.
+There have been attempts to solve this problem, as shown in [\[2\]][2] or [\[3\]][3], but all of them require heavy modifications to the original code.
+In our case, the switch from the original FORTRAN environment to a C++ environment can already contribute to these small errors, and therefore a 1:1 validation on a large scale is impossible.
+This effect is further amplified by computation on GPUs.
+Lastly, the mixing of precisions found in various models is often done somewhat unmethodically, which can further complicate the understanding of what precision is required where.
+
+Since large scale validation is therefore close to impossible, we are trying to get reproducible results (within a margin) on smaller sub-components of the model.
+When porting code, we therefore try to break down larger components into logical, numerically coherent substructures that can be tested and validated individually.
+This breakdown serves two main purposes:
+
+1. Give us confidence that the ported code behaves as intended.
+2. Allow us to monitor if or how performance optimization down the road changes the numerical results of our model components.
+
+## Porting Guidelines
+
+Since GT4Py has certain restrictions on what can be in the same stencil and what needs to be in separate stencils, there is no absolute 1:1 mapping that can or should be applied.
+
+The best practices we found are:
+
+1. A numerically self-contained module should always live in a single class.
+2. If possible, try to isolate individual numerical motifs into functions.
+
+### Example
+
+To illustrate best practices, we show a stripped version of the nonhydrostatic vertical solver on the C-grid (also known as the Riemann solver):
+
+#### Main definition
+
+```python
+class NonhydrostaticVerticalSolverCGrid:
+    def __init__(self, ...):
+        # Definition of the (potentially multiple) stencils to call
+        self._precompute_stencil = stencil_factory.from_origin_domain(
+            precompute,
+            origin=origin,
+            domain=domain,
+        )
+        self._compute_sim1_solve = stencil_factory.from_origin_domain(
+            sim1_solver,
+            origin=origin,
+            domain=domain,
+        )
+        # Definition of temporary variables shared across two stencils
+        # that are not used outside the module
+        self._pfac = FloatFieldIJ()
+        ...
+    def __call__(self, cappa: FloatField, delpc: FloatField):
+        self._precompute_stencil(cappa, self._pfac)
+        self._compute_sim1_solve(self._pfac, delpc)
+```
+
+#### Stencil Definitions
+
+```python
+# constants definition
+c1 = Float(-2.0) / Float(14.0)
+c2 = Float(11.0) / Float(14.0)
+c3 = Float(5.0) / Float(14.0)
+
+# function for numerical standalone motif
+@gtscript.function
+def vol_conserv_cubic_interp_func_y(v):
+    return c1 * v[0, -2, 0] + c2 * v[0, -1, 0] + c3 * v
+
+def precompute(cappa: FloatField, _pfac: FloatFieldIJ):
+    # small computation directly in the stencil
+    with computation(PARALLEL), interval(...):
+        # a variable used only in one stencil can be defined here
+        tmpvar = cappa[1, 0, 0] + 1
+    with computation(PARALLEL), interval(0, 1):
+        _pfac = tmpvar[0, 0, 1]
+
+def sim1_solver(cappa: FloatField, _pfac: FloatFieldIJ):
+    with computation(PARALLEL), interval(...):
+        cappa = vol_conserv_cubic_interp_func_y(cappa) + _pfac
+```
+
+[1]: <https://www.climate.gov/news-features/blogs/enso/butterflies-rounding-errors-and-chaos-climate-models> "Chaos in climate models"
+[2]: <https://pasc17.org/fileadmin/user_upload/pasc17/program/post125s2.pdf> "Reproducible Climate Simulations"
+[3]: <http://htor.inf.ethz.ch/sec/bitrep-ipdps.pdf> "Bit reproducible HPC applications"
diff --git a/docs/porting/translate/index.md b/docs/porting/translate/index.md
new file mode 100644
index 00000000..1aa083bd
--- /dev/null
+++ b/docs/porting/translate/index.md
@@ -0,0 +1,60 @@
+# Translate tests
+
+We call tests that validate subsets of computation against serialized data "translate tests". These should provide a baseline with which we can validate ported code and ensure the pipeline generates expected results.
+
+## The Translate infrastructure
+
+The infrastructure is set up in a way that for basic cases, all the default implementations are enough:
+
+The `TranslateFortranData2Py` base class will be evaluated through the function `test_sequential_savepoint`.
+The general structure is:
+
+1. Extract tolerances for errors - either the defaults or the overwritten ones:
+    - Maximal absolute error
+    - Maximal relative error
+    - Allowed ULP difference
+2. Extract input data from `{savepoint_name}-In.nc`
+3. Run the `compute` function, returning the outputs.
+4. Extract reference output data from `{savepoint_name}-Out.nc`
+5. Compare the data in `out_vars` to the reference data.
+
+For these steps to work, the name of the translate test needs to match the name of the data.
+If special handling is required, almost everything can be overwritten:
+
+### Overwriting thresholds
+
+You can create an overwrite file in your data directory to manually set the thresholds:
+
+![image1.png](../../images/translate/image1.png)
+
+### Overwriting Arguments to your compute function
+
+The `compute_func` will be called automatically in the test. If the variable names in the NetCDF file directly match the `kwargs` of your function, no further action is required:
+
+![image2.png](../../images/translate/image2.png)
+
+If you need to rename a variable from the NetCDF file, you can use `["serialname"]`:
+
+![image3.png](../../images/translate/image3.png)
+
+The same applies for scalar inputs with parameters:
+
+![image4.png](../../images/translate/image4.png)
+
+### Modifying output variables
+
+This can be required if not all output is serialized, if the naming is different, or if we need the same data as the input:
+
+![image4.png](../../images/translate/image4.png)
+
+### Modifying the `compute` function
+
+Normally, `compute` has three steps:
+
+1. setup input
+2. call `compute_func`
+3. slice outputs
+
+Slight adaptations to every step are possible:
+
+![image5.png](../../images/translate/image5.png)
diff --git a/docs/requirement_docs.txt b/docs/requirement_docs.txt
deleted file mode 100644
index 02306dcc..00000000
--- a/docs/requirement_docs.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-recommonmark
-sphinx>=1.4
-sphinx-argparse
-sphinx_rtd_theme
-sphinx-gallery
diff --git a/docs/sphinx_doc/Makefile b/docs/sphinx_doc/Makefile
deleted file mode 100644
index 17354e4f..00000000
--- a/docs/sphinx_doc/Makefile
+++ /dev/null
@@ -1,225 +0,0 @@
-# Makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line.
-SPHINXOPTS    =
-SPHINXBUILD   = sphinx-build
-PAPER         =
-BUILDDIR      = _build
-
-# Internal variables.
-PAPEROPT_a4     = -D latex_paper_size=a4
-PAPEROPT_letter = -D latex_paper_size=letter
-ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
-# the i18n builder cannot share the environment and doctrees with the others
-I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
-
-.PHONY: help
-help:
-	@echo "Please use \`make <target>' where <target> is one of"
-	@echo "  html       to make standalone HTML files"
-	@echo "  dirhtml    to make HTML files named index.html in directories"
-	@echo "  singlehtml to make a single large HTML file"
-	@echo "  pickle     to make pickle files"
-	@echo "  json       to make JSON files"
-	@echo "  htmlhelp   to make HTML files and a HTML help project"
-	@echo "  qthelp     to make HTML files and a qthelp project"
-	@echo "  applehelp  to make an Apple Help Book"
-	@echo "  devhelp    to make HTML files and a Devhelp project"
-	@echo "  epub       to make an epub"
-	@echo "  epub3      to make an epub3"
-	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
-	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
-	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
-	@echo "  text       to make text files"
-	@echo "  man        to make manual pages"
-	@echo "  texinfo    to make Texinfo files"
-	@echo "  info       to make Texinfo files and run them through makeinfo"
-	@echo "  gettext    to make PO message catalogs"
-	@echo "  changes    to make an overview of all changed/added/deprecated items"
-	@echo "  xml        to make Docutils-native XML files"
-	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
-	@echo "  linkcheck  to check all external links for integrity"
-	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
-	@echo "  coverage   to run coverage check of the documentation (if enabled)"
-	@echo "  dummy      to check syntax errors of document sources"
-
-.PHONY: clean
-clean:
-	rm -rf $(BUILDDIR)/*
-
-.PHONY: html
-html:
-	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
-	@echo
-	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
-
-.PHONY: dirhtml
-dirhtml:
-	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
-	@echo
-	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
-
-.PHONY: singlehtml
-singlehtml:
-	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
-	@echo
-	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
-
-.PHONY: pickle
-pickle:
-	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
-	@echo
-	@echo "Build finished; now you can process the pickle files."
-
-.PHONY: json
-json:
-	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
-	@echo
-	@echo "Build finished; now you can process the JSON files."
-
-.PHONY: htmlhelp
-htmlhelp:
-	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
-	@echo
-	@echo "Build finished; now you can run HTML Help Workshop with the" \
-	      ".hhp project file in $(BUILDDIR)/htmlhelp."
-
-.PHONY: qthelp
-qthelp:
-	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
-	@echo
-	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
-	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
-	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/ERF.qhcp"
-	@echo "To view the help file:"
-	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/ERF.qhc"
-
-.PHONY: applehelp
-applehelp:
-	$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
-	@echo
-	@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
-	@echo "N.B. You won't be able to view it unless you put it in" \
-	      "~/Library/Documentation/Help or install it in your application" \
-	      "bundle."
-
-.PHONY: devhelp
-devhelp:
-	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
-	@echo
-	@echo "Build finished."
-	@echo "To view the help file:"
-	@echo "# mkdir -p $$HOME/.local/share/devhelp/ERF"
-	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/ERF"
-	@echo "# devhelp"
-
-.PHONY: epub
-epub:
-	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
-	@echo
-	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
-
-.PHONY: epub3
-epub3:
-	$(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3
-	@echo
-	@echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3."
-
-.PHONY: latex
-latex:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo
-	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
-	@echo "Run \`make' in that directory to run these through (pdf)latex" \
-	      "(use \`make latexpdf' here to do that automatically)."
-
-.PHONY: latexpdf
-latexpdf:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo "Running LaTeX files through pdflatex..."
-	$(MAKE) -C $(BUILDDIR)/latex all-pdf
-	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
-.PHONY: latexpdfja
-latexpdfja:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo "Running LaTeX files through platex and dvipdfmx..."
-	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
-	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
-.PHONY: text
-text:
-	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
-	@echo
-	@echo "Build finished. The text files are in $(BUILDDIR)/text."
-
-.PHONY: man
-man:
-	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
-	@echo
-	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
-
-.PHONY: texinfo
-texinfo:
-	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
-	@echo
-	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
-	@echo "Run \`make' in that directory to run these through makeinfo" \
-	      "(use \`make info' here to do that automatically)."
-
-.PHONY: info
-info:
-	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
-	@echo "Running Texinfo files through makeinfo..."
-	make -C $(BUILDDIR)/texinfo info
-	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
-
-.PHONY: gettext
-gettext:
-	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
-	@echo
-	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
-
-.PHONY: changes
-changes:
-	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
-	@echo
-	@echo "The overview file is in $(BUILDDIR)/changes."
-
-.PHONY: linkcheck
-linkcheck:
-	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
-	@echo
-	@echo "Link check complete; look for any errors in the above output " \
-	      "or in $(BUILDDIR)/linkcheck/output.txt."
-
-.PHONY: doctest
-doctest:
-	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
-	@echo "Testing of doctests in the sources finished, look at the " \
-	      "results in $(BUILDDIR)/doctest/output.txt."
-
-.PHONY: coverage
-coverage:
-	$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
-	@echo "Testing of coverage in the sources finished, look at the " \
-	      "results in $(BUILDDIR)/coverage/python.txt."
-
-.PHONY: xml
-xml:
-	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
-	@echo
-	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
-
-.PHONY: pseudoxml
-pseudoxml:
-	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
-	@echo
-	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
-
-.PHONY: dummy
-dummy:
-	$(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy
-	@echo
-	@echo "Build finished. Dummy builder generates no files."
diff --git a/docs/sphinx_doc/conf.py b/docs/sphinx_doc/conf.py
deleted file mode 100644
index e31cccc9..00000000
--- a/docs/sphinx_doc/conf.py
+++ /dev/null
@@ -1,331 +0,0 @@
-# -*- coding: utf-8 -*-
-#
-# import os
-
-
-# sys.path.insert(0, os.path.abspath('.'))
-# sys.path.append("../breathe")
-
-# -- General configuration ------------------------------------------------
-
-# If your documentation needs a minimal Sphinx version, state it here.
-#
-# needs_sphinx = '1.0'
-
-# Add any Sphinx extension module names here, as strings. They can be
-# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
-# ones.
-extensions = ["sphinx.ext.mathjax"]
-
-# Add any paths that contain templates here, relative to this directory.
-templates_path = ["_templates"]
-# breathe_projects = {"NDSL": "../doxygen_output/xml/"}
-# breathe_default_project = "NDSL"
-
-# fortran_src ='../../Source/Src_2d/'
-# fortran_ext =[' 'F90']
-
-# The suffix(es) of source filenames.
-# You can specify multiple suffix as a list of string:
-#
-# source_suffix = ['.rst', '.md']
-source_suffix = ".rst"
-
-# The encoding of source files.
-#
-# source_encoding = 'utf-8-sig'
-
-# The master toctree document.
-master_doc = "index"
-
-# General information about the project.
-project = u"NDSL"
-copyright = u" "
-author = u"NOAA/NASA NDSL development team"
-
-# The version info for the project you're documenting, acts as replacement for
-# |version| and |release|, also used in various other places throughout the
-# built documents.
-#
-# The short X.Y version.
-version = u"2025.01.00"
-# The full version, including alpha/beta/rc tags.
-release = u"2025.01.00"
-
-# The language for content autogenerated by Sphinx. Refer to documentation
-# for a list of supported languages.
-#
-# This is also used if you do content translation via gettext catalogs.
-# Usually you set "language" from the command line for these cases.
-language = "en"
-
-# There are two options for replacing |today|: either, you set today to some
-# non-false value, then it is used:
-#
-# today = ''
-#
-# Else, today_fmt is used as the format for a strftime call.
-#
-# today_fmt = '%B %d, %Y'
-
-# List of patterns, relative to source directory, that match files and
-# directories to ignore when looking for source files.
-# This patterns also effect to html_static_path and html_extra_path
-exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
-
-# The reST default role (used for this markup: `text`) to use for all
-# documents.
-#
-# default_role = None
-
-# If true, '()' will be appended to :func: etc. cross-reference text.
-#
-# add_function_parentheses = True
-
-# If true, the current module name will be prepended to all description
-# unit titles (such as .. function::).
-#
-# add_module_names = True
-
-# If true, sectionauthor and moduleauthor directives will be shown in the
-# output. They are ignored by default.
-#
-# show_authors = False
-
-# The name of the Pygments (syntax highlighting) style to use.
-pygments_style = "sphinx"
-
-# A list of ignored prefixes for module index sorting.
-# modindex_common_prefix = []
-
-# If true, keep warnings as "system message" paragraphs in the built documents.
-# keep_warnings = False
-
-# If true, `todo` and `todoList` produce output, else they produce nothing.
-todo_include_todos = False
-
-numfig = True
-numfig_format = {"figure": "%s", "table": "%s", "code-block": "%s"}
-
-# -- Options for HTML output ----------------------------------------------
-
-# The theme to use for HTML and HTML Help pages.  See the documentation for
-# a list of builtin themes.
-#
-# html_theme = 'nature'
-html_theme = "sphinx_rtd_theme"
-
-# Theme options are theme-specific and customize the look and feel of a theme
-# further.  For a list of options available for each theme, see the
-# documentation.
-#
-# html_theme_options = {}
-
-# Add any paths that contain custom themes here, relative to this directory.
-# html_theme_path = []
-
-# The name for this set of Sphinx documents.
-# "<project> v<release> documentation" by default.
-#
-# html_title = u'NDSL v0.01'
-
-# A shorter title for the navigation bar.  Default is the same as html_title.
-#
-# html_short_title = None
-
-# The name of an image file (relative to this directory) to place at the top
-# of the sidebar.
-#
-# html_logo = None
-
-# The name of an image file (relative to this directory) to use as a favicon of
-# the docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
-# pixels large.
-#
-# html_favicon = None
-
-# Add any paths that contain custom static files (such as style sheets) here,
-# relative to this directory. They are copied after the builtin static files,
-# so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ["static"]
-
-# Add any extra paths that contain custom files (such as robots.txt or
-# .htaccess) here, relative to this directory. These files are copied
-# directly to the root of the documentation.
-#
-# html_extra_path = []
-
-# If not None, a 'Last updated on:' timestamp is inserted at every page
-# bottom, using the given strftime format.
-# The empty string is equivalent to '%b %d, %Y'.
-#
-# html_last_updated_fmt = None
-
-# If true, SmartyPants will be used to convert quotes and dashes to
-# typographically correct entities.
-#
-# html_use_smartypants = True
-
-# Custom sidebar templates, maps document names to template names.
-#
-# html_sidebars = {}
-
-# Additional templates that should be rendered to pages, maps page names to
-# template names.
-#
-# html_additional_pages = {}
-
-# If false, no module index is generated.
-#
-# html_domain_indices = True
-
-# If false, no index is generated.
-#
-# html_use_index = True
-
-# If true, the index is split into individual pages for each letter.
-#
-# html_split_index = False
-
-# If true, links to the reST sources are added to the pages.
-#
-# html_show_sourcelink = True
-
-# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
-#
-# html_show_sphinx = True
-
-# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
-#
-# html_show_copyright = True
-
-# If true, an OpenSearch description file will be output, and all pages will
-# contain a <link> tag referring to it.  The value of this option must be the
-# base URL from which the finished HTML is served.
-#
-# html_use_opensearch = ''
-
-# This is the file name suffix for HTML files (e.g. ".xhtml").
-# html_file_suffix = None
-
-# Language to be used for generating the HTML full-text search index.
-# Sphinx supports the following languages:
-#   'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja'
-#   'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr', 'zh'
-#
-# html_search_language = 'en'
-
-# A dictionary with options for the search language support, empty by default.
-# 'ja' uses this config value.
-# 'zh' user can custom change `jieba` dictionary path.
-#
-# html_search_options = {'type': 'default'}
-
-# The name of a javascript file (relative to the configuration directory) that
-# implements a search results scorer. If empty, the default will be used.
-#
-# html_search_scorer = 'scorer.js'
-
-# Output file base name for HTML help builder.
-htmlhelp_basename = "ndsl document"
-
-# -- Options for LaTeX output ---------------------------------------------
-
-latex_elements = {
-    # The paper size ('letterpaper' or 'a4paper').
-    #
-    # 'papersize': 'letterpaper',
-    # The font size ('10pt', '11pt' or '12pt').
-    #
-    # 'pointsize': '10pt',
-    # Additional stuff for the LaTeX preamble.
-    #
-    # 'preamble': '',
-    # Latex figure (float) alignment
-    #
-    # 'figure_align': 'htbp',
-}
-
-# Grouping the document tree into LaTeX files. List of tuples
-# (source start file, target name, title,
-#  author, documentclass [howto, manual, or own class]).
-latex_documents = [
-    (master_doc, "NDSL.tex", u"NDSL Documentation", author, "manual"),
-]
-
-# The name of an image file (relative to this directory) to place at the top of
-# the title page.
-#
-# latex_logo = None
-
-# For "manual" documents, if this is true, then toplevel headings are parts,
-# not chapters.
-#
-# latex_use_parts = False
-
-# If true, show page references after internal links.
-#
-# latex_show_pagerefs = False
-
-# If true, show URL addresses after external links.
-#
-# latex_show_urls = False
-
-# Documents to append as an appendix to all manuals.
-#
-# latex_appendices = []
-
-# It false, will not define \strong, \code,     itleref, \crossref ... but only
-# \sphinxstrong, ..., \sphinxtitleref, ... To help avoid clash with user added
-# packages.
-#
-# latex_keep_old_macro_names = True
-
-# If false, no module index is generated.
-#
-# latex_domain_indices = True
-
-
-# -- Options for manual page output ---------------------------------------
-
-# One entry per manual page. List of tuples
-# (source start file, name, description, authors, manual section).
-man_pages = [(master_doc, "ndsl", u"NDSL Documentation", [author], 1)]
-
-# If true, show URL addresses after external links.
-#
-# man_show_urls = False
-
-
-# -- Options for Texinfo output -------------------------------------------
-
-# Grouping the document tree into Texinfo files. List of tuples
-# (source start file, target name, title, author,
-#  dir menu entry, description, category)
-texinfo_documents = [
-    (
-        master_doc,
-        "ndsl",
-        u"NDSL Documentation",
-        author,
-        "NDSL",
-        "One line description of project.",
-        "Miscellaneous",
-    ),
-]
-
-# Documents to append as an appendix to all manuals.
-#
-# texinfo_appendices = []
-
-# If false, no module index is generated.
-#
-# texinfo_domain_indices = True
-
-# How to display URL addresses: 'footnote', 'no', or 'inline'.
-#
-# texinfo_show_urls = 'footnote'
-
-# If true, do not generate a @detailmenu in the "Top" node's menu.
-#
-# texinfo_no_detailmenu = False
diff --git a/docs/sphinx_doc/dace.rst b/docs/sphinx_doc/dace.rst
deleted file mode 100644
index 43057918..00000000
--- a/docs/sphinx_doc/dace.rst
+++ /dev/null
@@ -1,9 +0,0 @@
-Dace
-============
-
-DaCe is a parallel programming framework developed at Scalable Parallel Computing Laboratory (SPCL). DaCe is a high level intermediate representation (IR) that parses most of the Python/NumPy semantics and Fortran programming languages in the frontend to DaCe IR, and then optimizes the IR by passes/transformations, the DaCe IRs are then used by the backend codegen to generate highly efficient C++ code for high-performance CPU, GPU, and FPGA hardware devices.
-
-DaCe IR's use the Stateful Dataflow multiGraphs (SDFG) data-centric intermediate representation: A transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. On the other hand, the used optimizations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.
-
-For more detailed document about DaCe, please refer to the following link:
-https://spcldace.readthedocs.io/en/latest/index.html
diff --git a/docs/sphinx_doc/developer_guide.rst b/docs/sphinx_doc/developer_guide.rst
deleted file mode 100644
index e44a6f12..00000000
--- a/docs/sphinx_doc/developer_guide.rst
+++ /dev/null
@@ -1,161 +0,0 @@
-Developer Guide
-=============
-
-1: Introduction
-----------------
-Recently, Python has became the dominant programming language in the machine learning and data sciences communities since it is easy to learn and program. However, the performance of Python is still a major concern in scientific computing and HPC community. In the scientific computing and HPC community, the most widely used programming languages are C/C++ and Fortran, Python is often used as script language for pre- and post-processing.
-
-The major performance issue in Python programming language, especially in computation-intensive applications, are loops, which are often the performance bottlenecks of an application in other programming languages too, such as C++ and Fortran. However, Python programs are often observed to be 10x to 100x slower than C, C++ and Fortran programs. In order to achieve peak hardware performance, the scientific computing communities have tried different programming models, such as OpenMP, Cilk+, and Thread Building Blocks (TBB), as well as Linux p-threads for multi/many-core processors and GPUs, Kokkos, RAJA, OpenMP offload, and OpenACC for highest performance on CPU/GPUs heterogeneous system. All of these programming models are only available for C, C++ and Fortran. Only a few work that target to high perfromance for Python programming language.
-
-The Python based NDSL programming model described in this developer's guide provides an alternative solution to reach peak hardware performance with relatively little programming effort by using the stencil semantics. A stencil is similar to parallel for kernels that are used in Kokkos and RAJA, to update array elements according to a fixed access pattern. With the stencil semantics in mind, NDSL, for example, can be used to write matrix multiplication kernels that match the performance of cuBLAS/hipBLAS that many GPU programmers can’t do in Cuda/HiP using only about 30 lines of code. It greatly reduces the programmer's effort, and NDSL has already been successfully used in the Pace global climate model, which achieves up to 4x speedup, more efficient than the original Fortran implementations.
-
-2: Programming model
-----------------------------------------------------
-The programming model of NDSL is composed of backend execution spaces, performance optimization pass and transformations, and memory spaces, memory layout. These abstraction semantics allow the formulation of generic algorithms and data structures which can then be mapped to different types of hardware architectures. Effectively, they allow for compile time transformation of algorithms to allow for adaptions of varying degrees of hardware parallelism as well as of the memory hierarchy. Figure 1 shows the high level architecture of NDSL (without orchestration option), From Fig. 1, it is shown that NDSL uses hierarchy levels intermediate representation (IR) to abstract the structure of computational program, whcih reduces the complexity of application code, and maintenance cost, while the code portability and scalability are increased. This method also avoids raising the information from lower level representations by means of static analysis, and memory leaking, where feasible, and performaing optimizations at the high possible level of abstraction. The methods primarily leverages structural information readily available in the source code, it enables to apply the optimization, such as loop fusion, tiling and vectorization without the need for complicated analysis and heuristics.
-
-.. Figure 1:
-
-.. figure:: static/ndsl_flow.png
-   :width: 860
-   :align: center
-
-   the high-level architecture of NDSL stencil life cycle for non-orchestration run.
-
-
-
-In NDSL, the python frontend code takes the user defined stencils to python AST using builtin ast module. In an AST, each node is an object defined in python AST grammar class (for more details, please refer: https://docs.python.org/3/library/ast.html). the AST node visitor (the NDSL/external/gt4py/src/gt4py/cartesian/frontend/gtscript_frontend.py) IRMaker class traverses the AST of a python function decorated by @gtscript.function and/or stencil objects, the Python AST of the program is then lowing to the Definition IR. The definition IR is high level IR, and is composed of high level program, domain-specific information, and the structure of computational operations which are independent of low level hardware platform. The definition of high level IR allows transformation of the IRs without lossing the performance of numerical libraries. However, the high level IR doesn't contains detailed information that required for performance on specific low level runtime hardware. Specificially, the definition IR only preserves the necessary information to lower operations to runtime platform hardware instructions implementing coarse-grained vector operations, or to numerical libraries — such as cuBLAS/hipBLAS and Intel MKL.
-
-
-The definition IR is then transformed to GTIR (gt4py/src/gt4py/cartesian/frontend/defir_to_gtir.py), the GTIR stencils is defined as in NDSL
-
-.. code-block:: none
-
-   class Stencil(LocNode, eve.ValidatedSymbolTableTrait):
-       name: str
-       api_signature: List[Argument]
-       params: List[Decl]
-       vertical_loops: List[VerticalLoop]
-       externals: Dict[str, Literal]
-       sources: Dict[str, str]
-       docstring: str
-
-       @property
-       def param_names(self) -> List[str]:
-           return [p.name for p in self.params]
-
-       _validate_lvalue_dims = common.validate_lvalue_dims(VerticalLoop, FieldDecl)
-
-
-
-GTIR is also a high level IR, it contains `vertical_loops` loop statement, in the climate applications, the vertical loops usually need special treatment as the numerical unstability is arison. The `vertical_loops` in GTIR as separate code block and help the following performance pass and transofrmation implementation. The program analysis pass/transformation is applied on the GTIR to remove the redunant nodes, and prunning the unused parameters, and data type and shape propogations of the symbols, and loop extensions.
-
-
-The GTIR is then further lowered to optimization IR (OIR), which is defined as
-
-
-.. code-block:: none
-
-   class Stencil(LocNode, eve.ValidatedSymbolTableTrait):
-       name: str
-       # TODO: fix to be List[Union[ScalarDecl, FieldDecl]]
-       params: List[Decl]
-       vertical_loops: List[VerticalLoop]
-       declarations: List[Temporary]
-
-       _validate_dtype_is_set = common.validate_dtype_is_set()
-       _validate_lvalue_dims = common.validate_lvalue_dims(VerticalLoop, FieldDecl)
-
-
-The OIR is particularly designed for performance optimization, the performation optimization algorithm are carried out on OIR by developing pass/transorformations. Currently, the vertical loop merging, and horizonal execution loop merging, and loop unrolling and vectorization, statement fusion and pruning optimizations are available and activated by the environmental variable in the oir_pipeline module.
-
-
-After the optimization pipeline finished, the OIR is then converted to different backend IR, for example, DACE IR (SDFG). The DACE SDFG can be further optimizated by its embeded pass/transormations algorithm, but in PACE application, we didn't activate this optimization step. It should be pointed out that, during the OIR to SDFG process, the `horizontal execution` node is serialized to SDFG library node, within which the loop expansion information is encrypted.
-
-When using GT backend, the OIR is then directly used by the `gt4py` code generator to generate the C++ gridtool stencils (computation code), and the python binding code. In this backend, each `horizontal execution` node will be passed to and generate a seperate gridtool stencil.
-
-
-NDSL also supports the whole program optimization model, this is called orchestration model in NDSL, currently it only supports DaCe backend. Whole program optimziation with DaCe is the process of turning all Python and GT4Py code in generated C++. Only _orchestrate_ the runtime code of the model is applied, e.g. everything in the `__call__` method of the module and all code in `__init__` is executed like a normal GT backend.
-
-At the highest level in Pace, to turn on orchestration you need to flip the `FV3_DACEMODE` to an orchestrated options _and_ run a `dace:*` backend (it will error out if run anything else). Option for `FV3_DACEMODE` are:
-
-- _Python_: default, turns orchestration off.
-- _Build_: build the SDFG then exit without running. See Build for limitation of build strategy.
-- _BuildAndRun_: as above, but distribute the build and run.
-- _Run_: tries to execute, errors out if the cache don't exists.
-
-Code is orchestrated two ways:
-
-- functions are orchestrated via `orchestrate_function` decorator,
-- methods are orchestrate via the `orchestrate` function (e.g. `pace.driver.Driver._critical_path_step_all`)
-
-The later is the way we orchestrate in our model. `orchestrate` is often called as the first function in the `__init__`. It patches _in place_ the methods and replace them with a wrapper that will deal with turning it all into executable SDFG when call time comes.
-
-The orchestration has two parameters: config (will expand later) and `dace_compiletime_args`.
-
-DaCe needs to be described all memory so it can interface it in the C code that will be executed. Some memory is automatically parsed (e.g. numpy, cupy, scalars) and others need description. In our case `Quantity` and others need to be flag as `dace.compiletime` which tells DaCe to not try to AOT the memory and wait for JIT time. The `dace_compiletime_args` helps with tagging those without having to change the type hint.
-
-Figure 2 shows the hierarchy levels of intermediate representations (IR) and the lowing process when orchestration option is activated.
-
-.. Figure 2:
-
-.. figure:: static/ndsl_orchestration.png
-   :width: 860
-   :align: center
-
-   the high-level architecture of NDSL stencil life cycle for orchestration run.
-
-
-
-When the orchestrated option is turned on, the call method object is patched in place, replacing the orignal Callable with a wrapper that will trigger orchestration at call time. If the model configuration doesn't demand orchestration, this won't do anything. The orchestrated call methods and the computational stencils (lazy computational stencils) which are cached in a container, will be parsed to python AST by the frontend code during the runtime, then the python AST code will be converted to DaCe SDFG. The analysis and optimization will be applied before the C++ code is generated by the codegen, this process is called Just In Time (JIT) build, compared with the non-orchestration model, which is eagerly compiled and build. The JIT build caches the build information of computational stencils, and orchestrated methods, and it is more convenient to apply the analysis and optimization pass to the overall code, such as the merging of neighbor stencils made easy. Therefore, more optimized code can be generated, and better performance can be achieved during runtime.
-
-
-3: Analysis and Optimization
-----------------------------------------------------
-One of the major features of NDSL is that users can develop a new pass/transformation for the backend with new hardware, the passes and/or transformations are the key integrates in order to have good performance on the new hardware. In different abstract level, the passes and/or transformations perform different levels of optimization. For example, the loop level of optimization is independent of hardware, and can be applied to any backend, while the optimization of device placement, and memory and caches optimizations are dependent on different backend and hardware. In this section, we only focused on the optimizations that are independent of the backend hardware.
-
-
-The general procedure of code optimization has two steps, in the first step, a filter function is called to find the pattern that need to apply the pass and/or transformation, then apply the pass and/or transoformation to the filtered pattern to insert or delte or replace the existing node with the optimizated node. In NDSL, the following passes and/transorformations are provided.
-
-
-   3.1: Prune Unused Parameters
-   -----------------------------------------
-
-   .. code-block:: python
-
-      def prune_unused_parameters(node: gtir.Stencil) -> gtir.Stencil:
-          assert isinstance(node, gtir.Stencil)
-          used_variables = (
-              node.walk_values()
-              .if_isinstance(gtir.FieldAccess, gtir.ScalarAccess)
-              .getattr("name")
-              .to_list()
-          )
-          used_params = list(filter(lambda param: param.name in used_variables, node.params))
-          return node.copy(update={"params": used_params})
-
-
-   3.2: Dead Node Removal
-   --------------------------
-
-   3.3: Propagate Shapes and Types
-   ------------------------------------
-
-
-   3.4: Function Inlining
-   ------------------------------------
-
-   3.5: Vertical Loop Merging
-   ------------------------------------
-
-   3.6: Horizontal Execution Loop Merging
-   ----------------------------------------------
-
-   3.7: Cache Optimization
-   ------------------------------------
-
-   3.8: Pruning
-   ------------------------------------
-
-
-4: Code Generators
-----------------------------------------------------
diff --git a/docs/sphinx_doc/docker.rst b/docs/sphinx_doc/docker.rst
deleted file mode 100644
index a355653e..00000000
--- a/docs/sphinx_doc/docker.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-.. highlight:: shell
-
-======
-Docker
-======
-
-While it is possible to install and build pace bare-metal, we can ensure all system libraries are installed with the correct versions by using a Docker container to test and develop pace.
-This requires that Docker is installed (we recommend `Docker Desktop`_ for most users).
-You may need to increase memory allocated to Docker in its settings.
-
-Before building the Docker image, you will need to update the git submodules so that any dependencies are cloned and at the correct version:
-
-.. code-block:: console
-
-    $ git submodule update --init --recursive
-
-Then build the `pace` docker image at the top level:
-
-.. code-block:: console
-
-    $ make build
-
-.. _`Docker Desktop`: https://www.docker.com/
diff --git a/docs/sphinx_doc/fortran_porting.rst b/docs/sphinx_doc/fortran_porting.rst
deleted file mode 100644
index f4335131..00000000
--- a/docs/sphinx_doc/fortran_porting.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-Fortran Interoperability
-========================
-
-Alongside NDSL, there are Fortran-based methods that handle aspects such as domain generation and data communication. They are currently leveraged by the physics and dynamics packages from which GEOS, pace, pySHiELD, and pyFV3 are ported.
-
-Packages are currently in development to introduce interfaces which will enable the use of these methods within a Python environment.
-
-One of the ways this is possible is through the use of the ISO_C_BINDING module in Fortran, which enables Fortran-C interoperability, and the ctypes package in Python.
-
-Fortran-C interoperable objects are compiled into a shared object library, and then access to these objects is possible after loading the library into a Python module via ctypes.
-
-The ctypes package contains methods for converting Python objects into C-like objects for use by the Fortran-C source methods.
-
-The `pyFMS <https://github.com/fmalatino/pyFMS>`_ package is under development and will contain methods from the `Flexible Modeling System (FMS) <https://github.com/NOAA-GFDL/FMS>`_, which are made accessible by `cFMS <https://github.com/mlee03/cFMS>`_, a C interface to FMS, using the methods described above.
-
-The methods included in pyFMS have been selected based on the needs of pace, pySHiELD, and pyFV3, but pyFMS is designed to be independent of these packages.
diff --git a/docs/sphinx_doc/gt4py.rst b/docs/sphinx_doc/gt4py.rst
deleted file mode 100644
index 2d4adc09..00000000
--- a/docs/sphinx_doc/gt4py.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-`GT4Py <https://gridtools.github.io/gt4py/latest/index.html>`_
-================================================================
-
-The pySHiELD package includes the Python implementation of GFS physics built using the GT4Py domain-specific language.
-Currently, only GFDL cloud microphysics is integrated into Pace.
-Additional physics schemes (NOAA land surface, GFS sea ice, scale-aware mass-flux shallow convection, hybrid eddy-diffusivity mass-flux PBL and free atmospheric turbulence, and rapid radiative transfer model) have been ported independently and are available in the `physics-standalone`_ repository.
-Additional work is required to integrate these schemes.
diff --git a/docs/sphinx_doc/index.rst b/docs/sphinx_doc/index.rst
deleted file mode 100644
index 8cc27378..00000000
--- a/docs/sphinx_doc/index.rst
+++ /dev/null
@@ -1,27 +0,0 @@
-NDSL Documentation
-==================
-
-NDSL is a domain-specific language for scientific computing in Python. It supports most of the Python language semantics, with performance comparable to native C and C++.
-
-
-NDSL has been used as the backend of the Pace model (https://github.com/NOAA-GFDL/pace), which can run on a laptop using a Python-based backend, and on thousands of heterogeneous compute nodes of a large supercomputer using C/C++ and CUDA/HIP backends.
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Contents:
-
-   overview
-   users_guide
-   developer_guide
-   test
-   dace
-   gt4py
-   fortran_porting
-   docker
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
diff --git a/docs/sphinx_doc/make.bat b/docs/sphinx_doc/make.bat
deleted file mode 100644
index 9a939bdb..00000000
--- a/docs/sphinx_doc/make.bat
+++ /dev/null
@@ -1,281 +0,0 @@
-@ECHO OFF
-
-REM Command file for Sphinx documentation
-
-if "%SPHINXBUILD%" == "" (
-	set SPHINXBUILD=sphinx-build
-)
-set BUILDDIR=_build
-set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
-set I18NSPHINXOPTS=%SPHINXOPTS% .
-if NOT "%PAPER%" == "" (
-	set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
-	set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
-)
-
-if "%1" == "" goto help
-
-if "%1" == "help" (
-	:help
-	echo.Please use `make ^<target^>` where ^<target^> is one of
-	echo.  html       to make standalone HTML files
-	echo.  dirhtml    to make HTML files named index.html in directories
-	echo.  singlehtml to make a single large HTML file
-	echo.  pickle     to make pickle files
-	echo.  json       to make JSON files
-	echo.  htmlhelp   to make HTML files and a HTML help project
-	echo.  qthelp     to make HTML files and a qthelp project
-	echo.  devhelp    to make HTML files and a Devhelp project
-	echo.  epub       to make an epub
-	echo.  epub3      to make an epub3
-	echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter
-	echo.  text       to make text files
-	echo.  man        to make manual pages
-	echo.  texinfo    to make Texinfo files
-	echo.  gettext    to make PO message catalogs
-	echo.  changes    to make an overview over all changed/added/deprecated items
-	echo.  xml        to make Docutils-native XML files
-	echo.  pseudoxml  to make pseudoxml-XML files for display purposes
-	echo.  linkcheck  to check all external links for integrity
-	echo.  doctest    to run all doctests embedded in the documentation if enabled
-	echo.  coverage   to run coverage check of the documentation if enabled
-	echo.  dummy      to check syntax errors of document sources
-	goto end
-)
-
-if "%1" == "clean" (
-	for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
-	del /q /s %BUILDDIR%\*
-	goto end
-)
-
-
-REM Check if sphinx-build is available and fallback to Python version if any
-%SPHINXBUILD% 1>NUL 2>NUL
-if errorlevel 9009 goto sphinx_python
-goto sphinx_ok
-
-:sphinx_python
-
-set SPHINXBUILD=python -m sphinx.__init__
-%SPHINXBUILD% 2> nul
-if errorlevel 9009 (
-	echo.
-	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
-	echo.installed, then set the SPHINXBUILD environment variable to point
-	echo.to the full path of the 'sphinx-build' executable. Alternatively you
-	echo.may add the Sphinx directory to PATH.
-	echo.
-	echo.If you don't have Sphinx installed, grab it from
-	echo.http://sphinx-doc.org/
-	exit /b 1
-)
-
-:sphinx_ok
-
-
-if "%1" == "html" (
-	%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The HTML pages are in %BUILDDIR%/html.
-	goto end
-)
-
-if "%1" == "dirhtml" (
-	%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
-	goto end
-)
-
-if "%1" == "singlehtml" (
-	%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
-	goto end
-)
-
-if "%1" == "pickle" (
-	%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished; now you can process the pickle files.
-	goto end
-)
-
-if "%1" == "json" (
-	%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished; now you can process the JSON files.
-	goto end
-)
-
-if "%1" == "htmlhelp" (
-	%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished; now you can run HTML Help Workshop with the ^
-.hhp project file in %BUILDDIR%/htmlhelp.
-	goto end
-)
-
-if "%1" == "qthelp" (
-	%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished; now you can run "qcollectiongenerator" with the ^
-.qhcp project file in %BUILDDIR%/qthelp, like this:
-	echo.^> qcollectiongenerator %BUILDDIR%\qthelp\ERF.qhcp
-	echo.To view the help file:
-	echo.^> assistant -collectionFile %BUILDDIR%\qthelp\ERF.ghc
-	goto end
-)
-
-if "%1" == "devhelp" (
-	%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished.
-	goto end
-)
-
-if "%1" == "epub" (
-	%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The epub file is in %BUILDDIR%/epub.
-	goto end
-)
-
-if "%1" == "epub3" (
-	%SPHINXBUILD% -b epub3 %ALLSPHINXOPTS% %BUILDDIR%/epub3
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The epub3 file is in %BUILDDIR%/epub3.
-	goto end
-)
-
-if "%1" == "latex" (
-	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
-	goto end
-)
-
-if "%1" == "latexpdf" (
-	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
-	cd %BUILDDIR%/latex
-	make all-pdf
-	cd %~dp0
-	echo.
-	echo.Build finished; the PDF files are in %BUILDDIR%/latex.
-	goto end
-)
-
-if "%1" == "latexpdfja" (
-	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
-	cd %BUILDDIR%/latex
-	make all-pdf-ja
-	cd %~dp0
-	echo.
-	echo.Build finished; the PDF files are in %BUILDDIR%/latex.
-	goto end
-)
-
-if "%1" == "text" (
-	%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The text files are in %BUILDDIR%/text.
-	goto end
-)
-
-if "%1" == "man" (
-	%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The manual pages are in %BUILDDIR%/man.
-	goto end
-)
-
-if "%1" == "texinfo" (
-	%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
-	goto end
-)
-
-if "%1" == "gettext" (
-	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
-	goto end
-)
-
-if "%1" == "changes" (
-	%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.The overview file is in %BUILDDIR%/changes.
-	goto end
-)
-
-if "%1" == "linkcheck" (
-	%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Link check complete; look for any errors in the above output ^
-or in %BUILDDIR%/linkcheck/output.txt.
-	goto end
-)
-
-if "%1" == "doctest" (
-	%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Testing of doctests in the sources finished, look at the ^
-results in %BUILDDIR%/doctest/output.txt.
-	goto end
-)
-
-if "%1" == "coverage" (
-	%SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Testing of coverage in the sources finished, look at the ^
-results in %BUILDDIR%/coverage/python.txt.
-	goto end
-)
-
-if "%1" == "xml" (
-	%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The XML files are in %BUILDDIR%/xml.
-	goto end
-)
-
-if "%1" == "pseudoxml" (
-	%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
-	goto end
-)
-
-if "%1" == "dummy" (
-	%SPHINXBUILD% -b dummy %ALLSPHINXOPTS% %BUILDDIR%/dummy
-	if errorlevel 1 exit /b 1
-	echo.
-	echo.Build finished. Dummy builder generates no files.
-	goto end
-)
-
-:end
diff --git a/docs/sphinx_doc/test.rst b/docs/sphinx_doc/test.rst
deleted file mode 100644
index 0900ebf6..00000000
--- a/docs/sphinx_doc/test.rst
+++ /dev/null
@@ -1,65 +0,0 @@
-=======
-Testing
-=======
-
-Savepoint tests run automatically on every commit to the main branch.
-Savepoint data are generated from `fv3gfs-fortran`_ and can also be downloaded:
-
-.. code-block:: console
-
-    $ make get_test_data
-    $ # if you do not have access to the Google Cloud Storage bucket, use FTP:
-    $ make USE_FTP=yes get_test_data
-
-Savepoint data are used in the "translate" tests and in checkpointer tests.
-Developers should be aware that the "translate" tests are an older, initial design of the test infrastructure which has grown organically and may be difficult to understand or modify, but currently covers smaller parts of the code not tested independently by the checkpointer tests.
-In the long run we suggest increasing the number of checkpoints and adding new checkpointer tests, eventually removing the translate tests, which are considered deprecated.
-
-#. Individual translate tests
-
-    These test at the module level such as `c_sw` and `d_sw`, and the translate logic is shared among dynamical core and physics.
-    Larger tests also exist such as `translate_fvdynamics` which tests a full acoustic time step.
-    Manual thresholds are set for each savepoint test. Currently, the maximum threshold is applied to all variables within the test.
-    Additionally, a near-zero value can be specified for a variable to ignore values that are very close to zero.
-
-#. Checkpointer tests
-
-    These test the full model run where checkpoints are inserted throughout the model.
-    See ``tests/savepoint/test_checkpoints.py`` for an example.
-    Checkpointers are given model state along with a label, and may implement any behavior they wish.
-    For example, checkpointers have been written to:
-
-    #. compare the model state to a reference state (:py:class:`pace.util.ValidationCheckpointer`)
-    #. calibrate the threshold for each variable given a perturbed state (:py:class:`pace.util.ThresholdCalibrationCheckpointer`)
-
-    Additional checkpoint behaviors could be implemented, for example to save reference test data directly from Python.
-    Thresholds are set automatically using a :py:class:`pace.util.ThresholdCalibrationCheckpointer` for each variable based on a round-off error perturbed initial state.
-    We run the model multiple times with a perturbed initial state and record the largest differences at each checkpoint for each variable.
-    The threshold is then set to the largest difference multiplied by a scaling factor.
-    Currently, only checkpoint tests within the dynamical core are tested.
-    There are two outstanding PRs to include driver and physics checkpoint tests.
-
------------
-Limitations
------------
-While individual translate tests can be run on all backends, checkpointer tests do not work for the orchestrated DaCe backend.
-This is a limitation due to DaCe not accepting keyword arguments or a list of :py:class:`pace.util.Quantity`, causing the checkpointer calls to be overly complicated.
-A possible workaround is to follow the HaloUpdater example: wrap the variables at init time and call them during DaCe callbacks.
-A better solution would be to have DaCe accept a list of :py:class:`pace.util.Quantity`.
-
---------
-Examples
---------
-Translate tests for the dynamical core can be run as follows:
-
-.. code-block:: console
-
-    $ make savepoint_tests
-
-We suggest reading the Makefile for a full list of translate test targets. Checkpointer tests can be run as follows:
-
-.. code-block:: console
-
-    $ make test_savepoint
-
-.. _`fv3gfs-fortran`: https://github.com/ai2cm/fv3gfs-fortran/tree/master/tests/serialized_test_data_generation
diff --git a/docs/sphinx_doc/users_guide.rst b/docs/sphinx_doc/users_guide.rst
deleted file mode 100644
index b3070843..00000000
--- a/docs/sphinx_doc/users_guide.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-User Guide
-=============
-
-This page will include general introductory information about NDSL, including external links to docs.
diff --git a/docs/user/index.md b/docs/user/index.md
new file mode 100644
index 00000000..292d3953
--- /dev/null
+++ b/docs/user/index.md
@@ -0,0 +1,3 @@
+# Usage documentation
+
+This part of the documentation is geared towards users of NDSL.
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 00000000..09916f21
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,53 @@
+site_name: NDSL Documentation
+
+theme:
+  name: material
+  features:
+    - search.suggest
+    - search.highlight
+    - search.share
+
+nav:
+  - Home: index.md
+  - User documentation: user/index.md
+  - Porting:
+    - General Concepts: porting/index.md
+    - Testing Infrastructure: porting/translate/index.md
+  - Under the hood:
+    - Technical Documentation: dev/index.md
+    - DaCe: dev/dace.md
+    - GT4Py: dev/gt4py.md
+
+
+markdown_extensions:
+  # simple glossary file
+  - abbr
+  # support for colored notes / warnings / tips / examples
+  - admonition
+  # support for footnotes
+  - footnotes
+  # support for syntax highlighting
+  - pymdownx.highlight:
+      anchor_linenums: true
+      line_spans: __span
+      pygments_lang_class: true
+  - pymdownx.inlinehilite
+  - pymdownx.snippets:
+      auto_append:
+        # hover tooltips for abbreviations (simple glossary)
+        - docs/includes/glossary.md
+  - pymdownx.superfences:
+      custom_fences:
+      # support for mermaid graphs
+      - name: mermaid
+        class: mermaid
+        format: python/name:pymdownx.superfences.fence_code_format
+  # image inclusion
+
+plugins:
+  # add search box to the header, configuration in theme
+  - search
+
+watch:
+  # reload when the glossary file is updated
+  - docs/includes
diff --git a/setup.py b/setup.py
index 77993f13..850cff2d 100644
--- a/setup.py
+++ b/setup.py
@@ -11,14 +11,17 @@ def local_pkg(name: str, relative_path: str) -> str:
     return path
 
 
-test_requirements = ["pytest", "pytest-subtests", "coverage"]
-develop_requirements = test_requirements + ["pre-commit"]
+docs_requirements = ["mkdocs-material"]
 demos_requirements = ["ipython", "ipykernel"]
+test_requirements = ["pytest", "pytest-subtests", "coverage"]
+
+develop_requirements = test_requirements + docs_requirements + ["pre-commit"]
 
 extras_requires = {
-    "test": test_requirements,
-    "develop": develop_requirements,
     "demos": demos_requirements,
+    "develop": develop_requirements,
+    "docs": docs_requirements,
+    "test": test_requirements,
 }
 
 requirements: List[str] = [