Development setup and early work for a Schedule Tree Optimizer Pipeline #189
This branch is what we use for the NASA team as we start to prepare for Milestone 2. Currently uses the following versions of externals:
- GT4Py: follows "milestone2" branch on Roman's fork
- DaCe: whatever GT4Py's uv.lock file says about dace-cartesian
This commit updates GT4Py to add support for the experimental features "absolute k indexing" and "expose current k-level" in the debug backend.
* Roundtrip sdfg -> stree -> sdfg in orchestration with moaar validation
* Remove debug prints and intermediate sdfg saving
* Use default when calling simplify
* Update gt4py/dace submodules (roundtrip work)
  This commit brings the changes needed for stree roundtrips to validate with the AI2 data (in the PyFV3 translate tests).
* Update README
* Quick note: skip ScalarToSymbolPromotion for now
  The pass messes up previously valid & validating SDFGs. We can live (performance-wise) without it for the current milestone. Let's re-evaluate once we get back to DaCe mainline (v2).
* Update gt4py & dace submodules (stree/roundtrip)
… MPI in-place all_reduce calls. Note that device_synchronize is CuPy/CUDA-specific at the moment.
Add support for fields with data_dims (or data_dims-only fields) in the stencil wrapper's function to compute field origins.
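As a rough illustration of what such an origin computation might look like, here is a minimal sketch. The names `field_origin`, `cartesian_origin`, `axes`, and `data_dims` are hypothetical and are not taken from the stencil wrapper. The sketch assumes that the origin only offsets the Cartesian axes a field actually has, and that every data dimension starts at index 0.

```python
from typing import Sequence, Tuple

CARTESIAN_AXES = ("I", "J", "K")


def field_origin(
    cartesian_origin: Tuple[int, int, int],
    axes: Sequence[str],
    data_dims: Sequence[int] = (),
) -> Tuple[int, ...]:
    """Compute the origin of a single field (hypothetical helper).

    cartesian_origin: origin of the compute domain along (I, J, K).
    axes: Cartesian axes the field is defined on, e.g. ("I", "J").
    data_dims: sizes of the trailing data dimensions, if any.
    """
    unknown = set(axes) - set(CARTESIAN_AXES)
    if unknown:
        raise ValueError(f"Unknown axes: {sorted(unknown)}")

    # Keep only the origin entries for axes the field actually has,
    # in canonical (I, J, K) order.
    spatial = tuple(
        cartesian_origin[index]
        for index, axis in enumerate(CARTESIAN_AXES)
        if axis in axes
    )
    # Assumption: data dimensions are never offset, so they start at 0.
    return spatial + (0,) * len(data_dims)
```

Under these assumptions, a field with only data dimensions gets an all-zero origin, for example `field_origin((3, 3, 0), (), (5, 2))` gives `(0, 0)`.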
Add a non-trivial test case for orchestrating tables. This is a mitigation for a gt4py-orchestration-issue that is easiest reproduced from NDSL (compared to adding a test in gt4py directly).
Cherry-picking (parts of) PR #211 into the milestone2 branch.
This gt4py update includes:
- tests for the memlet dimension fix
- another fix to ensure that we always define all three cartesian symbols (even if we are only passing 2d fields and scalars into the stencil).
FlorianDeconinck
left a comment
I think we are ready for show
romanc
left a comment
This looks great! Really just nitpicks inline. Feel free to discard them if you don't feel like addressing them.
Regarding the (in-code) question of how to test all of this: I think we'll need to build tooling for this. Either by exposing more things, e.g.
code = MyOrchestratedCode(stencil_factory)
code._debug.sdfg # access to the (last) sdfg
code._debug.strees # list of strees in the sdfg
or by tooling the test system to abstract away the ugliness of "running orchestration by hand". Of course both approaches bring a bunch of questions.
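To make the first option (exposing debug state) a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: `DebugInfo`, `record`, and `OrchestratedCode` are illustrative names, and plain placeholder values stand in for real SDFGs and schedule trees. It only shows the shape of a `_debug` attribute that tests could inspect after a call.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class DebugInfo:
    """Artifacts of the most recent orchestrated call (hypothetical)."""

    sdfg: Optional[Any] = None
    strees: List[Any] = field(default_factory=list)

    def record(self, sdfg: Any, strees: List[Any]) -> None:
        # Overwrite with the artifacts of the latest call only.
        self.sdfg = sdfg
        self.strees = list(strees)


class OrchestratedCode:
    """Stand-in for user code that runs through orchestration."""

    def __init__(self) -> None:
        self._debug = DebugInfo()

    def __call__(self, sdfg: Any, strees: List[Any]) -> None:
        # A real implementation would build the SDFG and its schedule
        # trees itself; here they are passed in to keep the sketch small.
        self._debug.record(sdfg, strees)


code = OrchestratedCode()
code(sdfg="sdfg-placeholder", strees=["stree-placeholder"])
```

A test could then assert on `code._debug.sdfg` and `code._debug.strees` without running orchestration by hand. Whether such state should be part of a public interface is one of the open questions raised above.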
If we push the question for a couple of months, we might be able to leverage a potential test system that DaCe would expose for its users. You know, like we expose the translate test system from NDSL (just not so ugly). We might need to write such a "DaCe test system for stree transformations" ourselves, but it might be easier to test at that level.
…ree.treenodes` Better docs
Ante Scriptum: this branch was the workhorse for NASA's M2 work. All features were individually merged, except this one.
Description
This branch brings the setup to run the Schedule Tree pipeline (SDFG -> STREE -> SDFG) and lays the foundation for a pipeline.
A first, partially working, `AxisMerge` optimization pass is available.
All of this code is for backend developers only and is deactivated by default. #HereBeDragons
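For readers unfamiliar with the idea, the following toy sketch shows the general shape of an opt-in pass pipeline over a tree of loops. It does not use DaCe and is not the code in this branch: `Loop`, `merge_adjacent_loops`, and `run_pipeline` are hypothetical names, and the merge rule (fuse consecutive loops over the same axis, with no dependency analysis) is only an assumption about what an axis merge might do.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Loop:
    """Toy schedule tree node: a loop over one axis with a list of statements."""

    axis: str
    body: List[str] = field(default_factory=list)


Pass = Callable[[List[Loop]], List[Loop]]


def merge_adjacent_loops(tree: List[Loop]) -> List[Loop]:
    """Fuse consecutive loops over the same axis (toy rule, no legality checks)."""
    merged: List[Loop] = []
    for loop in tree:
        if merged and merged[-1].axis == loop.axis:
            merged[-1].body.extend(loop.body)
        else:
            # Copy the node so the input tree is left untouched.
            merged.append(Loop(loop.axis, list(loop.body)))
    return merged


def run_pipeline(
    tree: List[Loop], passes: List[Pass], enabled: bool = False
) -> List[Loop]:
    """Apply the passes in order; disabled by default, so opting in is explicit."""
    if not enabled:
        return tree
    for optimization in passes:
        tree = optimization(tree)
    return tree


tree = [Loop("K", ["a = 1"]), Loop("K", ["b = a"]), Loop("J", ["c = b"])]
optimized = run_pipeline(tree, [merge_adjacent_loops], enabled=True)
```

With `enabled=True` the two `K` loops are fused into one, while the default call returns the input tree unchanged, mirroring the "deactivated by default" behavior described above.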
How Has This Been Tested?
Basic tests have been implemented covering the easiest
Checklist: