2020.08.12 Meeting Notes
- Individual/group updates
- Small MeshBlock performance/Results from NVidia meetings/MeshBlockPack
- CMake machine specific (auto) configuration (Philipp)
- Quick run-through of outstanding non-WIP pull request reviews
Andrew has been working on integrating Parthenon into our integrated physics code.
Joshua B. has CI ready to go; it just needs somebody to sign off. He also made Python detection fixes and helped with getting the install working with Ben Ryan.
Jonah has been looking at elliptic solves and spending some time thinking about kernel parallelization over blocks. He took a look at the multigrid solver in Athena, but is not sure how to integrate it with Parthenon.
Joshua D. is working on coupling in stiff reaction terms.
Found quite a few bugs while setting up Summit. Josh addressed a lot of the bugs he ran into.
Moved the parallelization strategy selection into the MeshBlock. Running into bugs with Kokkos in the streams and threads PR.
Forrest demonstrated buffer packing for uniform grids in a single kernel launch. NVidia thinks it doesn't matter how fast the kernel is as long as the kernel is packed.
The big takeaway is that streams are not sufficient to get the GPU utilization we need. CUDA kernel launching is a very serialized, blocking operation, and using more threads with multiple streams does not work around this issue.
Takes the "MeshBlock pack" approach with 8^3 grids. They're taking an approach where everything lives on the host, and things are offloaded to the GPU for specific, expensive operations. 8^3 can fit in the GPU cache.
Jonah implemented Block Packing: https://github.com/lanl/parthenon/pull/263
Essentially the same as variable packing, but across multiple (currently all) blocks, too. General agreement that this loop pattern is palatable and we can get behind it.
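To make the loop pattern concrete, here is a minimal, host-only C++ sketch of the idea. It is not the code from the pull request: `Block`, `BlockPack`, `MakePack`, and `Scale` are hypothetical names, and the data layout is an assumption chosen for brevity. The pack holds non-owning pointers to every block's storage so that a single loop nest can cover all blocks and variables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one MeshBlock's data: nvar variables on an
// n^3 grid, stored contiguously. This is NOT Parthenon's actual layout.
struct Block {
  int nvar;
  int n;
  std::vector<double> data;  // size nvar * n * n * n
  Block(int nvar_, int n_)
      : nvar(nvar_), n(n_),
        data(static_cast<std::size_t>(nvar_) * n_ * n_ * n_, 0.0) {}
  double &operator()(int v, int k, int j, int i) {
    return data[((static_cast<std::size_t>(v) * n + k) * n + j) * n + i];
  }
};

// Hypothetical "block pack": non-owning pointers into each block's storage,
// indexed by (block, variable, k, j, i). Assumes all blocks share nvar and n.
struct BlockPack {
  std::vector<double *> ptrs;
  int nvar;
  int n;
  int nblocks() const { return static_cast<int>(ptrs.size()); }
  double &operator()(int b, int v, int k, int j, int i) const {
    return ptrs[b][((static_cast<std::size_t>(v) * n + k) * n + j) * n + i];
  }
};

// Collect pointers to all blocks (the notes say the pack currently spans all).
BlockPack MakePack(std::vector<Block> &blocks) {
  BlockPack pack{{}, blocks.empty() ? 0 : blocks[0].nvar,
                 blocks.empty() ? 0 : blocks[0].n};
  for (auto &b : blocks) pack.ptrs.push_back(b.data.data());
  return pack;
}

// One loop nest over blocks, variables, and cells. On a GPU, the intent would
// be to map this whole index space to one kernel launch instead of one
// launch per block.
void Scale(const BlockPack &pack, double factor) {
  for (int b = 0; b < pack.nblocks(); ++b)
    for (int v = 0; v < pack.nvar; ++v)
      for (int k = 0; k < pack.n; ++k)
        for (int j = 0; j < pack.n; ++j)
          for (int i = 0; i < pack.n; ++i)
            pack(b, v, k, j, i) *= factor;
}
```

Calling `Scale(MakePack(blocks), 2.0)` on a `std::vector<Block>` updates every block through one loop nest; the per-block loop that would otherwise sit outside the kernel becomes the outermost index of the pack.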
Joshua D. has been thinking about what this looks like for task lists.
Some discussion on the performance implication of constructing these packs, whether we need to cache them, how often that cache would be invalidated.
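One way the caching question could be framed is sketched below in plain C++. This is purely illustrative and does not reflect a decision made in the meeting: `PackCache`, its string key, and the generation counter are all assumptions. The idea is that a pack is rebuilt only when the set of blocks has changed since it was cached, for example after remeshing.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cached pack: pointers into block storage plus the mesh
// "generation" at which the pointers were collected.
struct CachedPack {
  std::vector<double *> ptrs;
  unsigned long generation;
};

// Hypothetical cache keyed by a label for the packed variable set.
class PackCache {
 public:
  // Call whenever the block list changes (assumed here: remeshing or load
  // balancing). All cached packs become stale at once.
  void Invalidate() { ++generation_; }

  // Return the cached pack for `key`, rebuilding it only if it is missing
  // or was built for an older generation.
  const std::vector<double *> &Get(const std::string &key,
                                   std::vector<std::vector<double>> &blocks) {
    auto it = cache_.find(key);
    if (it == cache_.end() || it->second.generation != generation_) {
      CachedPack fresh;
      fresh.generation = generation_;
      for (auto &b : blocks) fresh.ptrs.push_back(b.data());
      cache_[key] = std::move(fresh);
      ++rebuilds_;
      it = cache_.find(key);
    }
    return it->second.ptrs;
  }

  // Number of times a pack was actually constructed.
  int rebuilds() const { return rebuilds_; }

 private:
  std::map<std::string, CachedPack> cache_;
  unsigned long generation_ = 0;
  int rebuilds_ = 0;
};
```

Under this scheme the construction cost is paid once per key per mesh change, so how often the cache is invalidated depends directly on how often the block list changes.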