2020.08.12 Meeting Notes

Agenda

Individual/group updates
Small MeshBlock performance/Results from NVIDIA meetings/MeshBlockPack
CMake machine specific (auto) configuration (Philipp)
Quick run-through of outstanding non-WIP pull request reviews

Group Updates

LANL CS

Andrew has been working on integrating Parthenon into our integrated physics code.

Joshua B. has CI ready to go; it just needs somebody to sign off. He also made Python detection fixes and worked with Ben Ryan on getting the install working.

LANL Physics

Jonah has been looking at elliptic solves and spending some time thinking about kernel parallelization over blocks. He took a look at the multigrid solver in Athena, but is not yet sure how to integrate it with Parthenon.

Joshua D. is working on coupling in stiff reaction terms.

AthenaPK

Found quite a few bugs while setting up Summit. Josh addressed a lot of the bugs he ran into.

Moved the parallelization strategy selection into the MeshBlock. Running into bugs with Kokkos in the streams and threads PR.

Forrest demonstrated buffer packing for uniform grids in a single kernel launch. NVIDIA thinks it doesn't matter how fast the kernel is as long as the kernel is packed.

Discussion

Small MeshBlock performance/Results from NVIDIA meetings/MeshBlockPack

The big takeaway is that streams are not sufficient to get the GPU utilization we need. CUDA kernel launches are heavily serialized and blocking, so using more threads with multiple streams does not work around the issue.

"GAMER" Approach

GAMER takes the "MeshBlock pack" approach with 8^3 grids. Everything lives on the host, and only specific, expensive operations are offloaded to the GPU. An 8^3 grid can fit in the GPU cache.

Block Packing

Jonah implemented Block Packing: https://github.com/lanl/parthenon/pull/263

Essentially the same as variable packing, but across multiple (currently all) blocks, too. General agreement that this loop pattern is palatable and we can get behind it.
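To make the loop pattern concrete, here is a minimal host-only sketch in plain C++. It is not the code from the pull request: `Block`, `BlockPack`, `MakePack`, and `ForAll` are hypothetical names invented for illustration, and the sketch assumes every block has the same number of variables and the same grid size. The point is only that the block index becomes one more loop dimension, so all blocks can be covered by a single loop nest (which on a GPU would correspond to a single kernel launch).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one MeshBlock: nvar variables on an n^3 grid.
struct Block {
  int nvar;
  int n;
  std::vector<double> data;  // flat storage, layout [var][k][j][i]

  Block(int nvar_, int n_)
      : nvar(nvar_), n(n_),
        data(static_cast<std::size_t>(nvar_) * n_ * n_ * n_, 0.0) {}

  double &operator()(int v, int k, int j, int i) {
    return data[((static_cast<std::size_t>(v) * n + k) * n + j) * n + i];
  }
};

// Hypothetical pack: non-owning pointers to the blocks it spans.
// Indexing adds a leading block index b to the variable-pack indices.
struct BlockPack {
  std::vector<Block *> blocks;

  int nblocks() const { return static_cast<int>(blocks.size()); }

  double &operator()(int b, int v, int k, int j, int i) const {
    return (*blocks[b])(v, k, j, i);
  }
};

// Build a pack over all blocks (the notes say "currently all").
BlockPack MakePack(std::vector<Block> &mesh) {
  BlockPack pack;
  for (auto &blk : mesh) pack.blocks.push_back(&blk);
  return pack;
}

// One loop nest over (block, var, k, j, i).
// Assumes all blocks share the shape of the first block.
template <typename F>
void ForAll(const BlockPack &pack, F f) {
  if (pack.blocks.empty()) return;
  const int nvar = pack.blocks[0]->nvar;
  const int n = pack.blocks[0]->n;
  for (int b = 0; b < pack.nblocks(); ++b)
    for (int v = 0; v < nvar; ++v)
      for (int k = 0; k < n; ++k)
        for (int j = 0; j < n; ++j)
          for (int i = 0; i < n; ++i) f(b, v, k, j, i);
}
```

A caller would build the pack once with `MakePack(mesh)` and then pass a lambda taking `(b, v, k, j, i)` to `ForAll`, reading and writing through `pack(b, v, k, j, i)`.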

Joshua D. has been thinking about what this looks like for task lists.

There was some discussion of the performance implications of constructing these packs, whether we need to cache them, and how often that cache would be invalidated.
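One possible shape for the caching question, sketched in plain C++ under stated assumptions: the mesh carries a version counter that is bumped whenever its block list changes, and a cached pack is rebuilt only when that counter differs from the one it was built against. `Mesh`, `Pack`, `PackCache`, and the `version` field are hypothetical names for illustration; nothing here reflects a decision made in the meeting or an existing Parthenon API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mesh: a list of block ids plus a version counter.
struct Mesh {
  std::vector<int> block_ids;
  std::uint64_t version = 0;  // bumped whenever the block list changes

  void AddBlock(int id) {
    block_ids.push_back(id);
    ++version;
  }
};

// Hypothetical pack: here just the ids of the blocks it spans.
struct Pack {
  std::vector<int> ids;
};

// Rebuilds the pack only when the mesh version has changed.
class PackCache {
 public:
  const Pack &Get(const Mesh &mesh) {
    if (!valid_ || cached_version_ != mesh.version) {
      pack_.ids = mesh.block_ids;  // the (potentially expensive) construction
      cached_version_ = mesh.version;
      valid_ = true;
      ++rebuilds_;
    }
    return pack_;
  }

  // Number of times the pack was actually constructed.
  int rebuilds() const { return rebuilds_; }

 private:
  Pack pack_;
  std::uint64_t cached_version_ = 0;
  bool valid_ = false;
  int rebuilds_ = 0;
};
```

With this scheme the invalidation frequency equals the frequency of mesh changes (for example, remeshing), so the `rebuilds()` counter gives a direct way to measure how often the cache is actually invalidated in a given run.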
