2020.09.09 Meeting Notes

Andrew Gaspar edited this page Sep 9, 2020 · 3 revisions

Agenda

  • Individual/group updates
  • Scaling Study (Galen)
  • MeshPack <-> MeshBlockPack naming
  • New Release (Andrew)
  • Status on restarts?

Individual/group Updates

LANL CS

Working on integration of Parthenon with LANL's Eulerian code. Data structures are mapped across and tasking is now working. The 0D estimate PI example is about ready to run.

Joshua is on leave, but has some outstanding PRs on CI and header separation.

Sriram - restart is working largely in parallel, but there is divergence around derefinement. Still diagnosing. Needs to push changes.

LANL Physics

Packing with multiple mesh block sizes is working. Needs some feedback on the API. Noticed some divergence in how MeshBlock lists are passed around. Phil agrees that Jonah's approach is the best approach. Concern - streams are attached to blocks, but should really be attached to task collections/block collections.

Will arrange separate call on how to assign streams to block collections.

PKAthena

Got machine files PR merged.

Integrated task collection into advection example.

Got some weak scaling results - dominated by the buffer packing, as expected. Good news - scalability down to small mesh block sizes (e.g. decomposing the same static mesh into smaller blocks) is good with mesh block collections.

Learned about host pinned memory - allows you to avoid GPU buffer allocation. May make sense to use host pinned memory for communication buffers since they don't need to live on the GPU long term.

Direct buffer-to-buffer copies may be possible now that we're collecting blocks together, but it needs investigation to see whether it helps. For example, you're not saved from communicating buffers, which is often the most time-consuming part.

Scaling Study

Galen has done some strong scaling studies with 32^3 blocks on up to 4 GPUs, and is submitting some jobs for much larger runs (128 GPUs and more).

Can we start testing with the MeshBlockPack infrastructure?

The interesting thing to test is to work with a fixed-size mesh and see how varying the block size affects performance.
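To make the trade-off concrete, here is a minimal sketch of how block size changes the block count and the per-block ghost-cell overhead for a fixed cubic mesh. `BlockDecomposition` and `Decompose` are hypothetical helpers and not part of Parthenon's API; the cubic mesh, cubic blocks, and uniform ghost width are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not part of Parthenon's API.
struct BlockDecomposition {
  std::int64_t num_blocks;      // blocks needed to tile the whole mesh
  std::int64_t interior_cells;  // cells owned by a single block
  std::int64_t ghost_cells;     // ghost cells surrounding a single block
};

// Tile a cubic mesh of mesh_n^3 cells with cubic blocks of block_n^3 cells,
// each padded by nghost ghost cells on every side (assumed uniform).
inline BlockDecomposition Decompose(std::int64_t mesh_n, std::int64_t block_n,
                                    std::int64_t nghost) {
  assert(block_n > 0 && nghost >= 0 && mesh_n % block_n == 0);
  const std::int64_t per_dim = mesh_n / block_n;
  const std::int64_t padded = block_n + 2 * nghost;
  const std::int64_t interior = block_n * block_n * block_n;
  return {per_dim * per_dim * per_dim, interior,
          padded * padded * padded - interior};
}
```

For an assumed 128^3 mesh with 2 ghost cells, `Decompose(128, 32, 2)` gives 64 blocks, while `Decompose(128, 16, 2)` gives 512 blocks with a much larger ghost-to-interior ratio per block, which is the regime where buffer packing cost is expected to matter most.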

Without AMR, we could have advection example working pretty soon. It will take weeks to get it working with AMR.

Galen said he'd like to hold off on proper scaling studies with the new mesh block pack until AMR is up and running.

Galen will continue his existing scaling studies with the current code. If the advection example gets ported, then Galen may do some small experiments.

Phil expects perfect strong scaling as node count increases. However, there's a concern about load balancing - he recommends that Galen set the variable that controls refinement so that the mesh is only refined every 5 or 6 cycles.
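As an illustration of the recommendation, a refinement interval can be thought of as a simple gate on the cycle counter. This is only a sketch: `ShouldCheckRefinement` is a hypothetical name, and the notes do not say which Parthenon variable controls this.

```cpp
#include <cassert>

// Hypothetical helper, not Parthenon's API: returns true only on cycles
// where refinement (and the resulting load balancing) should be performed.
inline bool ShouldCheckRefinement(int cycle, int interval) {
  assert(cycle >= 0 && interval > 0);
  return cycle % interval == 0;
}
```

With an interval of 5, refinement would run on cycles 0, 5, 10, and so on, and be skipped on the cycles in between.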

With large runs, the MeshBlockTree starts to be a problem because it is duplicated on every rank. May want to look into this in the future.

MeshPack <-> MeshBlockPack naming

Will rename MeshPack to MeshBlockPack.

New Release

Once Jonah renames MeshPack, then Andrew will cut release 0.2.