
2023.07.13 Meeting Notes

Philipp Grete edited this page Jul 14, 2023 · 4 revisions

Agenda

  • Individual/group updates
  • Load balancing strategies
  • IO design for non-cell-centered fields (deferred)
  • Cyl/Sph coordinates in Parthenon (deferred)
  • Review non-WIP PRs

Individual/group updates

LR

  • AMR for non-cell-centered fields is ready for review (no IO yet) with 3 PRs
    • Morton indexing
    • Ownership model
    • prolong/restrict in one with new generalized operators
  • Showed movies of OT and MHD rotor with AMR!!! (using a potential-based formulation)
  • Question for downstream codes: what to do about EMF correction when doing Athena++-style CT
    • machinery is in place, but not used/tested yet (the ownership model applies, similar to cell-centered flux correction)
    • might need to separate out flux correction step from boundary comm
    • other items to be discussed: what should be communicated/corrected (only fine/coarse, but not same-same?)
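
The Morton-indexing PR above relies on bit interleaving of block indices so that blocks sorted by the resulting key follow a Z-order space-filling curve. As a minimal sketch (not Parthenon's actual implementation), a 3D Morton key could look like:

```python
def morton3(i, j, k, bits=10):
    """Interleave the bits of three block indices into one Morton key.
    Sorting blocks by this key yields a Z-order curve, which keeps
    spatially nearby blocks nearby in the 1D ordering."""
    code = 0
    for b in range(bits):
        code |= ((i >> b) & 1) << (3 * b)      # i-bits land at positions 0, 3, 6, ...
        code |= ((j >> b) & 1) << (3 * b + 1)  # j-bits at positions 1, 4, 7, ...
        code |= ((k >> b) & 1) << (3 * b + 2)  # k-bits at positions 2, 5, 8, ...
    return code

# Sort a 4x4x4 grid of block indices along the Z-order curve.
blocks = [(i, j, k) for i in range(4) for j in range(4) for k in range(4)]
blocks.sort(key=lambda b: morton3(*b))
```

The locality property is what makes this ordering useful for distributing blocks across ranks (see the load-balancing discussion below).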

BP

  • chased down bug (in KHARMA) when creating new containers (should data be copied or not)
  • created PR for PEP1

JM

  • Riot is now on parthenon/develop
  • Riot now entirely based on sparse packs
    • some quality-of-life improvements should be available upstream shortly
  • will also push custom load balancing upstream
  • Q: what about BiCGStab?
    • may or may not be updated (currently lives on separate branch)
    • LR more interested in pushing for Multigrid rather than BiCGStab
    • BP has backport, will open PR

PM

  • also worked on Riot <-> develop
  • bug reported last time (fine/coarse round-off error when run on different ranks)
    • problem went away after changing the new comm task
    • not sure why it went away (maybe because of local/nonlocal versus any comm, but that doesn't explain a round-off-error-level difference)
  • will create PR to add CI machinery to cover multiple ranks and pack sizes

FG

BW

  • debugging various Ascent issues
    • slice perpendicular to the y-axis when running on GPUs
    • ghost zones
  • has a couple of open WIP PRs

PG

  • still tracking down IO performance issues on Frontier
  • discovered that our chunking strategy is not optimal
  • working on a best-practice solution and looking for external input (from people with more expert knowledge)
  • Question on extra variables for restart (rst) outputs. No objection (though the parameter name should probably differ from that of the normal outputs)

AJ

  • Results from load balancing work over the past months:
  • Can now assign arbitrary blocks to arbitrary ranks
  • Test setup (30k timesteps, 16^3 blocks, 512 ranks, spherical blast with Phoebus, so work per block varies)
  • Implemented different load balancing and comm locality policies
    • contiguous (good locality, poor load balance, given varying work per block)
    • longest processing time (good load balance, poor locality)
    • contiguous-improved (dynamic programming to evaluate balance)
    • contiguous-improved-iterative (iteratively improve on the previous solution)
  • Currently, load imbalance per block is ~30-40%. Gains should be much higher with larger imbalance.
  • Will look at comparing to Riot LB (see above)
  • Other interesting outcomes
    • (de)refinements oscillate (and are quite costly), so reducing the number of derefinements improves performance
    • For the given setup, the compute load per block evolves with time, which standard LB does not naturally capture. Enforcing LB helps reduce runtime.
  • next
    • additional input decks
    • GPU vs CPU
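
The first two policies above are standard partitioning schemes and can be sketched as follows (hypothetical helper names, not the actual implementation; "contiguous" assumes blocks are already in Morton order):

```python
import heapq

def lpt_assign(costs, nranks):
    """Longest-processing-time greedy: visit blocks in order of decreasing
    cost and always hand the next block to the currently least-loaded rank.
    Good balance, but spatially nearby blocks may land on distant ranks."""
    heap = [(0.0, r) for r in range(nranks)]  # (current load, rank)
    heapq.heapify(heap)
    assignment = {}
    for blk in sorted(range(len(costs)), key=lambda b: -costs[b]):
        load, rank = heapq.heappop(heap)
        assignment[blk] = rank
        heapq.heappush(heap, (load + costs[blk], rank))
    return assignment

def contiguous_assign(costs, nranks):
    """Contiguous policy: split the (Morton-ordered) block list into
    equal-count slices. Good locality; balance suffers when costs vary."""
    n = len(costs)
    return {b: min(b * nranks // n, nranks - 1) for b in range(n)}
```

For example, with per-block costs `[5, 4, 3, 3, 2, 2, 1]` on 2 ranks, LPT yields a perfect 10/10 split, while the contiguous policy gives 15/5 — illustrating the balance/locality trade-off the policies above are designed to navigate.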

Next meeting: Jul 27
