2023.12.14 Meeting Notes
- Individual/group updates
- IO
- review non-WIP PRs
LR
- working on performance improvements for MG
- caching PackDescriptors for boundary packs significantly improved performance (this was a surprising bottleneck)
- for downstream codes: make sure they are `static` (especially for larger numbers of variables); see the sketch below
  - BP: might this be related to `FlagCollections` (which seem to be slow)
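A minimal sketch of the `static` descriptor pattern referenced above, assuming the sparse-pack API (`MakePackDescriptor`/`GetPack`); the task name and the variable types `dens`/`velo` are purely illustrative placeholders for a downstream code's own variable structs:

```cpp
#include <parthenon/package.hpp>
using namespace parthenon;

// Sketch only: cache the pack descriptor in a function-local static so the
// (comparatively expensive) descriptor construction runs once, while the cheap
// GetPack call still happens on every task invocation.
TaskStatus MyTask(MeshData<Real> *md) {
  auto pmesh = md->GetMeshPointer();
  // dens/velo stand in for the downstream code's variable type structs.
  static const auto desc =
      MakePackDescriptor<dens, velo>(pmesh->resolved_packages.get());
  auto pack = desc.GetPack(md);  // reuses the cached descriptor
  // ... launch kernels using pack ...
  return TaskStatus::complete;
}
```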
BP
- chasing gremlins in MPI, e.g.,
- fewer than 3 meshblocks in any direction results in issues (like magnetic field divergence)
- not 1:1 reproducible
- trying to bisect
- sounds like it is related to the issues PG is seeing
JM
- small improvements here and there like
- index splits (for easier/faster hierarchical parallelism via vectorization); a PR is open
- advantage over TeamMDRange is more flexible control over which indices are fused (see the illustrative sketch at the end of this update)
- added machinery for correctness checks in the parthenon-vibe benchmark
- non-cell-centered IO
- refactoring Phoebus for more modern Parthenon use
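To illustrate the idea behind the index splits mentioned above (this is plain Kokkos, not the actual IndexSplit API from the open PR): manually fusing the outer k/j indices into the team range while keeping the contiguous i index in the inner vector loop gives the explicit control over fusion that TeamMDRange does not.

```cpp
#include <Kokkos_Core.hpp>

using View3D = Kokkos::View<double ***>;

// Illustrative only: fuse (k, j) into the team (league) index by hand and keep
// the contiguous i index in the inner vector loop.
void add_fused_kj(const View3D &out, const View3D &a, const View3D &b) {
  const int nk = out.extent_int(0), nj = out.extent_int(1), ni = out.extent_int(2);
  Kokkos::parallel_for(
      "add_fused_kj", Kokkos::TeamPolicy<>(nk * nj, Kokkos::AUTO),
      KOKKOS_LAMBDA(const Kokkos::TeamPolicy<>::member_type &team) {
        const int k = team.league_rank() / nj;  // fused outer indices
        const int j = team.league_rank() % nj;
        Kokkos::parallel_for(Kokkos::TeamVectorRange(team, ni),
                             [&](const int i) { out(k, j, i) = a(k, j, i) + b(k, j, i); });
      });
}
```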
PD
- working on a new Parthenon-based code with curvilinear coordinates
- more Athena++-esque
- user-defined prolongation/restriction/flux correction etc. worked well
- has a use case for "just a flux"
- face fields provide too much boilerplate (and implicitly enable additional machinery)
- might be fixed with a small new metadata flag (see the sketch at the end of this update)
- JM/LR: might be worth refactoring the flux correction machinery down the line (e.g., make fluxes face variables with dependencies)
- looks like our implicit use of ghost cells for flux fields is not necessarily intuitive
- needs to be documented
- there might be a gotcha when using face- and edge-centered data in the same loop as cell-centered fields in the current WIP IndexSplit machinery
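For context on the "just a flux" discussion, a hedged sketch of how such a field is currently registered as a plain face-centered field via `Metadata`; the package and field names are illustrative, and the proposed new metadata flag is not shown since it does not exist yet:

```cpp
#include <memory>
#include <parthenon/package.hpp>

// Sketch of the current approach: declare the flux as an ordinary face-centered,
// derived, single-copy field. A dedicated "just a flux" metadata flag (as
// discussed above) would let such a field opt out of the extra boundary and
// communication machinery that face fields currently imply.
std::shared_ptr<parthenon::StateDescriptor> InitMyPackage() {
  using parthenon::Metadata;
  auto pkg = std::make_shared<parthenon::StateDescriptor>("my_pkg");
  Metadata m_flux({Metadata::Face, Metadata::Derived, Metadata::OneCopy});
  pkg->AddField("mass_flux", m_flux);  // illustrative field name
  return pkg;
}
```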
PG
- encountered MPI issues (timeouts) with few blocks per rank (say one or two) with a recent version of Parthenon
- is somewhat reliably reproducible
- unclear where this stems from; no more detailed debugging yet
- still tracing IO issues with HDF5, which have now become a roadblock for upcoming simulations
- ADIOS2 seems to be very performant (on Orion): quick testing allowed writing 4.5 TB of data in 0.85 seconds (from 512 nodes)
- will look into a new output based on openPMD with the ADIOS2 backend (see the sketch at the end of this update)
- worked on large scale viz of INCITE sims
- needed some custom XDMF pre-processing to reduce the data so it could be handled by ParaView
- see above (openPMD/ADIOS2)
- we should ensure we can ship it as a submodule (for ease of use)
- also need to ensure that analysis pipelines (especially Python-based ones) easily interface with those outputs
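As a reference point for the openPMD/ADIOS2 item above, a minimal serial sketch using the openPMD-api C++ interface (the `.bp` extension selects the ADIOS2 backend); dataset names and sizes are placeholders, and this is not the planned Parthenon output writer (a real one would use the MPI-aware `Series` constructor):

```cpp
#include <vector>
#include <openPMD/openPMD.hpp>

int main() {
  // The ".bp" suffix makes openPMD use its ADIOS2 backend.
  openPMD::Series series("dump_%05T.bp", openPMD::Access::CREATE);

  constexpr size_t nx = 16, ny = 16, nz = 16;
  std::vector<double> rho(nx * ny * nz, 1.0);  // placeholder data

  auto mesh = series.iterations[0].meshes["density"];
  auto comp = mesh[openPMD::MeshRecordComponent::SCALAR];
  comp.resetDataset(
      openPMD::Dataset(openPMD::determineDatatype<double>(), {nx, ny, nz}));
  comp.storeChunk(rho, {0, 0, 0}, {nx, ny, nz});

  series.flush();  // data is written through ADIOS2 here
  return 0;
}
```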
-> people should think about ideas/approaches for a Gordon Bell submission
- unify/pick packing machinery <-> id based packing
- best practices on performance relevant parameters (and more generally block sizes/work per device)