Single node performance
Typical performance figures for Athena++ are presented in Tables 1 and 2 for serial single-core and MPI-parallelized multicore results on several target architectures. The default second-order temporally accurate predictor-corrector scheme (`time/integrator=vl2`) was used in all cases, but the reconstruction method of the corrector step was set to either PLM (`time/xorder=2`) or PPM (`time/xorder=3`). Various Riemann solvers were tested.
These figures can be considered benchmark values for the basic hydrodynamics and magnetohydrodynamics solver capabilities of the code; the performance of optional functionality such as non-Cartesian and/or nonuniform Coordinate Systems and Meshes, Special Relativity, General Relativity, Shearing Box, Self Gravity with FFT, etc. should be measured relative to the values below.
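For readers reproducing these numbers, the throughput metric used in the tables can be sketched as follows. This is a minimal illustration, assuming the usual definition of zone-cycles per second (zones updated per time step, times the number of steps, divided by wall-clock time). The function names and the sample run parameters are hypothetical and are not taken from the Athena++ source.

```python
def mzone_cycles_per_sec(nx1, nx2, nx3, ncycles, wall_seconds):
    """Throughput in millions of zone-cycles per second."""
    zones = nx1 * nx2 * nx3
    return zones * ncycles / wall_seconds / 1.0e6


def relative_throughput(measured, baseline):
    """Fraction of a baseline throughput retained by another run."""
    return measured / baseline


# Hypothetical run: a 64x32x32 mesh advanced for 100 cycles in 4.0 s of wall time.
rate = mzone_cycles_per_sec(64, 32, 32, 100, 4.0)
print(f"{rate:.3f} MZone-cycles/sec")

# Compare against a tabulated baseline, here 4.503
# (Skylake-SP, Hydro Sod, PLM, HLLC, single core).
print(f"{relative_throughput(rate, 4.503):.2f} of baseline")
```

A run with optional physics enabled can be compared against the matching baseline row in the same way.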
Table 1: Serial single-core performance (MZone-cycles/sec)

| Problem | Reconstruction | Riemann solver | Xeon Phi KNL 7210 | Broadwell E5-2680 v4 | Skylake-SP Gold 6148 |
|---|---|---|---|---|---|
| Hydro Sod | PLM | HLLC | 1.533 | 2.730 | 4.503 |
| Hydro Sod | PLM | HLLE | 1.618 | 2.868 | 4.880 |
| Hydro Sod | PLM | Roe | 1.555 | 2.872 | 4.654 |
| Hydro Sod | PPM | HLLC | 0.752 | 1.336 | 2.411 |
| Hydro Sod | PPM | HLLE | 0.762 | 1.365 | 2.528 |
| Hydro Sod | PPM | Roe | 0.762 | 1.361 | 2.424 |
| MHD Brio-Wu | PLM | HLLD | 0.705 | 1.340 | 2.403 |
| MHD Brio-Wu | PLM | HLLE | 0.803 | 1.406 | 2.307 |
| MHD Brio-Wu | PLM | Roe | 0.649 | 1.143 | 1.921 |
| MHD Brio-Wu | PPM | HLLD | 0.392 | 0.719 | 1.291 |
| MHD Brio-Wu | PPM | HLLE | 0.419 | 0.749 | 1.259 |
| MHD Brio-Wu | PPM | Roe | 0.373 | 0.666 | 1.119 |
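
The relative cost of PPM versus PLM reconstruction can be read directly from the single-core figures. The sketch below does this for the Hydro Sod HLLC rows; it is only arithmetic on the tabulated values, and the dictionary layout is illustrative. Note that the PPM runs also use more ghost zones (see the methodology notes), so the ratio reflects the full configuration rather than the reconstruction step alone.

```python
# Single-core Hydro Sod results with the HLLC solver (MZone-cycles/sec).
plm = {"KNL 7210": 1.533, "Broadwell": 2.730, "Skylake-SP": 4.503}
ppm = {"KNL 7210": 0.752, "Broadwell": 1.336, "Skylake-SP": 2.411}

# Throughput ratio PLM/PPM: how many times slower the PPM configuration runs.
slowdown = {name: plm[name] / ppm[name] for name in plm}

for name, ratio in slowdown.items():
    print(f"{name:12s} PPM is {ratio:.2f}x slower than PLM")
```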

Table 2: MPI-parallelized multicore performance (MZone-cycles/sec)

| Problem | Reconstruction | Riemann solver | Xeon Phi KNL 7210 | (2x) Broadwell E5-2680 v4 | (2x) Skylake-SP Gold 6148 |
|---|---|---|---|---|---|
| Hydro Sod | PLM | HLLC | 66.908 | 29.444 | 40.750 |
| Hydro Sod | PLM | HLLE | 67.405 | 29.440 | 40.764 |
| Hydro Sod | PLM | Roe | 67.094 | 29.418 | 40.820 |
| Hydro Sod | PPM | HLLC | 40.279 | 20.269 | 32.248 |
| Hydro Sod | PPM | HLLE | 40.182 | 20.328 | 32.403 |
| Hydro Sod | PPM | Roe | 40.196 | 20.336 | 32.234 |
| MHD Brio-Wu | PLM | HLLD | 30.886 | 16.244 | 22.711 |
| MHD Brio-Wu | PLM | HLLE | 32.526 | 16.483 | 22.757 |
| MHD Brio-Wu | PLM | Roe | 29.145 | 15.140 | 22.673 |
| MHD Brio-Wu | PPM | HLLD | 19.378 | 11.123 | 17.733 |
| MHD Brio-Wu | PPM | HLLE | 20.430 | 11.313 | 17.684 |
| MHD Brio-Wu | PPM | Roe | 18.972 | 10.623 | 17.495 |
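
One way to read the multicore figures against the single-core ones is as a node-level speedup and a per-core efficiency. The sketch below does this for the Hydro Sod PLM/HLLC rows. The KNL core count (64) is stated in the methodology notes; the Broadwell and Skylake-SP counts (2 x 14 and 2 x 20) are assumptions based on Intel's published specifications for those parts and do not appear on this page.

```python
# Hydro Sod, PLM, HLLC rows of Tables 1 and 2 (MZone-cycles/sec).
single_core = {"KNL 7210": 1.533, "Broadwell": 2.730, "Skylake-SP": 4.503}
full_node = {"KNL 7210": 66.908, "Broadwell": 29.444, "Skylake-SP": 40.750}

# Physical cores per node. 64 for the KNL comes from the methodology notes;
# 28 and 40 are ASSUMED from two-socket configurations (2 x 14, 2 x 20).
cores = {"KNL 7210": 64, "Broadwell": 28, "Skylake-SP": 40}


def scaling(name):
    """Return (node speedup over one core, speedup divided by core count)."""
    speedup = full_node[name] / single_core[name]
    efficiency = speedup / cores[name]
    return speedup, efficiency


for name in single_core:
    speedup, efficiency = scaling(name)
    print(f"{name:12s} speedup {speedup:6.2f}x  per-core efficiency {efficiency:5.1%}")
```

Efficiencies well below 100% are not surprising here: a single-core run has the node's memory bandwidth to itself, while a fully loaded node shares it across all cores.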

Notes on methodology:
- Both benchmark problems are 3D shock tube problems using the adiabatic equation of state.
- Each table entry represents the mean of 20 trials of independent, exclusive compute node Slurm allocations on clusters managed by Princeton Research Computing.
- The solver was configured with `--nghost=2` for all PLM tests and `--nghost=4` for all PPM tests.
- The Intel C++ Compiler version 18.0.3 was used to generate all of these results. The only compiler flags used are those defined by the latest version's `--cxx=icc` Configuring option.
  - Similarly, the 2018 Revision 3 Intel MPI library was the only MPI library used for the multicore study.
- Table 2 uses the same problem size per core as the single-core tests in Table 1. Flat MPI is used to parallelize the problem, with 1 rank assigned per physical core.
  - However, the multicore tests on the KNL were the only set to use hybrid OpenMP+MPI parallelization. Assigning 4 OpenMP threads (each assigned a 64x32x32 MeshBlock) per MPI rank achieved high performance by utilizing the 4-way hyperthreading of the 64 physical cores (256 logical cores) on these nodes. See the discussion in Using MPI and OpenMP.
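
The KNL decomposition described in the last note works out as follows. This is only arithmetic on the figures quoted there (64 physical cores, 4-way hyperthreading, 4 OpenMP threads per rank, one 64x32x32 MeshBlock per thread), assuming every logical core runs exactly one thread; the variable names are illustrative.

```python
physical_cores = 64
threads_per_core = 4        # 4-way hyperthreading
omp_threads_per_rank = 4
meshblock = (64, 32, 32)    # zones per MeshBlock, one MeshBlock per thread

logical_cores = physical_cores * threads_per_core
# ASSUMPTION: all logical cores are occupied, one OpenMP thread each.
mpi_ranks = logical_cores // omp_threads_per_rank
zones_per_block = meshblock[0] * meshblock[1] * meshblock[2]
total_zones = logical_cores * zones_per_block

print(f"logical cores: {logical_cores}")
print(f"MPI ranks:     {mpi_ranks}")
print(f"zones/block:   {zones_per_block}")
print(f"total zones:   {total_zones}")
```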

KNL-specific details:
- Flat memory mode. Cache memory mode was simulated by prepending `numactl -p 1` to the binary call, as in `numactl -p 1 ./athena ...`, so that Athena++ preferentially allocates from the ~16 GB of MCDRAM.
- Quadrant clustering mode.

Under construction.