Getting things running on Chicoma #70
Chicoma appears to be extremely finicky. If you deviate from this recipe at all, you may experience problems. For example, |
Update: It works! But the instructions are very different from before. In particular, |
From Shengtai with AthenaPK:
module load cpe-cuda cuda cmake cray-hdf5-parallel
setenv CRAY_ACCEL_TARGET nvidia80
setenv MPICH_GPU_SUPPORT_ENABLED 1
cmake -S. -Bbuild-gpu -DCMAKE_CXX_COMPILER=CC
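The `setenv` lines above are csh syntax. For bash or sh users, a minimal equivalent might look like the following, assuming the same variable names and values carry over unchanged:

```shell
# bash/sh equivalent of the csh setenv lines above
export CRAY_ACCEL_TARGET=nvidia80     # target the nvidia80 (A100) accelerators
export MPICH_GPU_SUPPORT_ENABLED=1    # request GPU support from Cray MPICH
```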
|
My experimentation confirms that the OFI transport layer works now. It is much faster than UCX. Main post updated. |
Since these instructions are being used by other Parthenon-based codes, please see this Parthenon issue: I have added the relevant Kokkos config line to the Phoebus |
It seems that module swap PrgEnv-cray PrgEnv-gnu I just do module load PrgEnv-gnu |
Thanks @brryan instructions updated. |
If :set backspace=indent,eol,start |
The Chicoma environment has changed. Notably, CUDA-aware MPI has been removed/disabled. I am updating the top-level comment, but I include the old procedure below for posterity.
Coding environment
We need to enable a programming environment. There's an Nvidia environment, but it seems broken and is missing CUDA headers. I used the GNU backend:
module load PrgEnv-gnu
If the Cray environment is already loaded, you may need to do
module swap PrgEnv-cray PrgEnv-gnu
I also needed to load
module load cpe-cuda
and then we can load the relevant modules:
module load cuda cray-hdf5-parallel cmake
Note that I put
MPI
There are two transport layers available on Chicoma: UCX and OFI. UCX is ethernet, OFI is high-speed fiber, so the latter is recommended. (I notice a significant performance degradation with UCX.) For debugging, however, it's useful to include both, as OFI appears to be more finicky. Regardless, set
export CRAY_ACCEL_TARGET=nvidia80
to enable CUDA-aware MPI.
OFI
You must set the following environment variable for OFI:
export MPICH_GPU_SUPPORT_ENABLED=1
UCX
To swap to the UCX transport layer:
module swap craype-network-ofi craype-network-ucx
module swap cray-mpich cray-mpich-ucx
To enable CUDA-aware MPI we additionally need to set the following environment variable:
export UCX_TLS="self,ud,cuda,sm"
Compiling
Make a build directory:
mkdir -p build && cd build
Here's the line required to build the torus problem:
cmake -DPHOEBUS_ENABLE_CUDA=ON -DPHOEBUS_GEOMETRY=FMKS -DPHOEBUS_CACHE_GEOMETRY=ON -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DCMAKE_CXX_COMPILER=CC ..
make -j
Note the
Running
A 2d sim on one GPU works as expected:
./src/phoebus -i ../inputs/torus.pin
When using MPI, use
srun -p gpu -C gpu80 --time=0:30:00 --nodes 2 --ntasks 8 --ntasks-per-node 4 phoebus -i torus.pin
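For anyone reproducing this archived procedure, the OFI/UCX choice could be collected into one helper. This is only a sketch: the function name `setup_transport` is hypothetical, and the guard around `module` is an addition so that the function also loads on machines without environment modules.

```shell
# Hypothetical helper: select the MPI transport layer from the archived procedure.
# Usage: setup_transport ofi   or   setup_transport ucx
setup_transport() {
  # Set in both cases, as in the archived instructions
  export CRAY_ACCEL_TARGET=nvidia80
  case "$1" in
    ofi)
      export MPICH_GPU_SUPPORT_ENABLED=1
      ;;
    ucx)
      # "module" only exists where environment modules are installed
      if command -v module >/dev/null 2>&1; then
        module swap craype-network-ofi craype-network-ucx
        module swap cray-mpich cray-mpich-ucx
      fi
      export UCX_TLS="self,ud,cuda,sm"
      ;;
    *)
      echo "usage: setup_transport ofi|ucx" >&2
      return 1
      ;;
  esac
}
```

Call it once per shell session, after loading the programming environment and before configuring.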
Need to update instructions again. Currently the top-level instructions require PR #177. However, it should be merged soon. Old instructions saved below for posterity.
I will use this issue to document the steps I've taken to get things running on Chicoma.
Getting source code
First, download the code from git. Phoebus relies heavily on submodules, and so you must use a recursive clone:
git clone --recursive git@github.com:lanl/phoebus.git
To transfer the code to Chicoma, I found the easiest thing was to clone recursively to my desktop and then rsync excluding version control directories:
rsync --progress --exclude '.git' --exclude 'build' -rLpt phoebus -e 'ssh ${USER}@wtrw ssh' ch-fe:~
(Note: do not exclude bin directories, as the Kokkos bin directory is needed.) We need to do this because the whole machine is behind a gateway server and you can't access the web.
Getting a node
Note that for compilation to work, you must be on a backend node. I could not get a frontend node to build. Since Chicoma is a hybrid machine, request the gpu partition with -p gpu and request the 80GB A100s with -C gpu80. (These are flags for slurm.) Full command:
salloc -p gpu -C gpu80 --time=8:00:00
To get a debug node (only the 40GB GPUs available), use:
salloc --qos=debug --reservation=gpu_debug --partition=gpu_debug --time=2:00:00
Note the debug nodes are available only for 2h at a time.
Coding environment
We need to enable a programming environment. There are two approaches one can follow.
The NVHPC Path
Credit to @bprather for finding this path.
module purge
module load PrgEnv-nvhpc
export CRAY_CPU_TARGET="x86-64"
The GNU + Cuda approach
This approach is closest to the old workflow and worked for me:
module swap PrgEnv-cray PrgEnv-gnu
I also needed to load
module load cpe-cuda
and then finally we load cuda:
module load cuda
MPI
CUDA-aware MPI has been disabled on Chicoma for some reason. The only path currently is to use CPU-side MPI with host-pinned memory. This can be enabled by adding the -DPARTHENON_ENABLE_HOST_COMM_BUFFERS=ON flag to your cmake configuration line. You'll see this in the config line below. No need to manually load MPI.
CPU-capable HDF5
There is a
module load cray-hdf5-parallel
to get working parallel HDF5. However, if you used the
CC=CC ./configure --prefix=/my/local/install/path --enable-build-mode=production --enable-hl --enable-symbols=yes --enable-parallel
make -j
make install
CMake
Finally load the cmake module last:
module load cmake
This shouldn't matter, but it does. Always load cmake last. Otherwise it does not correctly resolve install paths for loaded modules.
Compiling
Make a build directory:
mkdir -p build && cd build
Here's the line required to build the torus problem:
cmake -DPARTHENON_ENABLE_HOST_COMM_BUFFERS=ON -DPHOEBUS_ENABLE_CUDA=ON -DPHOEBUS_GEOMETRY=FMKS -DPHOEBUS_CACHE_GEOMETRY=ON -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DCMAKE_CXX_COMPILER=CC ..
make -j
Note a few aspects of this command:
Running
A 2d sim on one GPU works as expected:
./src/phoebus -i ../inputs/torus.pin
When using MPI, use
srun -p gpu -C gpu80 --time=0:30:00 --nodes 2 --ntasks 8 --ntasks-per-node 4 phoebus -i torus.pin
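The configure line in this comment is long, so one way to keep it readable is to list the flags one per line in a small function. This is a sketch, not part of the original instructions: the function name `phoebus_cmake_flags` is hypothetical, and the flags are copied from the line above.

```shell
# Hypothetical helper: print the configure flags from the comment above, one per line.
phoebus_cmake_flags() {
  printf '%s\n' \
    -DPARTHENON_ENABLE_HOST_COMM_BUFFERS=ON \
    -DPHOEBUS_ENABLE_CUDA=ON \
    -DPHOEBUS_GEOMETRY=FMKS \
    -DPHOEBUS_CACHE_GEOMETRY=ON \
    -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON \
    -DCMAKE_CXX_COMPILER=CC
}
```

From inside the build directory this could be used as `cmake $(phoebus_cmake_flags) ..`, which relies on none of the flags containing spaces.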
I was able to compile on a frontend node by adding the |
Ah, thanks for that @ajdittmann I've updated the instructions. |
I am currently having trouble compiling singularity-eos and singularity-opac with the same HDF5 version as the one found by phoebus---CMake seems to have changed behaviour and does the |
|
MR #177 now in main. Top level instructions should be the ground truth for Chicoma at this time. |
On Chicoma, try: |
We should update this to:
And then |
Old version for archival
I will use this issue to document the steps I've taken to get things running on Chicoma.
Getting source code
First, download the code from git. Phoebus relies heavily on submodules, and so you must use a recursive clone:
git clone --recursive git@github.com:lanl/phoebus.git
To transfer the code to Chicoma, I found the easiest thing was to clone recursively to my desktop and then rsync excluding version control directories:
rsync --progress --exclude '.git' --exclude 'build' -rLpt phoebus -e 'ssh ${USER}@wtrw ssh' ch-fe:~
(Note: do not exclude bin directories, as the Kokkos bin directory is needed.) We need to do this because the whole machine is behind a gateway server and you can't access the web.
Getting a node
Since Chicoma is a hybrid machine, request the gpu partition with -p gpu and request the 80GB A100s with -C gpu80. (These are flags for slurm.) Full command:
salloc -p gpu -C gpu80 --time=8:00:00
To get a debug node (only the 40GB GPUs available), use:
salloc --qos=debug --reservation=gpu_debug --partition=gpu_debug --time=2:00:00
Note the debug nodes are available only for 2h at a time.
Coding environment
We need to enable a programming environment. Currently, the only code path which supports GPUDirect RDMA is the following:
The NVHPC Path
Credit to @bprather for finding this path.
module swap PrgEnv-cray PrgEnv-nvhpc
module load craype-accel-nvidia80
export CRAY_ACCEL_TARGET=nvidia80
export MPICH_GPU_SUPPORT_ENABLED=1
export NVCC_WRAPPER_DEFAULT_COMPILER=CC
export CC=$(which cc) # not sure why these are necessary but they appear to be
export CXX=$(which CC)
export FC=$(which ftn)
export FI_CXI_RX_MATCH_MODE=hybrid
export FI_CXI_RDZV_THRESHOLD=64000
HDF5
There is a
module load cray-hdf5-parallel
At the moment it appears to work with the
CMake
Finally load the cmake module last:
module load cmake/3.25.1 # version may need to be changed later
This shouldn't matter, but it does. Always load cmake last. Otherwise it does not correctly resolve install paths for loaded modules.
Compiling
Make a build directory:
mkdir -p build && cd build
Here's the line required to build the torus problem:
cmake -DPHOEBUS_ENABLE_CUDA=ON -DPHOEBUS_GEOMETRY=FMKS -DPHOEBUS_CACHE_GEOMETRY=ON -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DCMAKE_CXX_COMPILER=/path/to/phoebus/scripts/bash/nvcc_wrapper -DHDF5_INCLUDE_DIR=${HDF5_ROOT}/include ..
make -j
Note a few aspects of this command:
If you want to build on a frontend node, add the following flag:
-DKokkos_ARCH_AMPERE80=ON
You may also need these flags (YMMV):
-DCMAKE_CXX_FLAGS="${PE_MPICH_GTL_DIR_nvidia80} ${PE_MPICH_GTL_LIBS_nvidia80}"
which should be implied by the compiler wrappers, but seem not to be appropriately passed through on the frontend.
Running
A 2d sim on one GPU works as expected:
./src/phoebus -i ../inputs/torus.pin
When using MPI, use srun to launch the job. mpirun does not work as expected. A 3D sim across 2 nodes might be launched as:
srun -p gpu -C gpu80 --time=0:30:00 --nodes 2 --ntasks 8 --ntasks-per-node 4 ~/phoebus/external/parthenon/external/Kokkos/bin/hpcbind -- phoebus -i torus.pin
Note the hpcbind call, which prevents MPI ranks or GPUs from migrating; such migration can cause problems.
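Because this archived recipe depends on several environment variables being exported before cmake runs, a quick sanity check may save a failed configure. The sketch below assumes the variable list from the NVHPC path above; the function name `check_chicoma_env` is hypothetical.

```shell
# Hypothetical check: report any variable from the archived NVHPC recipe that is
# unset or empty. Returns 0 if all are set, 1 otherwise.
check_chicoma_env() {
  missing=0
  for var in CRAY_ACCEL_TARGET MPICH_GPU_SUPPORT_ENABLED \
             NVCC_WRAPPER_DEFAULT_COMPILER CC CXX FC \
             FI_CXI_RX_MATCH_MODE FI_CXI_RDZV_THRESHOLD; do
    # Indirect lookup of the variable named by $var (POSIX-compatible)
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_chicoma_env && cmake ...` would then skip the configure step when something is unset.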
I will use this issue to document the steps I've taken to get things running on Chicoma. This will be a living document.
Getting source code
First, download the code from git. Phoebus relies heavily on submodules, and so you must use a recursive clone:
To transfer the code to Chicoma, I found the easiest thing was to clone recursively to my desktop and then rsync excluding version control directories:
(Note: do not exclude bin directories, as the Kokkos bin directory is needed.)
We need to do this because the whole machine is behind a gateway server and you can't access the web.
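The archived copies of these instructions elsewhere in this thread quote the clone and transfer commands. A sketch that wraps them in one function follows; the name `push_phoebus_to_chicoma` is hypothetical. The archived rsync line single-quotes the `-e` argument, whereas this sketch uses double quotes so the local shell expands `${USER}`, which assumes your username is the same on the gateway.

```shell
# Hypothetical wrapper around the clone + rsync commands quoted in the archived
# copies of these instructions. Run it on your desktop, not on Chicoma.
push_phoebus_to_chicoma() {
  # Recursive clone so that all submodules are fetched
  git clone --recursive git@github.com:lanl/phoebus.git || return 1
  # Copy to the Chicoma frontend through the wtrw gateway, skipping .git and
  # build directories but keeping bin directories (Kokkos needs its bin).
  # Double quotes (an assumption, see above) let the local shell expand ${USER}.
  rsync --progress --exclude '.git' --exclude 'build' -rLpt phoebus \
    -e "ssh ${USER}@wtrw ssh" ch-fe:~
}
```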
Getting a node
Since Chicoma is a hybrid machine, request the gpu partition with -p gpu and request the 80GB A100s with -C gpu80. (These are flags for slurm.) Full command:
To get a debug node (only the 40GB GPUs available), use:
Note the debug nodes are available only for 2h at a time.
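The two allocation requests, as quoted in the archived copies of these instructions, could be kept as shell functions. This is a sketch: the function names and the `DRY_RUN` switch are hypothetical additions, while the salloc flags are taken from the thread.

```shell
# Hypothetical helpers wrapping the salloc lines from the archived instructions.
# Set DRY_RUN=1 to print the command instead of requesting an allocation.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# GPU node with the 80GB A100s, for up to 8 hours
get_gpu_node() {
  run salloc -p gpu -C gpu80 --time=8:00:00
}

# Debug node (40GB GPUs only), limited to 2 hours
get_debug_node() {
  run salloc --qos=debug --reservation=gpu_debug --partition=gpu_debug --time=2:00:00
}
```

With `DRY_RUN=1 get_gpu_node` the command is only printed, which is handy for checking the flags before queueing.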
Coding environment
We need to enable a programming environment. Here is one possible path:
This shouldn't matter, but it does. Always load cmake last. Otherwise it does not correctly resolve install paths for loaded modules.
Compiling
Make a build directory
Here's the line required to build the torus problem
If you want to build on a frontend node, add the following flag
Running
A 2d sim on one GPU works as expected:
When using MPI, use srun to launch the job. mpirun does not work as expected. A 3D sim across 2 nodes might be launched as:
srun -p gpu -C gpu80 --time=0:30:00 --nodes 2 --ntasks 8 --ntasks-per-node 4 ~/phoebus/external/parthenon/external/Kokkos/bin/hpcbind -- phoebus -i torus.pin
Note the hpcbind call, which prevents MPI ranks or GPUs from migrating; such migration can cause problems.
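In the launch line, the total task count is the node count times the tasks per node (2 x 4 = 8). A small sketch that derives it, so the three numbers cannot drift apart, is given below. The function name `phoebus_srun_cmd` is hypothetical, and it prints the command rather than running it.

```shell
# Hypothetical helper: print an srun line for a given node count and number of
# MPI ranks per node. Total tasks = nodes * ranks per node.
phoebus_srun_cmd() {
  nodes=$1
  per_node=$2
  ntasks=$((nodes * per_node))
  echo "srun -p gpu -C gpu80 --time=0:30:00 --nodes $nodes --ntasks $ntasks --ntasks-per-node $per_node ~/phoebus/external/parthenon/external/Kokkos/bin/hpcbind -- phoebus -i torus.pin"
}
```

For example, `phoebus_srun_cmd 2 4` reproduces the two-node command above, and the printed line can be pasted into the shell once it looks right.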