Cherrypick ompoffload devturbo take3#5
Conversation
* add multi gpu support * address review comments, add helpful comment for the acc/mp runbtime call
To enable this, had to be removed - otherwise segfaults happen on the GPU.
This reverts commit 0cc2a77. Having both the CPU and GPU OpenMP directives compiled caused a significant slowdown in GPU packing/unpacking performance - even if parallelism is controlled using OpenMP "if" clause.
Some very minor changes to the OpenMP target MPI PR: * use_device_ptr -> use_device_addr This appears to be the updated form (or at least nvfortran says it is) * Whitespace added to `!$ use omp_lib` Does not seem crucial but from our previous discussion it appears more correct. * Removal of some trailing whitespace.
This patch refactors several lines to keep within the 121-character line length limit prescribed by the FMS style guidelines.
The no-comm (no MPI) interface has been updated to support the new omp_offload argument.
This ensures that (un)packing steps in do_group_update is performed with openmp cpu parallelism if ompoffload=.false.. Previously it would only do serial. This is implemented by undefining the GPU macro (currently __NVCOMPILER_OPENMP_GPU) and re-including the (un)packing files. To make this work, the default(shared) was used in all the relevant OpenMP directives. If default(none) is used, the loops would hang or segfault.
Removed trailing whitespace, replaced tabs with spaces, and kept all lines at 121 lines or fewer
Missed one of the long lines in cesm_constants.fh
I thought I shortened all the long lines in cesm_constants.fh but my awk command to find the line length was off-by-one
Use a container with gcc 15.1.0
Also, don't define CESM_CONSTANTS macro (will let us drop CESM share code from the turbo-stack submodules, and is bit-for-bit)
I think our version of FMS is much older than what the FMS coupler is expecting, so tests are failing... but we don't want to update our branch to include newer FMS code so we should drop the test instead
I'm trying to make FMS configure look more like TIM configure, and some I removed some stuff from configure.ac that is now breaking Makefiles
This c file is included in the TIM version of Makefile.ac but not the FMS2 version, and may be responsible for failing CI build
test_gather2DV does not exist in TIM or in more recent FMS commits, and it has some code that doesn't compile... I suspect that's related to why the test was removed in subsequent PRs
test_simple_domain.nc does not fit into character(len=20); increased to len=25 just to have some breathing room
This reverts commit b018921.
This reverts commit f88a095.
There was a problem hiding this comment.
@mnlevy1981 thanks for the PR. If this passes the CI, then I think this is great! Have you confirmed or will you confirm that this allows execution on the GPU yet? Maybe that is a question for a later PR when all of the dev/gpu changes have been merged into turbo and turbo-stack 231 has been accepted.
I have not checked this with GPUs yet. It probably makes sense to test this via turbo-stack and report results before merging this PR... which I guess means waiting for derecho to come back up? |
|
Talked to @johnmauff who thinks we should merge this in now and test it all in turbo-stack once TIM and MOM6 are updated as well |
Description
This replaces #4 by backing up to a commit that passed all the CI testing. See that PR for further details / discussion
How Has This Been Tested?
I have run the turbo-stack unit tests and
double_gyrewith this branch, and verified it does not change answers. Note that I have removed the GCC-based autoconf test, but the GCC CMake tests and the Intel autoconf test all pass.Checklist: