Skip to content

Cherrypick ompoffload devturbo take3#5

Merged
mnlevy1981 merged 28 commits into
TURBO-ESM:dev/turbofrom
mnlevy1981:cherrypick_ompoffload_devturbo_take3
May 12, 2026
Merged

Cherrypick ompoffload devturbo take3#5
mnlevy1981 merged 28 commits into
TURBO-ESM:dev/turbofrom
mnlevy1981:cherrypick_ompoffload_devturbo_take3

Conversation

@mnlevy1981
Copy link
Copy Markdown

Description
This replaces #4 by backing up to a commit that passed all the CI testing. See that PR for further details / discussion

How Has This Been Tested?
I have run the turbo-stack unit tests and double_gyre with this branch, and verified it does not change answers. Note that I have removed the GCC-based autoconf test, but the GCC CMake tests and the Intel autoconf test all pass.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

edoyango and others added 28 commits May 6, 2026 12:11
* add multi gpu support

* address review comments, add helpful comment for the acc/mp runbtime call
To enable this,  had to be removed -
otherwise segfaults happen on the GPU.
This reverts commit 0cc2a77.
Having both the CPU and GPU OpenMP directives compiled caused
a significant slowdown in GPU packing/unpacking performance -
even if parallelism is controlled using OpenMP "if" clause.
Some very minor changes to the OpenMP target MPI PR:

* use_device_ptr -> use_device_addr

  This appears to be the updated form (or at least nvfortran says it is)

* Whitespace added to `!$ use omp_lib`

  Does not seem crucial but from our previous discussion it appears more
  correct.

* Removal of some trailing whitespace.
This patch refactors several lines to keep within the 121-character line
length limit prescribed by the FMS style guidelines.
The no-comm (no MPI) interface has been updated to support the new
omp_offload argument.
This ensures that (un)packing steps in do_group_update is performed
with openmp cpu parallelism if ompoffload=.false.. Previously it
would only do serial. This is implemented by undefining the GPU
macro (currently __NVCOMPILER_OPENMP_GPU) and re-including the
(un)packing files.

To make this work, the default(shared) was used in all the relevant
OpenMP directives. If default(none) is used, the loops would hang
or segfault.
Removed trailing whitespace, replaced tabs with spaces, and kept all lines at
121 lines or fewer
Missed one of the long lines in cesm_constants.fh
I thought I shortened all the long lines in cesm_constants.fh but my awk
command to find the line length was off-by-one
Use a container with gcc 15.1.0
Also, don't define CESM_CONSTANTS macro (will let us drop CESM share code from
the turbo-stack submodules, and is bit-for-bit)
I think our version of FMS is much older than what the FMS coupler is
expecting, so tests are failing... but we don't want to update our branch to
include newer FMS code so we should drop the test instead
I'm trying to make FMS configure look more like TIM configure, and some I
removed some stuff from configure.ac that is now breaking Makefiles
This c file is included in the TIM version of Makefile.ac but not the FMS2
version, and may be responsible for failing CI build
test_gather2DV does not exist in TIM or in more recent FMS commits, and it has
some code that doesn't compile... I suspect that's related to why the test was
removed in subsequent PRs
test_simple_domain.nc does not fit into character(len=20); increased to len=25
just to have some breathing room
@mnlevy1981 mnlevy1981 requested a review from johnmauff May 11, 2026 21:39
Copy link
Copy Markdown

@johnmauff johnmauff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mnlevy1981 thanks for the PR. If this passes the CI, then I think this is great! Have you confirmed or will you confirm that this allows execution on the GPU yet? Maybe that is a question for a later PR when all of the dev/gpu changes have been merged into turbo and turbo-stack 231 has been accepted.

@mnlevy1981
Copy link
Copy Markdown
Author

Have you confirmed or will you confirm that this allows execution on the GPU yet? Maybe that is a question for a later PR when all of the dev/gpu changes have been merged into turbo and turbo-stack 231 has been accepted.

I have not checked this with GPUs yet. It probably makes sense to test this via turbo-stack and report results before merging this PR... which I guess means waiting for derecho to come back up?

@mnlevy1981 mnlevy1981 merged commit 7e52668 into TURBO-ESM:dev/turbo May 12, 2026
12 checks passed
@mnlevy1981
Copy link
Copy Markdown
Author

Talked to @johnmauff who thinks we should merge this in now and test it all in turbo-stack once TIM and MOM6 are updated as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants