Skip to content

Merge ompoffload devturbo#2

Closed
mnlevy1981 wants to merge 402 commits into
TURBO-ESM:dev/turbofrom
mnlevy1981:merge_ompoffload_devturbo
Closed

Merge ompoffload devturbo#2
mnlevy1981 wants to merge 402 commits into
TURBO-ESM:dev/turbofrom
mnlevy1981:merge_ompoffload_devturbo

Conversation

@mnlevy1981
Copy link
Copy Markdown

Description
Merged the ompoffload branch from @edoyango into dev/turbo. I have verified that it has built (my initial conflict-resolution was bad and it didn't built at first :), and I can run the code in TURBO-ESM/MOM6#20 with this version of FMS.

How Has This Been Tested?
I built an updated version of MOM6 with this FMS branch, the double_gyre example runs and is bit-for-bit with what is currently available on turbo-stack

uramirez8707 and others added 30 commits May 1, 2024 17:46
J-Lentz and others added 22 commits August 21, 2025 13:00
* add multi gpu support

* address review comments, add helpful comment for the acc/mp runbtime call
To enable this,  had to be removed -
otherwise segfaults happen on the GPU.
This reverts commit 0cc2a77.
Having both the CPU and GPU OpenMP directives compiled caused
a significant slowdown in GPU packing/unpacking performance -
even if parallelism is controlled using OpenMP "if" clause.
Some very minor changes to the OpenMP target MPI PR:

* use_device_ptr -> use_device_addr

  This appears to be the updated form (or at least nvfortran says it is)

* Whitespace added to `!$ use omp_lib`

  Does not seem crucial but from our previous discussion it appears more
  correct.

* Removal of some trailing whitespace.
This patch refactors several lines to keep within the 121-character line
length limit prescribed by the FMS style guidelines.
The no-comm (no MPI) interface has been updated to support the new
omp_offload argument.
This ensures that (un)packing steps in do_group_update is performed
with openmp cpu parallelism if ompoffload=.false.. Previously it
would only do serial. This is implemented by undefining the GPU
macro (currently __NVCOMPILER_OPENMP_GPU) and re-including the
(un)packing files.

To make this work, the default(shared) was used in all the relevant
OpenMP directives. If default(none) is used, the loops would hang
or segfault.
@johnmauff
Copy link
Copy Markdown

@mnlevy1981 Wow, that is a lot of changes! Would it be possible to identify changes made by Ed to support the GPU work versus other changes? I started to look at the changed files but quickly became overwhelmed by changes that had nothing to do with GPU enablement.

@johnmauff
Copy link
Copy Markdown

@mnlevy1981 I just looked at all of the commit messages in this PR. Only the last 13 or 14 appear to have been performed by Ed and Jorge. Can we cherry-pick those changes to the FMS2?

@mnlevy1981
Copy link
Copy Markdown
Author

@mnlevy1981 Wow, that is a lot of changes! Would it be possible to identify changes made by Ed to support the GPU work versus other changes? I started to look at the changed files but quickly became overwhelmed by changes that had nothing to do with GPU enablement.

@mnlevy1981 I just looked at all of the commit messages in this PR. Only the last 13 or 14 appear to have been performed by Ed and Jorge. Can we cherry-pick those changes to the FMS2?

We could, or we could merge mom-ocean:main onto dev/turbo (which should be a big set of changes, but might not need in-depth review since the code has already been approved by the entire MOM6 consortium) and the PR to merge dev/gpu should be much more manageable

@mnlevy1981
Copy link
Copy Markdown
Author

ugh, I just noticed what repo I'm in... I guess we'd need to bring FMS up to date with main or whenever the FMS team calls the primary branch, and then merge Ed's ompoffload branch separately. (I thought this was TURBO-ESM/MOM6#20)

@mnlevy1981 mnlevy1981 closed this May 6, 2026
mnlevy1981 added a commit that referenced this pull request May 12, 2026
* add gpu2gpu mpi transer with flag for do_group_update

* add missing collapse(3) clauses

* Use __NVCOMPILER macro for target regions

* add back old omp directive wrapped in #ifndef __NVCOMPILER

* port remaining un/pack loops

* add multi gpu support (#2)

* add multi gpu support

* address review comments, add helpful comment for the acc/mp runbtime call

* sub __NVCOMPILER with __NVCOMPILER_OPENMP_GPU

* allow choice of gpu or cpu parallel

To enable this,  had to be removed -
otherwise segfaults happen on the GPU.

* fix omp set device call

* Revert "allow choice of gpu or cpu parallel"

This reverts commit 0cc2a77.
Having both the CPU and GPU OpenMP directives compiled caused
a significant slowdown in GPU packing/unpacking performance -
even if parallelism is controlled using OpenMP "if" clause.

* OMP MPI: Minor cleanups

Some very minor changes to the OpenMP target MPI PR:

* use_device_ptr -> use_device_addr

  This appears to be the updated form (or at least nvfortran says it is)

* Whitespace added to `!$ use omp_lib`

  Does not seem crucial but from our previous discussion it appears more
  correct.

* Removal of some trailing whitespace.

* OMP target MPI: line length compliance

This patch refactors several lines to keep within the 121-character line
length limit prescribed by the FMS style guidelines.

* OMP MPI: Update nocomm interface

The no-comm (no MPI) interface has been updated to support the new
omp_offload argument.

* use openmp cpu if ompoffload=.false.

This ensures that (un)packing steps in do_group_update is performed
with openmp cpu parallelism if ompoffload=.false.. Previously it
would only do serial. This is implemented by undefining the GPU
macro (currently __NVCOMPILER_OPENMP_GPU) and re-including the
(un)packing files.

To make this work, the default(shared) was used in all the relevant
OpenMP directives. If default(none) is used, the loops would hang
or segfault.

* Linting clean-up

Removed trailing whitespace, replaced tabs with spaces, and kept all lines at
121 lines or fewer

* One more lint clean-up

Missed one of the long lines in cesm_constants.fh

* One more linting commit

I thought I shortened all the long lines in cesm_constants.fh but my awk
command to find the line length was off-by-one

* Update github actions

Use a container with gcc 15.1.0

* Turn off autoconf CI testing

Also, don't define CESM_CONSTANTS macro (will let us drop CESM share code from
the turbo-stack submodules, and is bit-for-bit)

* Update containers for intel and coupler CI tests

* Drop FMS coupler CI test

I think our version of FMS is much older than what the FMS coupler is
expecting, so tests are failing... but we don't want to update our branch to
include newer FMS code so we should drop the test instead

* Drop YAML from configure.ac

* Drop SKIP_PARSER_TESTS from makefiles

I'm trying to make FMS configure look more like TIM configure, and some I
removed some stuff from configure.ac that is now breaking Makefiles

* Add mppnccombine.c to diag_manager makefile

This c file is included in the TIM version of Makefile.ac but not the FMS2
version, and may be responsible for failing CI build

* Remove bad test

test_gather2DV does not exist in TIM or in more recent FMS commits, and it has
some code that doesn't compile... I suspect that's related to why the test was
removed in subsequent PRs

* Increase filename size

test_simple_domain.nc does not fit into character(len=20); increased to len=25
just to have some breathing room

* Revert "Drop SKIP_PARSER_TESTS from makefiles"

This reverts commit b018921.

* Revert "Drop YAML from configure.ac"

This reverts commit f88a095.

---------

Co-authored-by: Edward Yang <edward_yang_125@hotmail.com>
Co-authored-by: Edward Yang <edward.yang@anu.edu.au>
Co-authored-by: Jorge Luis Gálvez Vallejo <jorgegalvez1694@gmail.com>
Co-authored-by: Marshall Ward <marshall.ward@noaa.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.