Merge ompoffload devturbo by mnlevy1981 · Pull Request #2 · TURBO-ESM/FMS

mnlevy1981 · 2026-05-01T22:59:26Z

Description
Merged the ompoffload branch from @edoyango into dev/turbo. I have verified that it builds (my initial conflict resolution was bad and it didn't build at first :), and I can run the code in TURBO-ESM/MOM6#20 with this version of FMS.

How Has This Been Tested?
I built an updated version of MOM6 with this FMS branch; the double_gyre example runs and is bit-for-bit with what is currently available on turbo-stack.

johnmauff · 2026-05-03T17:50:46Z

@mnlevy1981 Wow, that is a lot of changes! Would it be possible to identify changes made by Ed to support the GPU work versus other changes? I started to look at the changed files but quickly became overwhelmed by changes that had nothing to do with GPU enablement.

johnmauff · 2026-05-04T13:10:30Z

@mnlevy1981 I just looked at all of the commit messages in this PR. Only the last 13 or 14 appear to have been performed by Ed and Jorge. Can we cherry-pick those changes to the FMS2?

mnlevy1981 · 2026-05-04T14:12:25Z

> @mnlevy1981 Wow, that is a lot of changes! Would it be possible to identify changes made by Ed to support the GPU work versus other changes? I started to look at the changed files but quickly became overwhelmed by changes that had nothing to do with GPU enablement.

> @mnlevy1981 I just looked at all of the commit messages in this PR. Only the last 13 or 14 appear to have been performed by Ed and Jorge. Can we cherry-pick those changes to the FMS2?

We could, or we could merge mom-ocean:main onto dev/turbo (which should be a big set of changes, but might not need in-depth review since the code has already been approved by the entire MOM6 consortium). The PR to merge dev/gpu should then be much more manageable.

mnlevy1981 · 2026-05-04T14:13:54Z

ugh, I just noticed what repo I'm in... I guess we'd need to bring FMS up to date with main, or whatever the FMS team calls the primary branch, and then merge Ed's ompoffload branch separately. (I thought this was TURBO-ESM/MOM6#20)
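The cherry-pick option raised in this thread can be tried out on a throwaway repository. The sketch below is only an illustration: the branch names (`target`, `feature`), file names, and commit messages are hypothetical and do not come from FMS. It builds a small history in which `feature` carries two unrelated commits followed by two GPU commits, then replays only the last two onto `target`.

```shell
#!/bin/sh
# Throwaway repository; every branch and file name here is hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/target   # name the initial branch 'target'
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {   # commit FILE MESSAGE: create one file and commit it
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt "common base"
git checkout -q -b feature
commit upstream1.txt "unrelated upstream change 1"
commit upstream2.txt "unrelated upstream change 2"
commit gpu1.txt "gpu change 1"
commit gpu2.txt "gpu change 2"

# Replay only the last two commits of 'feature' onto 'target'.
# A..B selects commits reachable from B but not from A, applied oldest first.
git checkout -q target
git cherry-pick feature~2..feature >/dev/null
git log --format=%s
```

On a real branch the range would need care: `git cherry-pick` refuses a merge commit unless `-m` is given, so a history that ends in a merge (as this PR's does, with 4a3b0b5) would need that commit left out of the range, and conflicts may still have to be resolved by hand.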

* add gpu2gpu mpi transer with flag for do_group_update
* add missing collapse(3) clauses
* Use __NVCOMPILER macro for target regions
* add back old omp directive wrapped in #ifndef __NVCOMPILER
* port remaining un/pack loops
* add multi gpu support (#2)
    * add multi gpu support
    * address review comments, add helpful comment for the acc/mp runtime call
* sub __NVCOMPILER with __NVCOMPILER_OPENMP_GPU
* allow choice of gpu or cpu parallel
  To enable this, had to be removed - otherwise segfaults happen on the GPU.
* fix omp set device call
* Revert "allow choice of gpu or cpu parallel"
  This reverts commit 0cc2a77. Having both the CPU and GPU OpenMP directives compiled caused a significant slowdown in GPU packing/unpacking performance - even if parallelism is controlled using OpenMP "if" clause.
* OMP MPI: Minor cleanups
  Some very minor changes to the OpenMP target MPI PR:
    * use_device_ptr -> use_device_addr
      This appears to be the updated form (or at least nvfortran says it is)
    * Whitespace added to `!$ use omp_lib`
      Does not seem crucial but from our previous discussion it appears more correct.
    * Removal of some trailing whitespace.
* OMP target MPI: line length compliance
  This patch refactors several lines to keep within the 121-character line length limit prescribed by the FMS style guidelines.
* OMP MPI: Update nocomm interface
  The no-comm (no MPI) interface has been updated to support the new omp_offload argument.
* use openmp cpu if ompoffload=.false.
  This ensures that (un)packing steps in do_group_update are performed with openmp cpu parallelism if ompoffload=.false.. Previously it would only do serial. This is implemented by undefining the GPU macro (currently __NVCOMPILER_OPENMP_GPU) and re-including the (un)packing files. To make this work, the default(shared) was used in all the relevant OpenMP directives. If default(none) is used, the loops would hang or segfault.
* Linting clean-up
  Removed trailing whitespace, replaced tabs with spaces, and kept all lines at 121 characters or fewer
* One more lint clean-up
  Missed one of the long lines in cesm_constants.fh
* One more linting commit
  I thought I shortened all the long lines in cesm_constants.fh but my awk command to find the line length was off-by-one
* Update github actions
  Use a container with gcc 15.1.0
* Turn off autoconf CI testing
  Also, don't define CESM_CONSTANTS macro (will let us drop CESM share code from the turbo-stack submodules, and is bit-for-bit)
* Update containers for intel and coupler CI tests
* Drop FMS coupler CI test
  I think our version of FMS is much older than what the FMS coupler is expecting, so tests are failing... but we don't want to update our branch to include newer FMS code so we should drop the test instead
* Drop YAML from configure.ac
* Drop SKIP_PARSER_TESTS from makefiles
  I'm trying to make FMS configure look more like TIM configure, and I removed some stuff from configure.ac that is now breaking Makefiles
* Add mppnccombine.c to diag_manager makefile
  This C file is included in the TIM version of Makefile.ac but not the FMS2 version, and may be responsible for failing CI build
* Remove bad test
  test_gather2DV does not exist in TIM or in more recent FMS commits, and it has some code that doesn't compile... I suspect that's related to why the test was removed in subsequent PRs
* Increase filename size
  test_simple_domain.nc does not fit into character(len=20); increased to len=25 just to have some breathing room
* Revert "Drop SKIP_PARSER_TESTS from makefiles"
  This reverts commit b018921.
* Revert "Drop YAML from configure.ac"
  This reverts commit f88a095.

---------

Co-authored-by: Edward Yang <edward_yang_125@hotmail.com>
Co-authored-by: Edward Yang <edward.yang@anu.edu.au>
Co-authored-by: Jorge Luis Gálvez Vallejo <jorgegalvez1694@gmail.com>
Co-authored-by: Marshall Ward <marshall.ward@noaa.gov>

uramirez8707 and others added 30 commits May 1, 2024 17:46

feat: modern diag add support for unstructured grid files axis (NOAA-GFDL#1114)

552ae01

docs: modern diag table documentation (NOAA-GFDL#1122)

59ebadf

fix: modern diag race conditions and add send_data tests (NOAA-GFDL#1130)

050a32c

fix: modern diag io updates for pack size and setting time to use in filename (NOAA-GFDL#1129)

d559557

docs: modern diag add documentation explaining the is_ocean key (NOAA-GFDL#1133)

c85d9cb

feat: Modern diag_manager add diurnal axis (NOAA-GFDL#1138)

0e4a2a8

feat: Modern diag manager add subzaxis (NOAA-GFDL#1148)

0c0967a

fix: Adds a variable to the diag_object to store the current model time + updates test (NOAA-GFDL#1150)

54f28a7

style: modern diag follow the this convention for object procedures (NOAA-GFDL#1159)

51bb47f

fix: Modern diag manager check if field is registered (NOAA-GFDL#1151)

9e255a6

fix: modern diag add has routine and fixes some type bound procedure formatting (NOAA-GFDL#1182)

d860531

fix: bounds typo and change get_buffer to reference its argument instead of the module variable (NOAA-GFDL#1160)

eb51414

feat: modern diag store a variable that defines if a variable is a scalar (NOAA-GFDL#1175)

20b9efe

fix: modern diag rename to add output to buffer names (NOAA-GFDL#1184)

707ddab

feat: update log_diag_field_info for modern diag (NOAA-GFDL#1090)

3065a3e

feat: Modern diag_manager write out the cell_measures and cell_methods attribute (NOAA-GFDL#1177)

1ea7d60

feat: Modern diag_manager Add a function that determine the type of the variable (NOAA-GFDL#1197)

3dc191b

feat: Modern diag_manager add standard_name and coordinates attributes + minor bug fixes (NOAA-GFDL#1176)

a935cc1

feat: add get_diag_field_ids to fms_diag_yaml_mod (NOAA-GFDL#1186)

233099a

docs: update modern diag uml diagrams (NOAA-GFDL#1161)

719e0df

fix: Modern diag_manager fixes related to static and scalar fields (NOAA-GFDL#1188)

7e87981

feat: Adds allocate_diag_field_output_buffers() to fms_diag_object_mod (NOAA-GFDL#1198)

9a59b1f

feat: Adds a function, check_indices_order(), to diag_util_mod (NOAA-GFDL#1203)

454538e

feat: modern diag add real_copy_set to diag_util_mod (NOAA-GFDL#1204)

dde7138

feat: modern diag add routine init_mask_3d() to diag_util_mod (NOAA-GFDL#1201)

6a07b27

feat: modern diag initializes buffer_ids and buffer_allocated (NOAA-GFDL#1210)

abeb277

feat: modern diag add fms_diag_compare_window() to fmsDiagObject_type (NOAA-GFDL#1230)

78aa657

feat: modern diag write global/variable attributes defined in the yaml + subregional files update (NOAA-GFDL#1261)

0c4f54b

feat: updates function fms_diag_output_buffer_mod::remap_buffer (NOAA-GFDL#1260)

84558cd

chore: removed mpp_io_init from test_diag_dlinked_list

4a7166b

J-Lentz and others added 22 commits August 21, 2025 13:00

Fix array bounds error in the old diag manager (NOAA-GFDL#1751)

9328538

move the parse.inc in field_manager to include directory (NOAA-GFDL#1740)

74e9d38

Grid2 mod Initialization Fix (NOAA-GFDL#1742)

51b3199

add a test to test the multi-file capability in data_override (NOAA-GFDL#1745)

ea8c29c

MODERN_DIAG_MANAGER: Improvements to the diag_table.yaml format (NOAA-GFDL#1731)

c39f610

update cmake build for testing, build types and mixed precision (NOAA-GFDL#1694)

4e18fb9

build fixes for added tests and missed kind macro (NOAA-GFDL#1759)

4fdd530

add gpu2gpu mpi transer with flag for do_group_update

19fedef

add missing collapse(3) clauses

f81247b

Use __NVCOMPILER macro for target regions

9f068e3

add back old omp directive wrapped in #ifndef __NVCOMPILER

93d148e

port remaining un/pack loops

3e3da6e

add multi gpu support (ESCOMP#2)

d5739ef

* add multi gpu support
* address review comments, add helpful comment for the acc/mp runtime call

sub __NVCOMPILER with __NVCOMPILER_OPENMP_GPU

b287471

allow choice of gpu or cpu parallel

0cc2a77

To enable this, had to be removed - otherwise segfaults happen on the GPU.

fix omp set device call

d26eb4d

Revert "allow choice of gpu or cpu parallel"

ca753a4

This reverts commit 0cc2a77. Having both the CPU and GPU OpenMP directives compiled caused a significant slowdown in GPU packing/unpacking performance - even if parallelism is controlled using OpenMP "if" clause.

OMP target MPI: line length compliance

fe5ddda

This patch refactors several lines to keep within the 121-character line length limit prescribed by the FMS style guidelines.
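A line-length check of this kind is easy to get wrong by one, as a later lint commit in this PR notes about its own awk command. The following is a minimal sketch of such a check, not the command actually used in the PR; the function name `long_lines` and the demo file are hypothetical. It reports only lines strictly longer than the limit, so a line of exactly 121 characters passes.

```shell
#!/bin/sh
# Report every line longer than LIMIT characters.
# Usage: long_lines LIMIT FILE...
long_lines() {
    limit=$1
    shift
    # Strictly greater than: a line of exactly LIMIT characters is allowed.
    # Writing >= here would be an off-by-one that flags compliant lines.
    awk -v limit="$limit" \
        'length($0) > limit { printf "%s:%d: %d\n", FILENAME, FNR, length($0) }' "$@"
}

# Demo on a throwaway file (hypothetical content, not taken from FMS).
tmp=$(mktemp)
printf '%121s\n' '' | tr ' ' 'x' > "$tmp"    # exactly 121 characters: allowed
printf '%122s\n' '' | tr ' ' 'x' >> "$tmp"   # 122 characters: reported
long_lines 121 "$tmp"
```

The output has the form `file:line: length`. Note that some awk implementations count bytes rather than characters, so lines containing non-ASCII text may be measured differently from one system to another.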

OMP MPI: Update nocomm interface

c91024a

The no-comm (no MPI) interface has been updated to support the new omp_offload argument.

Merge branch 'ompoffload' into merge_ompoffload_devturbo

4a3b0b5

This was referenced May 5, 2026

Cherrypick ompoffload devturbo #3

Closed

Cherrypick ompoffload devturbo take2 #4

Closed

mnlevy1981 closed this May 6, 2026

mnlevy1981 wants to merge 402 commits into TURBO-ESM:dev/turbo from mnlevy1981:merge_ompoffload_devturbo

20 participants