Cherrypick ompoffload devturbo #3

Closed
mnlevy1981 wants to merge 13 commits into TURBO-ESM:dev/turbo from mnlevy1981:cherrypick_ompoffload_devturbo

@mnlevy1981

Description
#2 was a full merge of @edoyango's ompoffload branch; this PR cherry-picks the ompoffload-specific commits without bringing in 15 months of FMS development.

How Has This Been Tested?
This branch has not been tested yet. Tomorrow I'll run it through the turbo-stack unit tests and also run double-gyre with it, but I figured a draft PR would let folks start looking at the 13 ompoffload changes without being bogged down by the other 390 commits to main. A third potential path forward would be to open a PR that merges only the ompoffload branch point from main (4fdd530), verify that it still works with turbo-stack, and then merge ompoffload on top of that. However, that would get TIM and FMS2 out of sync, so it's probably not a great path forward.

edoyango and others added 13 commits May 5, 2026 09:41
* add multi gpu support

* address review comments, add helpful comment for the acc/mp runtime call
To enable this,  had to be removed -
otherwise segfaults happen on the GPU.
This reverts commit 0cc2a77.
Having both the CPU and GPU OpenMP directives compiled caused
a significant slowdown in GPU packing/unpacking performance -
even if parallelism is controlled using OpenMP "if" clause.
Some very minor changes to the OpenMP target MPI PR:

* use_device_ptr -> use_device_addr

  This appears to be the updated form (or at least nvfortran says it is)

* Whitespace added to `!$ use omp_lib`

  Does not seem crucial but from our previous discussion it appears more
  correct.

* Removal of some trailing whitespace.
This patch refactors several lines to keep within the 121-character line
length limit prescribed by the FMS style guidelines.
The no-comm (no MPI) interface has been updated to support the new
omp_offload argument.
This ensures that the (un)packing steps in do_group_update are performed
with OpenMP CPU parallelism if ompoffload=.false. Previously they
would only run serially. This is implemented by undefining the GPU
macro (currently __NVCOMPILER_OPENMP_GPU) and re-including the
(un)packing files.

To make this work, default(shared) was used in all the relevant
OpenMP directives. If default(none) is used, the loops hang
or segfault.
@mnlevy1981
Author

When you use `git cherry-pick` for multiple commits (`git cherry-pick SHA1..SHA2`), it brings in only the commits after SHA1 up to and including SHA2, so SHA1 itself is not cherry-picked. I'll open a new PR where I use `git cherry-pick SHA1^..SHA2`, which starts the range at SHA1's parent and therefore includes SHA1.
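
For anyone who wants to see the range behavior for themselves, here is a minimal sketch in a throwaway repository. The repository, branch names (`feature`, `exclusive`, `inclusive`), file names, and commit messages are all made up for illustration and have nothing to do with the actual FMS history. It assumes `git` is installed.

```shell
set -eu

# Hypothetical scratch repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Base commit shared by every branch below.
git commit -q --allow-empty -m "base"
base_branch=$(git symbolic-ref --short HEAD)

# A feature branch with three independent commits.
git checkout -q -b feature
for n in 1 2 3; do
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "feature $n"
done
first=$(git rev-parse HEAD~2)   # the "feature 1" commit
last=$(git rev-parse HEAD)      # the "feature 3" commit

# first..last excludes first, so only "feature 2" and "feature 3" are picked.
git checkout -q -b exclusive "$base_branch"
git cherry-pick "$first..$last" >/dev/null
exclusive_count=$(git rev-list --count "$base_branch..exclusive")

# first^..last starts at the parent of first, so "feature 1" is picked too.
git checkout -q -b inclusive "$base_branch"
git cherry-pick "$first^..$last" >/dev/null
inclusive_count=$(git rev-list --count "$base_branch..inclusive")

echo "first..last  picked $exclusive_count commits"
echo "first^..last picked $inclusive_count commits"
```

In this toy setup the first form picks 2 commits and the second picks 3, which matches the off-by-one seen in this PR.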

mnlevy1981 closed this May 6, 2026
mnlevy1981 mentioned this pull request May 6, 2026 (4 tasks)
4 participants