Cherrypick ompoffload devturbo#3
Closed
mnlevy1981 wants to merge 13 commits into
* add multi-GPU support
* address review comments, add helpful comment for the acc/mp runtime call
To enable this, had to be removed - otherwise segfaults happen on the GPU.
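The commit messages above do not say how multi-GPU support chooses a device for each process. A common pattern, assumed here and not confirmed by this PR, is to map each node-local MPI rank onto the visible GPUs in round-robin order and pass the result to the OpenMP runtime with `omp_set_default_device` (the OpenACC counterpart is `acc_set_device_num`). The sketch is in C rather than Fortran so it runs standalone. `pick_device` and `bind_rank_to_device` are hypothetical names, and `local_rank` is assumed to be the rank within a node (for example, obtained via `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED`).

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: round-robin mapping of a node-local MPI rank onto
 * the visible GPUs. Returns -1 when no device is available. */
int pick_device(int local_rank, int num_devices)
{
    assert(local_rank >= 0);
    if (num_devices <= 0)
        return -1;                      /* no GPUs: stay on the host */
    return local_rank % num_devices;
}

/* Hypothetical wrapper: select this process's default OpenMP device.
 * Returns the chosen device id, or -1 if nothing was selected. */
int bind_rank_to_device(int local_rank)
{
#ifdef _OPENMP
    int dev = pick_device(local_rank, omp_get_num_devices());
    if (dev >= 0)
        omp_set_default_device(dev);    /* later target regions default to dev */
    return dev;
#else
    (void)local_rank;                   /* built without OpenMP: host only */
    return -1;
#endif
}
```

With four GPUs per node, ranks 0 to 3 map to devices 0 to 3 and rank 5 wraps around to device 1. The call presumably belongs before the first target region in each process, so that every offloaded pack/unpack lands on the chosen device.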
This reverts commit 0cc2a77. Having both the CPU and GPU OpenMP directives compiled caused a significant slowdown in GPU packing/unpacking, even when parallelism was controlled with an OpenMP `if` clause.
Some very minor changes to the OpenMP target MPI PR:
* `use_device_ptr` -> `use_device_addr`: this appears to be the updated form (or at least nvfortran says it is).
* Whitespace added to `!$ use omp_lib`: does not seem crucial, but from our previous discussion it appears more correct.
* Removal of some trailing whitespace.
This patch refactors several lines to keep within the 121-character line length limit prescribed by the FMS style guidelines.
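A small length check can catch regressions against the 121-character limit before review. The sketch below is a hypothetical helper written in C, not part of FMS or its tooling; `count_long_lines` and `FMS_MAX_LINE_LEN` are illustrative names. It excludes the newline from each line's length and counts bytes, so multi-byte characters would be over-counted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FMS_MAX_LINE_LEN 121  /* limit quoted in the PR; the macro name is hypothetical */

/* Count lines in `text` longer than `limit` bytes (newline excluded) and
 * print the 1-based line number and length of each offender. */
int count_long_lines(const char *text, size_t limit)
{
    assert(text != NULL);
    int offenders = 0;
    int lineno = 1;
    const char *start = text;
    for (;;) {
        const char *end = strchr(start, '\n');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > limit) {
            printf("line %d: %zu chars (limit %zu)\n", lineno, len, limit);
            ++offenders;
        }
        if (end == NULL)
            break;                      /* last (possibly empty) line done */
        start = end + 1;
        ++lineno;
    }
    return offenders;
}
```

To check a real source file, read it into a NUL-terminated buffer and call `count_long_lines(buf, FMS_MAX_LINE_LEN)`; a nonzero return could fail a pre-commit hook.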
The no-comm (no MPI) interface has been updated to support the new omp_offload argument.
This ensures that the (un)packing steps in `do_group_update` are performed with OpenMP CPU parallelism when `ompoffload=.false.`; previously they ran serially. This is implemented by undefining the GPU macro (currently `__NVCOMPILER_OPENMP_GPU`) and re-including the (un)packing files. To make this work, `default(shared)` was used in all the relevant OpenMP directives; with `default(none)`, the loops would hang or segfault.
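The re-include trick compiles the same (un)packing loop once with GPU directives and once with CPU threading, then picks one at run time. A single standalone C file cannot re-include itself cleanly, so this sketch swaps in a different mechanism: one shared per-element routine wrapped by two functions that carry different OpenMP directives. `pack_one`, `pack_gpu`, `pack_cpu`, and `pack` are hypothetical names; only the `default(shared)` choice and the `ompoffload` switch come from the commit message. Built without OpenMP, the pragmas are ignored and both paths run serially.

```c
#include <assert.h>

/* Shared loop body: plays the role of the re-included (un)packing file.
 * "declare target" also builds a device version for the GPU path. */
#pragma omp declare target
static inline void pack_one(const double *field, double *buf, int i)
{
    buf[i] = field[i];
}
#pragma omp end declare target

/* GPU copy: offload the loop and map the arrays to and from the device. */
void pack_gpu(const double *field, double *buf, int n)
{
#pragma omp target teams distribute parallel for \
        map(to: field[0:n]) map(from: buf[0:n])
    for (int i = 0; i < n; ++i)
        pack_one(field, buf, i);
}

/* CPU copy: host threads only, using default(shared) as in the PR. */
void pack_cpu(const double *field, double *buf, int n)
{
#pragma omp parallel for default(shared)
    for (int i = 0; i < n; ++i)
        pack_one(field, buf, i);
}

/* Run-time switch analogous to the ompoffload flag. */
void pack(const double *field, double *buf, int n, int ompoffload)
{
    assert(n >= 0);
    if (ompoffload)
        pack_gpu(field, buf, n);
    else
        pack_cpu(field, buf, n);
}
```

Keeping the two variants in separate functions means each is compiled with only its own directives, which appears to be the property the revert of 0cc2a77 was after.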
Description
#2 was a full merge of @edoyango's ompoffload branch; this PR cherry-picks the ompoffload-specific commits without bringing in 15 months of FMS development.
How Has This Been Tested?
This branch hasn't been tested at all yet... tomorrow I'll run it through the turbo-stack unit tests and also run double-gyre with it, but I figured a draft PR would be nice so folks could start to look at the 13 ompoffload changes without being bogged down by the other 390 commits to main. A third potential path forward would be to open a PR that just merges the branch point for ompoffload from main (4fdd530), verify that it still works with turbo-stack, and then merge ompoffload on top of that... but that would get TIM and FMS2 out of sync, so it's probably not a great path forward.