MOM_vert_friction: Rewrite vertvisc_coef as kji#935
Merged
Conversation
60673ca to
a719f48
Compare
Member
Author
|
It looks like an error in the OBC indexing. I will have a look. |
a719f48 to
c2e10ea
Compare
Member
Author
|
The OBC errors appear to be fixed now in the TCs. |
c2e10ea to
3f0ea46
Compare
This patch rewrites the vertvisc_coef, find_coupling_coef, and find_coupling_coef_gl90 functions as loops in k-j-i form. Many operations previously applied over i-k slices are now applied concurrently over i-j slices. This changes (and in some cases reduces) the cache usage, but it greatly increases the concurrency of the solvers, which should be more favorable to GPU migration. In the "profile benchmark" p0 test, the runtime reduced by 8%, so this new method should be suitable for CPU and GPU. Several arrays have been promoted to 3D in order to facilitate the 3d structure: * hvel, dz_vel, z_i, z_i_gl90 * hvel_shelf, dz_vel_shelf * a_cpl, a_cpl_gl90 * dz_harm There is a solution in here which reduces find_coupling_coef to an i-j operation, which would further reduce the number of 3d arrays and perhaps improve performance, but at the moment I can't quite get there. There is also a possibility that arrays like a_cpl, h_ml, could be replaced with their equivalent arrays in CS such as CS%a_[uv], but this would require some method to handle the different shapes of a_u and a_v. Because of the extensive penetration of the loop inversion, there isn't a practical means of breaking this into multiple commits, and the simplest approach is to accept it as a single commit. Additional changes: * CS is no longer defined as a pointer, and is passed on stack. Association checks are also removed This enables vectorization of loops which contain CS * A new function, `touch_ij`, which makes a trivial modification to loop indices i and j, is called before the u and v stages of vertvisc_coef. The presence of this function seems to force the compiler to inspect the dependency of i and j during interprocedural optimization (IPO) and find additional optimizations. It appears to be responsible for a 4% speedup. * Most operations on a_cpl_gl90 are now conditional, rather than assuming that the array is zero when GL90 is disabled. This is not the final word on vertvisc_coef optimization, but I think it's good enough to move forward.
3f0ea46 to
d65447f
Compare
Hallberg-NOAA
approved these changes
Jul 15, 2025
Member
Hallberg-NOAA
left a comment
There was a problem hiding this comment.
I have visually reviewed the extensive changes in this commit, and I believe them to be correct and to only be a refactoring of the code. There are a few places where these changes are perhaps somewhat more aggressive in adopting an alternative code style than is strictly necessary, but not to a degree that is overly problematic. I am approving this commit conditional upon the pipeline testing demonstrating that does not change any answers.
Member
|
This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/28149. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch rewrites the vertvisc_coef, find_coupling_coef, and find_coupling_coef_gl90 functions as loops in k-j-i form. Many operations previously applied over i-k slices are now applied concurrently over i-j slices.
This changes (and in some cases reduces) the cache usage, but it greatly increases the concurrency of the solvers, which should be more favorable to GPU migration.
In the "profile benchmark" p0 test, the runtime reduced by 8%, so this new method should be suitable for CPU and GPU.
Several arrays have been promoted to 3D in order to facilitate the 3d structure:
There is a solution in here which reduces find_coupling_coef to an i-j operation, which would further reduce the number of 3d arrays and perhaps improve performance, but at the moment I can't quite get there.
There is also a possibility that arrays like a_cpl, h_ml, could be replaced with their equivalent arrays in CS such as CS%a_[uv], but this would require some method to handle the different shapes of a_u and a_v.
Because of the extensive penetration of the loop inversion, there isn't a practical means of breaking this into multiple commits, and the simplest approach is to accept it as a single commit.
Additional changes:
CS is no longer defined as a pointer, and is passed on stack. Association checks are also removed
This enables vectorization of loops which contain CS
A new function,
touch_ij, which makes a trivial modification to loop indices i and j, is called before the u and v stages of vertvisc_coef.The presence of this function seems to force the compiler to inspect the dependency of i and j during interprocedural optimization (IPO) and find additional optimizations.
It appears to be responsible for a 4% speedup.
This is not the final word on vertvisc_coef optimization, but I think it's good enough to move forward.