Skip to content

MOM_vert_friction: Rewrite vertvisc_coef as kji#935

Merged
Hallberg-NOAA merged 1 commit into
NOAA-GFDL:dev/gfdlfrom
marshallward:vertvisc_coef
Jul 16, 2025
Merged

MOM_vert_friction: Rewrite vertvisc_coef as kji#935
Hallberg-NOAA merged 1 commit into
NOAA-GFDL:dev/gfdlfrom
marshallward:vertvisc_coef

Conversation

@marshallward
Copy link
Copy Markdown
Member

This patch rewrites the vertvisc_coef, find_coupling_coef, and find_coupling_coef_gl90 functions as loops in k-j-i form. Many operations previously applied over i-k slices are now applied concurrently over i-j slices.

This changes (and in some cases reduces) the cache usage, but it greatly increases the concurrency of the solvers, which should be more favorable to GPU migration.

In the "profile benchmark" p0 test, the runtime reduced by 8%, so this new method should be suitable for CPU and GPU.

Several arrays have been promoted to 3D in order to facilitate the 3d structure:

  • hvel, dz_vel, z_i, z_i_gl90
  • hvel_shelf, dz_vel_shelf
  • a_cpl, a_cpl_gl90, h_ml
  • dz_harm

There is a solution in here which reduces find_coupling_coef to an i-j operation, which would further reduce the number of 3d arrays and perhaps improve performance, but at the moment I can't quite get there.

There is also a possibility that arrays like a_cpl, h_ml, could be replaced with their equivalent arrays in CS such as CS%a_[uv], but this would require some method to handle the different shapes of a_u and a_v.

Because of the extensive penetration of the loop inversion, there isn't a practical means of breaking this into multiple commits, and the simplest approach is to accept it as a single commit.

Additional changes:

  • CS is no longer defined as a pointer, and is passed on stack. Association checks are also removed

    This enables vectorization of loops which contain CS

  • A new function, touch_ij, which makes a trivial modification to loop indices i and j, is called before the u and v stages of vertvisc_coef.

    The presence of this function seems to force the compiler to inspect the dependency of i and j during interprocedural optimization (IPO) and find additional optimizations.

    It appears to be responsible for a 4% speedup.

This is not the final word on vertvisc_coef optimization, but I think it's good enough to move forward.

@marshallward marshallward force-pushed the vertvisc_coef branch 2 times, most recently from 60673ca to a719f48 Compare July 12, 2025 15:30
@marshallward
Copy link
Copy Markdown
Member Author

It looks like an error in the OBC indexing. I will have a look.

@marshallward
Copy link
Copy Markdown
Member Author

The OBC errors appear to be fixed now in the TCs.

This patch rewrites the vertvisc_coef, find_coupling_coef, and
find_coupling_coef_gl90 functions as loops in k-j-i form.  Many
operations previously applied over i-k slices are now applied
concurrently over i-j slices.

This changes (and in some cases reduces) the cache usage, but it greatly
increases the concurrency of the solvers, which should be more favorable
to GPU migration.

In the "profile benchmark" p0 test, the runtime reduced by 8%, so this
new method should be suitable for CPU and GPU.

Several arrays have been promoted to 3D in order to facilitate the 3d
structure:

* hvel, dz_vel, z_i, z_i_gl90
* hvel_shelf, dz_vel_shelf
* a_cpl, a_cpl_gl90
* dz_harm

There is a solution in here which reduces find_coupling_coef to an i-j
operation, which would further reduce the number of 3d arrays and
perhaps improve performance, but at the moment I can't quite get there.

There is also a possibility that arrays like a_cpl, h_ml, could be
replaced with their equivalent arrays in CS such as CS%a_[uv], but this
would require some method to handle the different shapes of a_u and a_v.

Because of the extensive penetration of the loop inversion, there isn't
a practical means of breaking this into multiple commits, and the
simplest approach is to accept it as a single commit.

Additional changes:

* CS is no longer defined as a pointer, and is passed on stack.
  Association checks are also removed

  This enables vectorization of loops which contain CS

* A new function, `touch_ij`, which makes a trivial modification to loop
  indices i and j, is called before the u and v stages of vertvisc_coef.

  The presence of this function seems to force the compiler to inspect
  the dependency of i and j during interprocedural optimization (IPO)
  and find additional optimizations.

  It appears to be responsible for a 4% speedup.

* Most operations on a_cpl_gl90 are now conditional, rather than
  assuming that the array is zero when GL90 is disabled.

This is not the final word on vertvisc_coef optimization, but I think
it's good enough to move forward.
Copy link
Copy Markdown
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have visually reviewed the extensive changes in this commit, and I believe them to be correct and to only be a refactoring of the code. There are a few places where these changes are perhaps somewhat more aggressive in adopting an alternative code style than is strictly necessary, but not to a degree that is overly problematic. I am approving this commit conditional upon the pipeline testing demonstrating that does not change any answers.

@Hallberg-NOAA
Copy link
Copy Markdown
Member

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/28149.

@Hallberg-NOAA Hallberg-NOAA merged commit 8b63869 into NOAA-GFDL:dev/gfdl Jul 16, 2025
52 checks passed
@Hallberg-NOAA Hallberg-NOAA added the refactor Code cleanup with no changes in functionality or results label Jul 16, 2025
@marshallward marshallward deleted the vertvisc_coef branch July 28, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor Code cleanup with no changes in functionality or results

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants