MOM_vert_friction: Rewrite vertvisc_coef as kji by marshallward · Pull Request #935 · NOAA-GFDL/MOM6

marshallward · 2025-07-12T15:27:54Z

This patch rewrites the vertvisc_coef, find_coupling_coef, and find_coupling_coef_gl90 functions as loops in k-j-i form. Many operations previously applied over i-k slices are now applied concurrently over i-j slices.

This changes (and in some cases reduces) the cache usage, but it greatly increases the concurrency of the solvers, which should be more favorable to GPU migration.
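As a rough illustration of the loop inversion described above, here is a minimal sketch in Python (the model itself is Fortran). The function names, the test field, and the pairwise formula are hypothetical stand-ins and are not taken from MOM_vert_friction. Arrays are indexed `[k][j][i]` so that `i` is the innermost index, which mirrors a Fortran `(i,j,k)` array where `i` varies fastest.

```python
def make_field(nk, nj, ni):
    """Build a strictly positive test field indexed [k][j][i] (hypothetical data)."""
    return [[[1.0 + 0.1 * k + 0.01 * j + 0.001 * i for i in range(ni)]
             for j in range(nj)] for k in range(nk)]


def pair_mean_j_outer(h):
    """j outermost: each j iteration fills an i-k slice held in 2D scratch."""
    nk, nj, ni = len(h), len(h[0]), len(h[0][0])
    out = [[[0.0] * ni for _ in range(nj)] for _ in range(nk)]
    for j in range(nj):
        work = [[0.0] * ni for _ in range(nk)]  # i-k scratch, rebuilt per j
        for k in range(1, nk):
            for i in range(ni):
                a, b = h[k - 1][j][i], h[k][j][i]
                work[k][i] = 2.0 * a * b / (a + b)
        for k in range(nk):
            for i in range(ni):
                out[k][j][i] = work[k][i]
    return out


def pair_mean_kji(h):
    """k-j-i order: for each k, the whole i-j plane is independent work."""
    nk, nj, ni = len(h), len(h[0]), len(h[0][0])
    out = [[[0.0] * ni for _ in range(nj)] for _ in range(nk)]
    for k in range(1, nk):
        for j in range(nj):
            for i in range(ni):
                a, b = h[k - 1][j][i], h[k][j][i]
                out[k][j][i] = 2.0 * a * b / (a + b)
    return out
```

Both functions apply the same arithmetic to each point, so they return identical values. In the k-j-i form, every `(i, j)` point at a given `k` can in principle be processed concurrently, which is the property that matters for a GPU.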

In the "profile benchmark" p0 test, the runtime was reduced by 8%, so this new method should be suitable for both CPU and GPU.

Several arrays have been promoted to 3D in order to support the k-j-i loop structure:

* hvel, dz_vel, z_i, z_i_gl90
* hvel_shelf, dz_vel_shelf
* a_cpl, a_cpl_gl90, h_ml
* dz_harm
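To show why moving `k` to the outermost loop tends to require 3D storage, here is a small Python sketch of a downward accumulation. It is an assumption-laden illustration, not the MOM6 code: the function names and the cumulative-sum recurrence are hypothetical, and arrays are indexed `[k][j][i]`.

```python
def column_depth_j_outer(dz):
    """j outermost: the running interface depths fit in an (nk+1, ni) slice."""
    nk, nj, ni = len(dz), len(dz[0]), len(dz[0][0])
    depth = [[0.0] * ni for _ in range(nj)]
    scratch_size = (nk + 1) * ni  # elements in the reusable i-k slice
    for j in range(nj):
        z_ik = [[0.0] * ni for _ in range(nk + 1)]
        for k in range(nk):
            for i in range(ni):
                z_ik[k + 1][i] = z_ik[k][i] + dz[k][j][i]
        depth[j] = list(z_ik[nk])
    return depth, scratch_size


def column_depth_kji(dz):
    """k outermost: values from level k must persist for every (i, j),
    so the interface array becomes 3D."""
    nk, nj, ni = len(dz), len(dz[0]), len(dz[0][0])
    z = [[[0.0] * ni for _ in range(nj)] for _ in range(nk + 1)]
    scratch_size = (nk + 1) * nj * ni  # elements in the promoted 3D array
    for k in range(nk):
        for j in range(nj):
            for i in range(ni):
                z[k + 1][j][i] = z[k][j][i] + dz[k][j][i]
    return z[nk], scratch_size
```

The results match, but the scratch storage grows by a factor of `nj`. That is the trade-off this change accepts in exchange for concurrency over i-j planes.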

There is a solution in here which reduces find_coupling_coef to an i-j operation, which would further reduce the number of 3d arrays and perhaps improve performance, but at the moment I can't quite get there.

There is also a possibility that arrays like a_cpl and h_ml could be replaced with their equivalent arrays in CS, such as CS%a_[uv], but this would require some method to handle the different shapes of a_u and a_v.

Because of the extensive penetration of the loop inversion, there isn't a practical means of breaking this into multiple commits, and the simplest approach is to accept it as a single commit.

Additional changes:

* CS is no longer defined as a pointer, and is passed on the stack. Association checks are also removed. This enables vectorization of loops which contain CS.

* A new function, touch_ij, which makes a trivial modification to loop indices i and j, is called before the u and v stages of vertvisc_coef. The presence of this function seems to force the compiler to inspect the dependency of i and j during interprocedural optimization (IPO) and find additional optimizations. It appears to be responsible for a 4% speedup.

This is not the final word on vertvisc_coef optimization, but I think it's good enough to move forward.

marshallward · 2025-07-12T16:44:33Z

It looks like an error in the OBC indexing. I will have a look.

marshallward · 2025-07-14T14:17:51Z

The OBC errors appear to be fixed now in the TCs.

This patch rewrites the vertvisc_coef, find_coupling_coef, and find_coupling_coef_gl90 functions as loops in k-j-i form. Many operations previously applied over i-k slices are now applied concurrently over i-j slices.

This changes (and in some cases reduces) the cache usage, but it greatly increases the concurrency of the solvers, which should be more favorable to GPU migration.

In the "profile benchmark" p0 test, the runtime was reduced by 8%, so this new method should be suitable for both CPU and GPU.

Several arrays have been promoted to 3D in order to support the k-j-i loop structure:

* hvel, dz_vel, z_i, z_i_gl90
* hvel_shelf, dz_vel_shelf
* a_cpl, a_cpl_gl90
* dz_harm

There is a solution in here which reduces find_coupling_coef to an i-j operation, which would further reduce the number of 3D arrays and perhaps improve performance, but at the moment I can't quite get there.

There is also a possibility that arrays like a_cpl and h_ml could be replaced with their equivalent arrays in CS, such as CS%a_[uv], but this would require some method to handle the different shapes of a_u and a_v.

Because of the extensive penetration of the loop inversion, there isn't a practical means of breaking this into multiple commits, and the simplest approach is to accept it as a single commit.

Additional changes:

* CS is no longer defined as a pointer, and is passed on the stack. Association checks are also removed. This enables vectorization of loops which contain CS.

* A new function, `touch_ij`, which makes a trivial modification to loop indices i and j, is called before the u and v stages of vertvisc_coef. The presence of this function seems to force the compiler to inspect the dependency of i and j during interprocedural optimization (IPO) and find additional optimizations. It appears to be responsible for a 4% speedup.

* Most operations on a_cpl_gl90 are now conditional, rather than assuming that the array is zero when GL90 is disabled.

This is not the final word on vertvisc_coef optimization, but I think it's good enough to move forward.
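The last bullet above, about making a_cpl_gl90 operations conditional, can be sketched roughly as follows. This is a Python illustration only (the model is Fortran). The flag name `use_gl90` and both function names are hypothetical, and the arrays are flattened to 1D lists for brevity.

```python
def total_coupling_unconditional(a_cpl, a_cpl_gl90):
    """Earlier pattern: always add, relying on a_cpl_gl90 being zero-filled
    whenever GL90 is disabled."""
    return [a + g for a, g in zip(a_cpl, a_cpl_gl90)]


def total_coupling_conditional(a_cpl, a_cpl_gl90, use_gl90):
    """Conditional pattern: only touch a_cpl_gl90 when GL90 is enabled, so the
    array need not be zeroed (or even set) when it is disabled."""
    if use_gl90:
        return [a + g for a, g in zip(a_cpl, a_cpl_gl90)]
    return list(a_cpl)
```

With GL90 disabled, the conditional form gives the same result as adding a zero-filled array, but it skips the extra reads and additions and does not depend on the array's contents.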

Hallberg-NOAA

I have visually reviewed the extensive changes in this commit, and I believe them to be correct and to only be a refactoring of the code. There are a few places where these changes are perhaps somewhat more aggressive in adopting an alternative code style than is strictly necessary, but not to a degree that is overly problematic. I am approving this commit conditional upon the pipeline testing demonstrating that it does not change any answers.

Hallberg-NOAA · 2025-07-16T07:47:09Z

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/28149.

marshallward force-pushed the vertvisc_coef branch 2 times, most recently from 60673ca to a719f48 on July 12, 2025 15:30

marshallward force-pushed the vertvisc_coef branch from a719f48 to c2e10ea on July 12, 2025 19:39

marshallward force-pushed the vertvisc_coef branch from c2e10ea to 3f0ea46 on July 14, 2025 21:07

Hallberg-NOAA force-pushed the vertvisc_coef branch from 3f0ea46 to d65447f on July 15, 2025 22:28

Hallberg-NOAA approved these changes Jul 15, 2025

Hallberg-NOAA merged commit 8b63869 into NOAA-GFDL:dev/gfdl Jul 16, 2025
52 checks passed

Hallberg-NOAA added the refactor label (code cleanup with no changes in functionality or results) on Jul 16, 2025

marshallward mentioned this pull request Jul 28, 2025

GFDL to main (2025-07-21) mom-ocean/MOM6#1668

Merged

marshallward deleted the vertvisc_coef branch July 28, 2025 14:17
