Rewrite vertvisc() and vertvisc_remnant() loops to kji form by marshallward · Pull Request #912 · NOAA-GFDL/MOM6

marshallward · 2025-06-02T17:40:17Z

This patch rewrites the tridiagonal solvers of vertvisc() and vertvisc_remnant() to kji-form, increasing the concurrency over j-points.

Overall runtime of vertical friction is reduced by about 5-6%.

    (Ocean vertical viscosity):   5.319s,   5.652s (-5.9%)
    (Ocean vertical viscosity):   5.416s,   5.713s (-5.2%)
    (Ocean vertical viscosity):   5.371s,   5.689s (-5.6%)

The vertvisc() runtime is reduced by about 8%.

    mom_vert_friction_mp_vertvisc_:   0.583s,   0.629s (-7.3%)
    mom_vert_friction_mp_vertvisc_:   0.576s,   0.634s (-9.2%)
    mom_vert_friction_mp_vertvisc_:   0.583s,   0.636s (-8.3%)

The vertvisc_remnant() runtime is reduced by about 24-28%.

    mom_vert_friction_mp_vertvisc_remnant_:   0.939s,   1.241s (-24.3%)
    mom_vert_friction_mp_vertvisc_remnant_:   0.935s,   1.265s (-26.0%)
    mom_vert_friction_mp_vertvisc_remnant_:   0.910s,   1.258s (-27.7%)
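
The percentages in parentheses are consistent with a relative change of (new - old) / old, taking the first time in each pair as the patched build and the second as the baseline. That column order is an assumption inferred from the word "reduced"; it is not stated explicitly. A minimal Python sketch of the arithmetic, using the first row of each block above (relative_change and speedup are illustrative helper names):

```python
def relative_change(new_s, old_s):
    """Percent change in runtime relative to the baseline (negative = faster)."""
    return 100.0 * (new_s - old_s) / old_s


def speedup(new_s, old_s):
    """Baseline time divided by patched time (greater than 1 = faster)."""
    return old_s / new_s


# (label, assumed patched time, assumed baseline time), copied from the rows above
timings = [
    ("Ocean vertical viscosity", 5.319, 5.652),
    ("vertvisc", 0.583, 0.629),
    ("vertvisc_remnant", 0.939, 1.241),
]

for label, new_s, old_s in timings:
    print(f"{label}: {relative_change(new_s, old_s):+.1f}%, "
          f"{speedup(new_s, old_s):.2f}x")
```

Because the printed times are rounded to milliseconds, a recomputed percentage can differ from the quoted one in the last digit.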

Only one new 3d array was required. Several 1d arrays were promoted to 2d, or were reshaped to ij.

Some speedups were due to movement of diagnostics outside of the main tridiagonal loops, which enabled vectorization. Another speedup was due to conditionally populating the Rayleigh drag Ray.
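
The diagnostic change is a form of loop fission: an optional calculation is taken out of the hot loop and given its own loop, so the main loop has no branch. The following is a generic Python sketch of that restructuring, not the MOM6 code. The function names, the explicit update u += dt * accel, and the du_dt diagnostic array are hypothetical stand-ins, and Python only shows the shape of the change, not the vectorization benefit.

```python
def step_inline_diag(u, accel, dt, du_dt=None):
    """Update u in place; the optional diagnostic is filled inside the main loop."""
    for n in range(len(u)):
        u_old = u[n]
        u[n] += dt * accel[n]
        if du_dt is not None:            # branch evaluated on every iteration
            du_dt[n] = (u[n] - u_old) / dt


def step_hoisted_diag(u, accel, dt, du_dt=None):
    """Same update, with the diagnostic moved to its own loop."""
    # Keep a copy of the old state only when the diagnostic is requested.
    u_old = list(u) if du_dt is not None else None
    for n in range(len(u)):              # main loop is now branch-free
        u[n] += dt * accel[n]
    if du_dt is not None:
        for n in range(len(u)):
            du_dt[n] = (u[n] - u_old[n]) / dt


# Both variants perform the same arithmetic in the same order per element.
accel = [0.5, -0.25, 1.0]
u1, u2 = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
d1, d2 = [0.0] * 3, [0.0] * 3
step_inline_diag(u1, accel, 0.1, d1)
step_hoisted_diag(u2, accel, 0.1, d2)
print(u1 == u2, d1 == d2)
```

The hoisted form costs an extra copy of the state when the diagnostic is requested, in exchange for a simpler main loop.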

Speedups are much higher if the loops are changed to do concurrent (e.g. 2x speedup in vertvisc) but this will be handled in a separate PR.

This new loop form is favorable to GPUs, and is part of the preparation for porting MOM6 to GPU platforms.
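
To illustrate why the loop order matters, here is a minimal sketch of a Thomas-algorithm tridiagonal solve written both ways. It is Python rather than Fortran, and it is not the vertvisc() code: the coefficient names a, b, c, d, the work arrays pivot_inv and c_prime, the make_system test problem, and the [k][j][i] nested-list layout are all assumptions made for illustration. In the jki version the work arrays have no j dimension, so each k step advances only the columns of one j row. In the kji version they carry a j dimension, and each k step touches every (j, i) column with no dependence between columns.

```python
def zeros(nk, nj, ni):
    """Nested-list array indexed as [k][j][i]."""
    return [[[0.0] * ni for _ in range(nj)] for _ in range(nk)]


def solve_jki(a, b, c, d):
    """Solve a*x[k-1] + b*x[k] + c*x[k+1] = d in each column, j outermost."""
    nk, nj, ni = len(b), len(b[0]), len(b[0][0])
    x = zeros(nk, nj, ni)
    for j in range(nj):
        pivot_inv = [0.0] * ni                       # 1d work array over i
        c_prime = [[0.0] * ni for _ in range(nk)]    # 2d work array over (k, i)
        for i in range(ni):
            pivot_inv[i] = 1.0 / b[0][j][i]
            x[0][j][i] = d[0][j][i] * pivot_inv[i]
        for k in range(1, nk):                       # forward elimination
            for i in range(ni):
                c_prime[k][i] = c[k - 1][j][i] * pivot_inv[i]
                pivot_inv[i] = 1.0 / (b[k][j][i] - a[k][j][i] * c_prime[k][i])
                x[k][j][i] = (d[k][j][i] - a[k][j][i] * x[k - 1][j][i]) * pivot_inv[i]
        for k in range(nk - 2, -1, -1):              # back substitution
            for i in range(ni):
                x[k][j][i] -= c_prime[k + 1][i] * x[k + 1][j][i]
    return x


def solve_kji(a, b, c, d):
    """Same solve with k outermost; every (j, i) column advances at each k."""
    nk, nj, ni = len(b), len(b[0]), len(b[0][0])
    x = zeros(nk, nj, ni)
    pivot_inv = [[0.0] * ni for _ in range(nj)]      # 2d over (j, i) in this sketch
    c_prime = zeros(nk, nj, ni)                      # 3d in this sketch
    for j in range(nj):
        for i in range(ni):
            pivot_inv[j][i] = 1.0 / b[0][j][i]
            x[0][j][i] = d[0][j][i] * pivot_inv[j][i]
    for k in range(1, nk):                           # forward elimination
        for j in range(nj):                          # (j, i) iterations are independent
            for i in range(ni):
                c_prime[k][j][i] = c[k - 1][j][i] * pivot_inv[j][i]
                pivot_inv[j][i] = 1.0 / (b[k][j][i] - a[k][j][i] * c_prime[k][j][i])
                x[k][j][i] = (d[k][j][i] - a[k][j][i] * x[k - 1][j][i]) * pivot_inv[j][i]
    for k in range(nk - 2, -1, -1):                  # back substitution
        for j in range(nj):
            for i in range(ni):
                x[k][j][i] -= c_prime[k + 1][j][i] * x[k + 1][j][i]
    return x


def make_system(nk, nj, ni):
    """Hypothetical diagonally dominant test system; a[0] and c[nk-1] are unused."""
    a, b, c, d = (zeros(nk, nj, ni) for _ in range(4))
    for k in range(nk):
        for j in range(nj):
            for i in range(ni):
                a[k][j][i] = -1.0 - 0.1 * ((i + k) % 3)
                c[k][j][i] = -1.0 - 0.1 * ((j + k) % 2)
                b[k][j][i] = 4.0 + 0.5 * ((i + j) % 4)
                d[k][j][i] = 1.0 + i + 2.0 * j + 0.5 * k
    return a, b, c, d


a, b, c, d = make_system(nk=5, nj=3, ni=4)
x_jki = solve_jki(a, b, c, d)
x_kji = solve_kji(a, b, c, d)
print(max(abs(x_jki[k][j][i] - x_kji[k][j][i])
          for k in range(5) for j in range(3) for i in range(4)))
```

The only sequential dependence left in the kji form is over k. The price is larger work arrays, since state that was reused for each j now has to be stored for every j at once.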

This PR is based on an earlier draft by @edoyango developed for GPU migration.

marshallward · 2025-06-03T20:02:55Z

I have some additional memory timings for Intel. Four instances are shown below.

There is a slight increase in memory time, although not by much. Roughly, time in memset has displaced time in memcpy.

    __intel_avx_rep_memset:   2.078s,   1.833s (13.3%)
    __intel_avx_rep_memcpy:   1.240s,   1.390s (-10.8%)

    __intel_avx_rep_memset:   2.062s,   1.812s (13.8%)
    __intel_avx_rep_memcpy:   1.312s,   1.411s (-7.1%)

    __intel_avx_rep_memset:   2.056s,   1.844s (11.5%)
    __intel_avx_rep_memcpy:   1.278s,   1.410s (-9.4%)

    __intel_avx_rep_memset:   2.065s,   1.859s (11.1%)
    __intel_avx_rep_memcpy:   1.357s,   1.304s (4.0%)

I don't think we need to be terribly worried about this. But we should probably consider this metric in similar future PRs.

The jki loops in vertvisc() have been reordered to kji. The solver increases the number of concurrent tridiagonal solvers from Ni to Ni*Nj.

Two other changes contributed to performance:

* Moving diagnostics (e.g. ADp%du_dt_str) outside of loops
* Conditional computing of Ray() when visc%Ray_[uv] is set

Not all optimizations of this sort were applied, and the remaining ones should be reviewed in relevant experiments.

This showed a modest performance improvement on CPUs. Three instances are shown below.

* mom_vert_friction_mp_vertvisc_: 0.583s, 0.629s (-7.3%)
* mom_vert_friction_mp_vertvisc_: 0.576s, 0.634s (-9.2%)
* mom_vert_friction_mp_vertvisc_: 0.583s, 0.636s (-8.3%)

This patch uses nested do loops since we have not yet adopted do concurrent loop constructs. But a future do concurrent form shows even greater speedup, e.g.

* mom_vert_friction_mp_vertvisc_: 0.258s, 0.539s (-52.2%)

The work in this PR will prepare this module for porting to GPUs.

Co-authored-by: Edward Yang <edward_yang_125@hotmail.com>

As with vertvisc(), this patch rewrites the vertvisc_remnant() tridiagonal solvers to run in kji order, with even greater benefits to runtime.

Three instances are shown below. Speedup is about 1.3-1.4x.

* mom_vert_friction_mp_vertvisc_remnant_: 0.939s, 1.241s (-24.3%)
* mom_vert_friction_mp_vertvisc_remnant_: 0.935s, 1.265s (-26.0%)
* mom_vert_friction_mp_vertvisc_remnant_: 0.910s, 1.258s (-27.7%)

As before, only the diagonal array (b1) was promoted to 3d.

As with vertvisc(), this change is expected to be highly favorable to GPU performance.

Hallberg-NOAA

I have examined these proposed changes, and I am convinced that they are correct and improve the readability of the code, and moreover are likely to be more efficient across a range of computers. I am happy to accept this PR, pending successful results from the pipeline testing.

Hallberg-NOAA · 2025-06-04T13:41:33Z

This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/27665.

Hallberg-NOAA reviewed Jun 3, 2025

Comment thread src/parameterizations/vertical/MOM_vert_friction.F90 Outdated

Hallberg-NOAA reviewed Jun 3, 2025

Comment thread src/parameterizations/vertical/MOM_vert_friction.F90 Outdated

Hallberg-NOAA reviewed Jun 3, 2025

Comment thread src/parameterizations/vertical/MOM_vert_friction.F90 Outdated

marshallward and others added 2 commits June 3, 2025 18:17

marshallward force-pushed the vertvisc_kji branch from 7082094 to fb16a2d Compare June 3, 2025 22:19

Hallberg-NOAA approved these changes Jun 3, 2025

Hallberg-NOAA added the enhancement (New feature or request) and refactor (Code cleanup with no changes in functionality or results) labels Jun 3, 2025

Hallberg-NOAA merged commit 45699c5 into NOAA-GFDL:dev/gfdl Jun 4, 2025
52 checks passed

marshallward mentioned this pull request Jul 28, 2025

GFDL to main (2025-07-21) mom-ocean/MOM6#1668

Merged

marshallward deleted the vertvisc_kji branch November 18, 2025 18:09
