Small improvements by edoyango · Pull Request #51 · marshallward/MOM6

edoyango · 2025-10-23T22:47:11Z

Hi @marshallward @JorgeG94

This is a pull request that fixes a few small things:

- ports the find_ustar subroutine to GPU (where relevant to double_gyre)
  - this is called at the start of vertvisc and was still on the CPU
- in btstep, sends wt_* to the GPU once they have been calculated
  - avoids many scalar transfers in btstep_timeloop
- downloads the diagnostic variables [u,v]_old only if the diagnostics are turned on
- fixes a segfault I was getting on V100s and A100s in MOM_PressureForce_FV

These changes should be compatible with the other pull requests open atm.

Feel free to leave this until you're back, Marshall.

edoyango · 2025-10-29T03:49:20Z

rebased on top of other open pull requests

edoyango · 2025-10-30T05:44:24Z

Made sure some of the loops in btstep are submitted as a single kernel. No differences in double_gyre, but I got some nice speedups on benchmark, probably because benchmark has 22 vertical layers vs double_gyre's 2.

# gadi v100 without fused loops
(Ocean barotropic mode stepping)       2.652537
# gadi v100 with new jki loops
(Ocean barotropic mode stepping)       2.056607

I get similar improvements on the A100s at Monash.

Pretty significant given that I only touched a handful of the loops. Not sure how well this scales if we increase domain size laterally though.
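As a toy illustration of why the layer count could matter here, the sketch below compares a k-outer loop nest, where each layer is submitted as its own (j, i) sweep, with a j/k/i nest submitted once. It is plain Python, not MOM6 code: `make_field`, `column_sum_kji`, `column_sum_jki`, and the "launch" counter are all hypothetical stand-ins, and the column sum is just a placeholder for whatever the real loops compute.

```python
def make_field(nk, nj, ni):
    """Build a small hypothetical (k, j, i) field as nested lists."""
    return [[[float(k + j + i) for i in range(ni)]
             for j in range(nj)]
            for k in range(nk)]


def column_sum_kji(h):
    """k-outer ordering: one (j, i) sweep, i.e. one 'launch', per layer."""
    nk, nj, ni = len(h), len(h[0]), len(h[0][0])
    total = [[0.0] * ni for _ in range(nj)]
    launches = 0
    for k in range(nk):
        launches += 1  # stand-in for submitting one kernel per k iteration
        for j in range(nj):
            for i in range(ni):
                total[j][i] += h[k][j][i]
    return total, launches


def column_sum_jki(h):
    """j-outer ordering: the whole j/k/i nest counts as a single 'launch'."""
    nk, nj, ni = len(h), len(h[0]), len(h[0][0])
    total = [[0.0] * ni for _ in range(nj)]
    launches = 1  # stand-in for submitting the fused nest once
    for j in range(nj):
        for k in range(nk):
            for i in range(ni):
                total[j][i] += h[k][j][i]
    return total, launches


if __name__ == "__main__":
    # Layer counts taken from the thread: double_gyre has 2, benchmark has 22.
    for nk in (2, 22):
        h = make_field(nk, 3, 4)
        sums_kji, n_kji = column_sum_kji(h)
        sums_jki, n_jki = column_sum_jki(h)
        assert sums_kji == sums_jki  # same k order per column, same result
        print(nk, n_kji, n_jki)
```

In this toy model the k-outer form issues one launch per layer while the fused form always issues one, so the gap widens as the number of layers grows. It says nothing about how the cost changes with lateral domain size.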

vertvisc could probably benefit from a similar treatment. Will look tomorrow.

edoyango · 2025-10-31T02:40:09Z

fused a few loops in vertical viscosity 204106f
The speedup was more modest because many expensive loops are still launching multiple kernels per k iteration. I couldn't port these loops because of the OBC loops that are mixed in, e.g. https://github.com/edoyango/MOM6/blob/204106ff305d763e5e5131b6c3bc2a717724d50c/src/parameterizations/vertical/MOM_vert_friction.F90#L2544

on gadi

# no fused loops
(Ocean vertical viscosity)             13.136691

# fused loops
(Ocean vertical viscosity)             12.531298
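To put the two sets of Gadi timings quoted in this thread side by side, the short sketch below computes the speedup factor and the percentage of time saved for each clock. The helper names `speedup` and `percent_saved` are illustrative only; the numbers are copied from the comments above.

```python
def speedup(before, after):
    """Ratio of the old clock time to the new clock time."""
    return before / after


def percent_saved(before, after):
    """Percentage reduction in clock time relative to the old value."""
    return 100.0 * (before - after) / before


# (before, after) clock values as reported in the thread.
timings = {
    "Ocean barotropic mode stepping": (2.652537, 2.056607),
    "Ocean vertical viscosity": (13.136691, 12.531298),
}

if __name__ == "__main__":
    for name, (before, after) in timings.items():
        print(f"{name}: {speedup(before, after):.2f}x, "
              f"{percent_saved(before, after):.1f}% saved")
```

By this arithmetic the barotropic stepping clock drops by roughly 22% (about 1.29x), while the vertical viscosity clock drops by roughly 5% (about 1.05x).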

Not sure I want to spend too much more time on fusing loops in vertvisc, since there's a columnar rewrite in dev/gfdl that we'll port eventually.

marshallward · 2025-11-14T10:10:24Z

Are you able to rebase this? The content in #48 has been merged.

marshallward · 2025-11-14T10:42:21Z

The switch to j/k/i could be quite significant for the CPU as we eventually port to dev/gfdl. I'm becoming convinced that it may be the best path forward, but we should probably test these in some production runs before merging.

edoyango · 2025-11-18T08:34:10Z

> Are you able to rebase this? The content in #48 has been merged.

rebased!

marshallward · 2025-11-25T18:23:21Z

This has effectively become our baseline for performance, so I think it's time to merge this in.

Although the j/k/i swaps do take us away from the live code (dev/gfdl or main), we can come back and sort it out down the road.

There are a lot of commits, but there's also quite a variety of changes, so I'll merge without any squashing.

edoyango force-pushed the small-improvements branch from 69786ba to 83d103d Compare October 24, 2025 01:53

JorgeG94 reviewed Oct 26, 2025

Comment thread src/core/MOM_PressureForce_FV.F90

JorgeG94 reviewed Oct 26, 2025

Comment thread src/parameterizations/vertical/MOM_vert_friction.F90

JorgeG94 approved these changes Oct 26, 2025

edoyango force-pushed the small-improvements branch from 83d103d to e9fcecf Compare October 29, 2025 03:46

uwagura reviewed Oct 30, 2025

Comment thread src/core/MOM_dynamics_split_RK2.F90 Outdated

uwagura reviewed Oct 30, 2025

Comment thread src/core/MOM.F90

edoyango and others added 10 commits November 18, 2025 19:07

- ce16ab0 fix segfault
- 7587f50 send wt_* to gpu for btstep_timeloop
- e28e5e3 move uv_old download into if block
- f944a11 port find_ustar_mech_forcing interface of find_ustar
- b83768d init I_Hbbl on GPU instead of CPU
- ddb1554 clean up some loops
- 9f48f27 kji -> jki some loops
- 6220a8e remove redundant CS%CA[u,v]_pref downloads
- 99121ce fuse some loops in vertical viscosity
- d05618e move visc%nkml_visc_[u,v] to out of j loop

edoyango force-pushed the small-improvements branch from a2cde1e to d05618e Compare November 18, 2025 08:07

marshallward merged commit 90b162e into marshallward:dev/gpu Nov 25, 2025
52 checks passed

edoyango deleted the small-improvements branch March 12, 2026 19:18
