
Tracer advection port #54

Merged
marshallward merged 7 commits into
marshallward:dev/gpu from
edoyango:tracer-advection
Mar 27, 2026


@edoyango
Collaborator

Ported, with CPU<->GPU updates at entry to and exit from advect_tracer. Not sure why the OpenMP macOS test fails... I don't have a Mac, so I would love some help figuring out what's wrong here.

@marshallward
Owner

Not entirely sure what could be causing the OpenMP failure, but it's probably not a macOS issue. It's more likely because the macOS compiler is newer (v16?) and has stronger OpenMP directive support.

(Note that OpenMP was disabled for the Linux tests, since the default Ubuntu compiler was missing too many features.)

Comment thread src/tracer/MOM_tracer_advect.F90
@marshallward
Owner

marshallward commented Nov 24, 2025

I was able to replicate this on an Mx MacBook, and there does seem to be something happening inside advect_tracer(). There is a diff in the Post-advect fields:

3043,3044c3043,3044
< h-point: mean=   1.3348432454011379E+01 min=   1.3347644996476363E+01 max=   1.3349172271339587E+01 Post-advect temp
< h-point: c=     17252 Post-advect temp
---
> h-point: mean=   1.3348432454004561E+01 min=   1.3347644996476363E+01 max=   1.3349172271339587E+01 Post-advect temp
> h-point: c=     17259 Post-advect temp
3047,3048c3047,3048
< h-point: mean=   1.9977172750198198E-04 min=   0.0000000000000000E+00 max=   2.2831261569977506E-04 Post-advect age
< h-point: c=     20460 Post-advect age
---
> h-point: mean=   1.9977172750205579E-04 min=   0.0000000000000000E+00 max=   2.2831050228310512E-04 Post-advect age
> h-point: c=     20457 Post-advect age
3050c3050
< h-point: c=     16733 Post-advect T2
---
> h-point: c=     16740 Post-advect T2
3055,3056c3055,3056

It seems fairly random: t1.openmp will sometimes pass, and sometimes the failure will happen in tc1.openmp.diag or in another test, e.g. tc1.a.openmp.

I'm guessing some variable may be uninitialized in one of the threads? It's possible the arm64 GCC builds handle initialization differently.


I tried out a few syntactic changes, like replacing !$ do, but they didn't seem to have much of an effect.

@marshallward marshallward reopened this Nov 24, 2025
@edoyango edoyango force-pushed the tracer-advection branch 6 times, most recently from 18b7d31 to 985b2b0 on November 27, 2025 02:23
@edoyango
Collaborator Author

Tests are now passing reliably! The fix was to make sure all private variables are properly specified.
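A minimal sketch of why unlisted private variables can produce intermittent diffs like the ones above. It is written in C with OpenMP rather than the Fortran in MOM_tracer_advect.F90, and the function name and the scratch variables flux_l and flux_r are hypothetical. Scratch values declared outside a parallel loop are shared by default, so without private() threads can overwrite each other's values and the result depends on thread scheduling. Adding default(none) makes the compiler reject any variable whose data-sharing attribute was not listed explicitly.

```c
#include <assert.h>

/* Hypothetical example: centred flux differences on a 1-D array.
 * flux_l and flux_r are per-iteration scratch values. Because they are
 * declared outside the parallel loop they would be shared by default;
 * private() gives each thread its own copy, and default(none) makes the
 * compiler reject any variable whose sharing attribute is not listed. */
void flux_difference(const double *q, double *dq, int n)
{
    assert(n >= 2);
    double flux_l, flux_r;

    #pragma omp parallel for default(none) shared(q, dq, n) private(flux_l, flux_r)
    for (int i = 1; i < n - 1; i++) {
        flux_l = 0.5 * (q[i - 1] + q[i]);
        flux_r = 0.5 * (q[i] + q[i + 1]);
        dq[i] = flux_r - flux_l;
    }

    /* Boundary points are simply zeroed in this toy scheme. */
    dq[0] = 0.0;
    dq[n - 1] = 0.0;
}
```

Dropping private(flux_l, flux_r) here would be a compile error because of default(none), which turns a silent race into a build failure.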

@edoyango edoyango moved this to Waiting for Review in MOM6 GPU port Nov 27, 2025
@marshallward
Owner

marshallward commented Mar 26, 2026

I just tried testing this after merging into dev/gpu, but hit a "partial presence" error:

Present table errors:
up(:,:,:) lives at 0x400246e3e010 size 40704 partially present in
host:0x400246e3e010 device:0x40027a584a00 size:39936 presentcount:0+1 line:119 name:hprev(:,:,:) file:/scratch4/GFDL/gfdloceans/Marshall.Ward/m6e/src/MOM6/src/tracer/MOM_tracer_advect.F90
FATAL ERROR: variable in data clause is partially present on the device: name=up(:,:,:)
 file:/scratch4/GFDL/gfdloceans/Marshall.Ward/m6e/src/MOM6/src/core/MOM_dynamics_split_RK2.F90 step_mom_dyn_split_rk2 line:429

(And we've probably all seen enough of these to know the error is probably neither at L429 nor due to up.)

I think the PR was probably fine, but my recent modifications to dev/gpu may have caused this one.

I will try to look into it but we can talk more at our next meeting. (And of course, apologies if I broke it!)

@marshallward
Owner

My best diagnosis is that the runtime does not realize that hprev has been deleted.

hprev was being returned with map(from: hprev) at the end of advect_tracer, even though the variable is local and intent(in). So perhaps there is an OpenMP/OpenACC mixup over the availability of the address.

But I tried these fixes, and they did not seem to help.

Oddly, the one fix that did work was simply removing hprev from the directives, though I did get an answer change in benchmark.
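A minimal C/OpenMP sketch of the entry/exit mapping pattern under discussion, assuming a hypothetical advect_sketch() in place of advect_tracer() and a placeholder update instead of real advection. The tracer is copied to the device on entry and back on exit, while the input-only work array h_prev is copied in and then released with map(delete:) rather than map(from:), before its host memory is freed. The log above shows up living at the same host address as the hprev entry (0x400246e3e010), which is consistent with, though does not prove, a stale device mapping left behind for freed host memory whose address was later reused.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for advect_tracer(): tr is read and updated on the
 * device, while h_prev is a local, input-only work array. */
void advect_sketch(double *tr, const double *h_in, int n)
{
    assert(n > 0);
    double *h_prev = malloc((size_t)n * sizeof *h_prev);
    assert(h_prev != NULL);
    for (int i = 0; i < n; i++)
        h_prev[i] = h_in[i];

    /* Entry: copy the tracer and the input-only array to the device. */
    #pragma omp target enter data map(to: tr[0:n], h_prev[0:n])

    /* Placeholder update; real advection is far more involved. */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; i++)
        tr[i] *= h_prev[i];

    /* Exit: copy the tracer back, but only delete h_prev's device copy.
     * It is never mapped back with map(from:), and the mapping is removed
     * before the host memory is freed, so no stale entry outlives it. */
    #pragma omp target exit data map(from: tr[0:n]) map(delete: h_prev[0:n])

    free(h_prev);
}
```

Without an offload device (or without OpenMP enabled), the target regions run on the host and the result is the same.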

@marshallward
Owner

Also: benchmark can run with hprev in the directives, but double-gyre cannot. Maybe this is some pathological error with zero tracers.

@marshallward
Copy link
Copy Markdown
Owner

I just tested the original branch and I'm seeing the same issue.

I should have mentioned that I was running on a GH200 (ARM) with nvhpc 25.11 and 26.3. I can't get onto our H100+x86 nodes right now, but I'll test it as soon as I can.

Comment thread src/tracer/MOM_tracer_advect.F90
fixes an issue where the compiler couldn't figure out that domore_u(j,k)
should be reduced in a nested loop
Couldn't safely init tracers in MOM_tracer_registry
without changing answers. The number of transfers
ends up being the same, since tracers need to be
defensively updated to the host even if they were
persistent on the device.
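The first of the commit messages above is terse, so here is a minimal C/OpenMP sketch of the general idea, not the actual diff in this PR. The names find_domore, uh and domore are hypothetical stand-ins for the Fortran code. Rather than relying on the compiler to infer that an array element such as domore_u(j,k) is OR-reduced over the inner i loop, each (j,k) accumulates into its own scalar, the inner loop carries an explicit reduction clause, and the result is stored once.

```c
#include <assert.h>

/* Hypothetical analogue of setting domore_u(j,k): flag each (j,k) column
 * in which any i-point still has nonzero transport uh.
 * Arrays are flattened, with i varying fastest. */
void find_domore(int nk, int nj, int ni, const double *uh, int *domore)
{
    assert(nk > 0 && nj > 0 && ni > 0);

    #pragma omp parallel for collapse(2)
    for (int k = 0; k < nk; k++) {
        for (int j = 0; j < nj; j++) {
            /* Declared inside the loop body, so each (j,k) gets its own copy. */
            int more = 0;
            /* Explicit logical-OR reduction over i, instead of writing
             * into the shared array element from every i iteration. */
            #pragma omp simd reduction(||: more)
            for (int i = 0; i < ni; i++)
                more = more || (uh[((long)k * nj + j) * ni + i] != 0.0);
            domore[k * nj + j] = more;
        }
    }
}
```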
@edoyango
Collaborator Author

Both the present table errors and the answer changes should be fixed now. I have only tested it on my laptop with nvhpc 25.11.

Before, I wasn't testing double gyre, so I assumed it was OK. It also seems the older compiler was better at identifying scalar reductions within teams? Not sure...

initial allocs and loops moved after the early return
also adds DO_LOCALITY where needed
@marshallward marshallward merged commit 16a7a42 into marshallward:dev/gpu Mar 27, 2026
52 checks passed
@github-project-automation github-project-automation Bot moved this from Waiting for Review to Done in MOM6 GPU port Mar 27, 2026
@edoyango edoyango deleted the tracer-advection branch April 15, 2026 00:35

Projects

Status: Done

3 participants