Tracer advection port #54
I'm not entirely sure what is causing the OpenMP failure, but it's probably not a macOS issue. More likely, the macOS compiler is newer (v16?) and has stronger OpenMP directive support. (Note that OpenMP was disabled for the Linux tests, since the default Ubuntu compiler was missing too many features.)
I was able to replicate this on an M-series MacBook; there does seem to be something happening inside of … It seems fairly random; I'm guessing some thread may be uninitialized? It's possible the arm64 GCC builds handle initialization differently. I tested out a few syntactic diffs like replacing …
Force-pushed from 18b7d31 to 985b2b0.
Tests are now passing reliably! The fix was to make sure all private variables are properly specified.
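As an illustration of the kind of bug this fixes, here is a minimal C sketch, not the actual advect_tracer code. The function `upwind_update` and its 1-D upwind flux are hypothetical. Scratch scalars declared outside an OpenMP parallel loop are shared by default, so unless they appear in a `private` clause, threads can overwrite each other's values, producing random, thread-count-dependent results like the ones seen on macOS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-D upwind update, used only to illustrate the private-clause fix.
 * u[i] is the velocity on the face between cells i-1 and i. */
void upwind_update(size_t n, const double *u, const double *q,
                   double dt_over_dx, double *q_new)
{
    double flux_l, flux_r; /* declared outside the loop: shared unless listed as private */
    size_t i;              /* the loop index of an omp for is private automatically */

    assert(n >= 3); /* precondition: at least one interior cell */

    /* Without private(flux_l, flux_r), every thread would read and write the
     * same two scalars, a data race that gives nondeterministic answers. */
#pragma omp parallel for private(flux_l, flux_r)
    for (i = 1; i < n - 1; ++i) {
        flux_l = (u[i] > 0.0) ? u[i] * q[i - 1] : u[i] * q[i];
        flux_r = (u[i + 1] > 0.0) ? u[i + 1] * q[i] : u[i + 1] * q[i + 1];
        q_new[i] = q[i] - dt_over_dx * (flux_r - flux_l);
    }
}
```

Declaring the scratch variables inside the loop body would make them private implicitly. The explicit clause is shown here because it mirrors the fix described above.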
I just tried testing this after merging into dev/gpu but hit a "partial presence" error: … (And we've probably all seen enough of these to know the error is probably neither at L429 nor due to ….) I think the PR was probably fine, but my recent modifications to dev/gpu may have caused this one. I will try to look into it, but we can talk more at our next meeting. (And of course, apologies if I broke it!)
My best diagnosis is that the runtime does not realize that …
But I tried these fixes and they did not seem to help. Oddly, the one fix that did work was simply removing …
Also: the benchmark can run with …
I just tested the original branch and I'm seeing the same issue. I should have mentioned that I was running on a GH200 (ARM) with 25.11 and 26.3. I can't get onto our H100 + x86 nodes right now, but I'll test it as soon as I can.
Force-pushed from 985b2b0 to 052d775.
Fixes an issue where the compiler couldn't figure out that domore_u(j,k) should be reduced over in a nested loop.
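One way to make such a reduction explicit to the compiler is sketched below in C. This assumes that domore_u(j,k) is a per-row flag that becomes true when any i in that row still has remaining flux; the array layout and the names `flag_rows_needing_more` and `flux_remaining` are hypothetical, and this is not necessarily what the commit itself does. The outer (k, j) loops are parallelized, and the inner loop over i is written as an explicit logical-OR reduction into a private scalar that is stored once per row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* flux_remaining is a hypothetical nk x nj x ni array stored row-major;
 * domore[k * nj + j] plays the role of domore_u(j,k). */
void flag_rows_needing_more(size_t ni, size_t nj, size_t nk,
                            const double *flux_remaining, bool *domore)
{
    assert(flux_remaining != NULL && domore != NULL);

    /* Each (k, j) row is independent, so the two outer loops are collapsed. */
#pragma omp parallel for collapse(2)
    for (size_t k = 0; k < nk; ++k) {
        for (size_t j = 0; j < nj; ++j) {
            bool more = false;
            /* Explicit OR-reduction over i instead of writing the shared
             * array element from inside the inner loop. */
#pragma omp simd reduction(||:more)
            for (size_t i = 0; i < ni; ++i) {
                if (flux_remaining[(k * nj + j) * ni + i] != 0.0) {
                    more = true;
                }
            }
            domore[k * nj + j] = more; /* single store per row */
        }
    }
}
```

On a GPU build, the outer directive could instead be a `target teams distribute parallel for` construct; the reduction structure of the inner loop stays the same.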
Force-pushed from 052d775 to a986b67.
Couldn't safely initialize tracers in MOM_tracer_registry without changing answers. The number of transfers ends up being the same, since tracers need to be defensively updated to the host even if they were persistent on the device.
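The trade-off in this commit message can be sketched with OpenMP target data directives. This is a minimal C sketch that assumes OpenMP offload as the mechanism (the commit message does not say); `init_resident_tracer`, `step_resident_tracer` and `free_resident_tracer` are hypothetical names. The array stays mapped on the device between calls, yet every step still issues a defensive `target update from`, so keeping the data resident does not reduce the number of device-to-host transfers.

```c
#include <assert.h>
#include <stddef.h>

/* Map the tracer to the device once and keep it resident. */
void init_resident_tracer(size_t n, double *tr)
{
    assert(tr != NULL);
#pragma omp target enter data map(to: tr[0:n])
}

/* Update the tracer on the device, then defensively refresh the host copy
 * so host-side code never sees stale values. */
void step_resident_tracer(size_t n, double *tr, double increment)
{
    assert(tr != NULL);
#pragma omp target teams distribute parallel for
    for (size_t i = 0; i < n; ++i) {
        tr[i] += increment;
    }
#pragma omp target update from(tr[0:n])
}

/* Drop the device copy; the host copy is already current. */
void free_resident_tracer(size_t n, double *tr)
{
    assert(tr != NULL);
#pragma omp target exit data map(release: tr[0:n])
}
```

When compiled without offload support, the directives fall back to the host (or are ignored without OpenMP), so the sketch still runs on a CPU.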
Force-pushed from a986b67 to 6403309.
Both the present table errors and the answer changes should be fixed now. I have only tested it on my laptop with nvhpc 25.11. Before, I wasn't testing double gyre, so I assumed it was OK; it seems the older compiler was better at identifying scalar reductions within teams? Not sure...
Moves the initial allocations and loops after the early return; also adds DO_LOCALITY where needed.
Force-pushed from 6403309 to b76e03f.
Ported with CPU<->GPU updates at entry to and exit from advect_tracer. Not sure why the OpenMP macOS test fails... I don't have a Mac, so I would love some help figuring out what's wrong here.
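The entry/exit pattern described here can be sketched in C with OpenMP target data directives. This is a minimal sketch under the assumption that the port uses OpenMP offload; `scale_tracer_on_device` is a hypothetical stand-in for the body of advect_tracer. The tracer array is copied to the device on entry, the kernel runs on the device, and the result is copied back to the host on exit, so callers outside the routine only ever see host data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel standing in for the work done inside advect_tracer. */
void scale_tracer_on_device(size_t n, double *tr, double factor)
{
    assert(tr != NULL);

    /* Entry: copy the host tracer array to the device. */
#pragma omp target enter data map(to: tr[0:n])

    /* The pointer maps to the device copy created above; n and factor are
     * firstprivate by default. */
#pragma omp target teams distribute parallel for
    for (size_t i = 0; i < n; ++i) {
        tr[i] *= factor;
    }

    /* Exit: copy the results back to the host and release the device copy. */
#pragma omp target exit data map(from: tr[0:n])
}
```

The cost of this design is a full round trip of the tracer data on every call; the benefit is that the rest of the model does not need to know which arrays live on the device.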