Vert friction: Column loops moved in layers #973
@claireyung if you are able to test this out, that would be a big help.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## dev/gfdl #973 +/- ##
============================================
- Coverage 37.39% 37.32% -0.07%
============================================
Files 306 306
Lines 93749 93511 -238
Branches 17977 17976 -1
============================================
- Hits 35053 34903 -150
+ Misses 52096 52017 -79
+ Partials 6600 6591 -9
Thanks @marshallward. All my icemount/seamount expts are running as expected with this commit. However, I just realised that my ISOMIP 3D domain experiments are also not working, and this commit (or just replacing the index as you suggested) doesn't seem to fix them. I will reply to issue #971 with a link to the config. Sorry about this.
I ran commit 8b63869 with the kji vertvisc reordering, and added the … I don't know what to compare for bit repro, but it seems to me that this problem may not be related to … I'll keep looking though, and will move the discussion to #971.
Branch updated from de53fb2 to b253f58.
|
I modified this PR to include an additional commit which documents and fixes the referenced index problems. The columnar commit was also modified to reduce whitespace and formatting changes, so as to reduce the number of changed lines.
Branch updated from b253f58 to 3c60fbf.
Hallberg-NOAA left a comment:
I have carefully examined every line of the refactoring changes in the second commit of this PR and the bug-fix changes in the first commit in this PR, and I believe them all to be correct. There would be some scope for a bit more minor cleanup here in a subsequent commit, such as avoiding logical tests that have become duplicated after these changes, or reformatting some comments now that the declaration statements are much shorter, but none of these should change answers or have a very large impact on performance. This commit is ready to go in as it stands.
This patch fixes two classes of index errors in multiple functions of `MOM_vert_friction.F90`:
* `j=G%isc,G%jec` had been incorrectly applied to multiple loops. This went undetected because we almost exclusively use local indexing, where `G%isc == G%jsc`, but it is nonetheless a serious error. Thanks to Jorge Luis Gálvez Vallejo for reporting.
* One errant loop in the shelf code had `i=is,je`. This went undetected due to poor test coverage of the ice shelf code. Thanks to Claire Yung for reporting.
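To illustrate why the first class of error stayed hidden, here is a minimal Python sketch that mimics Fortran's inclusive `do j = lo, hi` bounds. The `Bounds` tuple, the helper functions, and all index values are hypothetical; only the buggy `j=G%isc,G%jec` pattern (versus the intended `j=G%jsc,G%jec`) comes from the commit message.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    """Hypothetical stand-in for the compute-domain indices G%isc, G%iec, G%jsc, G%jec."""
    isc: int
    iec: int
    jsc: int
    jec: int


def fortran_range(lo: int, hi: int) -> range:
    """Indices visited by the Fortran loop `do j = lo, hi` (inclusive, empty if lo > hi)."""
    return range(lo, hi + 1)


def row_errors(g: Bounds) -> tuple[int, int]:
    """Compare the buggy loop `j = isc, jec` with the intended `j = jsc, jec`.

    Returns (rows_missed, rows_outside): compute-domain rows the buggy loop
    never visits, and rows it visits that lie outside jsc..jec.
    """
    buggy = set(fortran_range(g.isc, g.jec))
    intended = set(fortran_range(g.jsc, g.jec))
    return len(intended - buggy), len(buggy - intended)


# Domain-local indexing: both ranges start just past the halo, so isc == jsc
# and the two loops are identical.
local = Bounds(isc=5, iec=36, jsc=5, jec=20)

# Hypothetical global-style indices where isc != jsc.
isc_above = Bounds(isc=65, iec=96, jsc=33, jec=48)  # buggy loop body never runs
isc_below = Bounds(isc=33, iec=64, jsc=49, jec=64)  # buggy loop strays outside jsc..jec

for name, g in [("local", local), ("isc > jsc", isc_above), ("isc < jsc", isc_below)]:
    missed, outside = row_errors(g)
    print(f"{name:10s} rows missed: {missed:2d}  rows outside domain: {outside:2d}")
```

With local indexing the two loops visit the same rows. When `isc > jsc` the buggy loop silently does nothing, and when `isc < jsc` it touches rows outside the compute domain, which may also lie outside the array bounds.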
This patch moves the vertical (k) loops over each column inside the
horizontal (j, i) loops, rather than looping over layers in an outer k-loop.
The primary motivation is to restore performance in high-bandwidth runs,
which were insufficiently tested during development of the k-j-i form.
The inner-column loops show improved performance for both low and
high-bandwidth runs.
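The change in loop order can be pictured with a minimal Python sketch, assuming a toy vertical recurrence in which each layer depends on the layer above it (`out[k] = a[k] * out[k-1] + b[k]`). The functions `layer_outer` and `column_inner` are hypothetical and are not the `MOM_vert_friction.F90` code. Python cannot show the performance difference; the sketch only shows the restructuring. In the k-j-i form the running value is carried between layers in a 2D work array over the whole (j, i) domain, while the column form carries a single scalar per column.

```python
def layer_outer(a, b, ni, nj, nk):
    """k-j-i order: sweep every (j, i) point for one layer before the next layer.

    The running value is kept in a 2D work array prev[j][i] between layers.
    """
    out = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]  # out[i][j][k]
    prev = [[0.0] * ni for _ in range(nj)]                       # prev[j][i]
    for k in range(nk):
        for j in range(nj):
            for i in range(ni):
                prev[j][i] = a[i][j][k] * prev[j][i] + b[i][j][k]
                out[i][j][k] = prev[j][i]
    return out


def column_inner(a, b, ni, nj, nk):
    """j-i-k order: finish the whole column at (j, i) before moving on.

    The running value is a single scalar that is reset for each column.
    """
    out = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]  # out[i][j][k]
    for j in range(nj):
        for i in range(ni):
            prev = 0.0
            for k in range(nk):
                prev = a[i][j][k] * prev + b[i][j][k]
                out[i][j][k] = prev
    return out


# Toy inputs indexed [i][j][k], mirroring Fortran's (i, j, k) argument order.
ni, nj, nk = 3, 2, 4
a = [[[0.5 + 0.1 * k for k in range(nk)] for _ in range(nj)] for _ in range(ni)]
b = [[[float(i + 2 * j + k) for k in range(nk)] for j in range(nj)] for i in range(ni)]

# Same operations in the same order at every point, so the results match exactly.
assert layer_outer(a, b, ni, nj, nk) == column_inner(a, b, ni, nj, nk)
```

Because each point sees identical arithmetic in identical order, the toy results are bitwise identical, which is the property that lets a reordering like this pass answer-changing regression tests.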
The high-bandwidth benchmark case: (128-core, 256x128 x 75 layers)
```
Profile Reference
(Ocean vertical viscosity): 7.158s, 15.047s (-52.4%)
```
The low-bandwidth case: (1-core, 32x32 x 75 layers)
```
Profile Reference
(Ocean vertical viscosity): 3.911s, 4.788s (-18.3%)
```
For the GFDL OM5 production configuration at 503, runtimes of the slowest ranks
were reduced in proportion to the high-bandwidth case above.
For the reference dev/gfdl,
```
hits tmin tmax tavg
(Ocean vertical viscosity) 288 4.303819 21.483670 14.452196
```
After applying this patch, times are reduced by ~40%:
```
hits tmin tmax tavg
(Ocean vertical viscosity) 288 0.976130 13.398768 8.689331
```
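The quoted reductions can be checked directly from the timings above using the relative change 100 * (new - ref) / ref. The `pct_change` helper below is our own; the numbers are copied from the profiles.

```python
def pct_change(ref: float, new: float) -> float:
    """Relative change from ref to new, in percent (negative means faster)."""
    return 100.0 * (new - ref) / ref


# (reference, patched) wall-clock seconds, copied from the timings above.
timings = {
    "high-bandwidth benchmark": (15.047, 7.158),
    "low-bandwidth benchmark": (4.788, 3.911),
    "OM5 tmin": (4.303819, 0.976130),
    "OM5 tmax": (21.483670, 13.398768),
    "OM5 tavg": (14.452196, 8.689331),
}

for label, (ref, new) in timings.items():
    print(f"{label:26s} {pct_change(ref, new):6.1f}%")
```

The two benchmark rows reproduce the -52.4% and -18.3% in the profiles. For OM5, the ~40% figure matches the average time (tavg, about -39.9%); the slowest rank (tmax) drops by about 37.6%.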
* Moving to columns allowed for removal of many `do_i` tests, since the test is
applied before starting the loop.
* The `touch_ij` dummy function was removed, since we're no longer trying to
force an IPO optimization.
* The shelf requires a re-calculation of the various thickness averages
  (h_arith, etc.). These could be saved as 1D arrays if this becomes a problem.
* In addition to the usual regression testing, I also found no regressions in
selected ice shelf configurations.
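The `do_i` simplification in the first bullet can be sketched as follows, reduced to a single row of points. This is a hypothetical Python illustration: `do_i` mirrors the per-point mask named above, while the functions and the placeholder update (halving each value) are invented for the example. With k outermost, the mask is re-tested for every layer; with the column innermost, it is tested once per column, and inactive columns skip the whole k-loop.

```python
def update_layer_outer(u, do_i, nk):
    """k outermost: the do_i mask is tested at every (k, i) point."""
    tests = 0
    for k in range(nk):
        for i, active in enumerate(do_i):
            tests += 1
            if active:
                u[i][k] *= 0.5  # placeholder for the real per-layer update
    return tests


def update_column_inner(u, do_i, nk):
    """Column innermost: do_i is tested once, before the column's k-loop."""
    tests = 0
    for i, active in enumerate(do_i):
        tests += 1
        if not active:
            continue  # skip the entire column
        for k in range(nk):
            u[i][k] *= 0.5  # same placeholder update
    return tests


nk = 75  # layer count of the benchmark cases above
do_i = [True, False, True, True]
u1 = [[1.0] * nk for _ in do_i]
u2 = [[1.0] * nk for _ in do_i]

tests_k_outer = update_layer_outer(u1, do_i, nk)
tests_column = update_column_inner(u2, do_i, nk)
assert u1 == u2  # identical results from both loop orders
print(tests_k_outer, tests_column)
```

Both orders update the same points, but the mask is evaluated 300 times in the k-outer form and only 4 times in the column form.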
Branch updated from 3c60fbf to cbdc19a.
|
This PR has passed pipeline testing at https://gitlab.gfdl.noaa.gov/ogrp/mom6ci/MOM6/-/pipelines/28922.
Several indexing bugs are now resolved:
* Many loops used `j=G%isc,G%jec` instead of `j=G%jsc,G%jec`. These went unnoticed because `isc == jsc` in domain-local indexing, which is the default.
* One ice shelf loop was incorrectly `i=is,je`. Thanks to Claire Yung for reporting.

Changing to inner-column loops nullifies most of these issues.