Revert 2c8363e057dde026e65ddcec1b62c18d5e260017 to allow compiler optimization. Also use minval/maxval intrinsic functions instead of a loop nest in range_check_3d() #267
@dkokron - if you are suggesting changing it for range_check_3d, you should also make the same change for range_check_2d.
range_check_2d doesn't show up in the performance profile. Altering
range_check_2d would also require redoing all the testing and analysis. As
it stands, the PR results in a ~9% speedup for the case I tested. Do you
really want me to go through all that testing for no added benefit?
Dan
I am asking for consistency and adhering to our requests for coding standards as code owners.
Most (8.6%) of the speedup is from enabling optimization (getting rid of
-O0). The remaining 0.3% is from using intrinsics.
Scenario 1: 7881.00 (-O0)
Scenario 2: 7206.20 (-O2)
Scenario 3: 7179.37 (-O2 + intrinsics)
I will redo the testing with the suggested changes to range_check_2d().
Dan
On Fri, Apr 21, 2023 at 1:41 PM Rusty Benson wrote:
Does the 9% improvement come from the change to using an intrinsic in
range_check_3d or would it have come from leaving fv_diagnostics.F90 intact
and compiling correctly (not with -O0)?
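A quick check of the arithmetic behind the 8.6% / 0.3% split, as a small Python sketch. The three timings are copied from Dan's reply; treating them as seconds of summed phase1 + phase2 time is an assumption based on the performance metric described in the PR.

```python
# Scenario timings copied from the reply above. Treating them as seconds of
# summed phase1 + phase2 time is an assumption based on the PR's metric.
T_O0 = 7881.0          # scenario 1: -O0
T_O2 = 7206.2          # scenario 2: -O2
T_O2_INTR = 7179.37    # scenario 3: -O2 + minval/maxval intrinsics


def pct_of_baseline(saved: float, baseline: float = T_O0) -> float:
    """Time saved, expressed as a percentage of the -O0 baseline run time."""
    return 100.0 * saved / baseline


from_optimization = pct_of_baseline(T_O0 - T_O2)      # -O0 -> -O2
from_intrinsics = pct_of_baseline(T_O2 - T_O2_INTR)   # loop nest -> intrinsics
total = pct_of_baseline(T_O0 - T_O2_INTR)             # both changes together

print(f"optimization: {from_optimization:.2f}%")  # 8.56%
print(f"intrinsics:   {from_intrinsics:.2f}%")    # 0.34%
print(f"total:        {total:.2f}%")              # 8.90%
```

Expressing both parts against the same -O0 baseline makes them add up to the ~9% total. Measured against the -O2 run instead, the intrinsics account for about 0.37%.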
Thank you. I would suggest the 0.3% from the intrinsics might simply be run-to-run variation. As these range checks should only be used for diagnostic purposes, I don't see the reason for a full set of tests, though that's probably a UFS governance requirement.
Rusty,
The speedup from using the intrinsics shows up in numerous trials, so I'd
like to keep it.
Dan
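Whether a ~0.3% gap is a real effect or run-to-run variation can be judged from repeated timings of each configuration. A minimal sketch, assuming several total-time measurements per configuration are available (the individual trial timings are not posted in this thread), using Welch's t statistic from the Python standard library:

```python
import math
import statistics


def welch_t(runs_a: list[float], runs_b: list[float]) -> float:
    """Welch's t statistic for the difference in mean run time of two configurations.

    Each list holds repeated timings of one configuration (at least two runs each,
    since statistics.variance needs two data points). A |t| well above roughly 2-3
    suggests the gap exceeds run-to-run variation; a small |t| means the data cannot
    distinguish the two. Turning t into a p-value would additionally require the
    t distribution with Welch-Satterthwaite degrees of freedom (not done here).
    """
    mean_a, mean_b = statistics.mean(runs_a), statistics.mean(runs_b)
    var_a, var_b = statistics.variance(runs_a), statistics.variance(runs_b)  # sample variances
    std_err = math.sqrt(var_a / len(runs_a) + var_b / len(runs_b))
    if std_err == 0.0:
        raise ValueError("no run-to-run spread; t statistic is undefined")
    return (mean_a - mean_b) / std_err
```

Usage would look like `welch_t(o2_loop_timings, o2_intrinsic_timings)`, where both lists are hypothetical collections of repeated scenario 2 and scenario 3 runs.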
That's great information, and thanks for confirming it is not just an artifact. We'll get reviews done once you signal you are ready.
Rusty,
I reran the UFS regression suite and several timing runs with
range_check_2d() converted to use minval/maxval intrinsics. This
modification didn't change the validation results at all. I pushed the
changes to my fork and updated the PR.
Dan
@dkokron May I ask on which platforms you ran the UFS RT? I'd suggest running the full UFS RT on the operational platform WCOSS2 to confirm that the compiling issue Kate encountered before no longer occurs. Also, I am curious: does the HAFSv1 operational run need to output the U700/U850/V700 diagnostic range checks during integration? My understanding is that we only output the model prognostic fields from the dycore, not those diagnostic fields, which could slow down the model run; we compute them using the asynchronous inline post. I assume the change you made will only impact debug runs, for which I thought timing might not be a critical issue.
Jun,
I ran the UFS RT suite on Dogwood (last week before it was switched to ops)
and Cactus (22 April). I did not encounter any compiler issues. Please
provide specifics of the issue Kate encountered.
Regarding the question of debug output during the operational runs, please
reach out to Zhan Zhang as I am not qualified to answer your question.
Dan
Jun,
I talked to Bin, who said that the U700/U850/V700 fields are used by the HAFS TC tracking diagnostic code in the dycore. There is no impact on the model history files.
Update to latest in order to merge hafs-rangeCheck3d into mainline.
Testing on PR #1743 is complete; this PR is ready to merge.
@laurenchilutti, can you merge this PR?
Revert 2c8363e057dde026e65ddcec1b62c18d5e260017 to allow compiler optimization. Also use minval/maxval intrinsic functions instead of a loop nest in range_check_3d().
Description
Performance profiling of a HAFS case on NOAA systems revealed that a significant amount of time was spent in subroutine range_check_3d(). This commit effectively reverts a commit from Oct 2021 (see below). It also changes the code to use the minval() and maxval() Fortran intrinsics.
commit 2c8363e
Author: Xiaqiong Zhou Xiaqiong.Zhou@noaa.gov
Date: Thu Oct 21 17:51:10 2021 +0000
Revise back the range definition form. The compiling issue on DELL can be fixed by using -O0 instead of -O2 to compile fv_diagnostics.F90
I requested more details from Xiaqiong Zhou and got the following responses.
I don't see any compile time issues using ifort-19.1.3.304 at -O2 on the WCOSS2 systems.
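The loop-nest-to-intrinsic change can be illustrated with a Python analogue. This is not the FV3 Fortran code: the field layout `q[k][j][i]`, the function names, and the bounds `q_low`/`q_hi` are hypothetical stand-ins. It shows that a hand-written min/max loop nest and a whole-array reduction (the role `minval`/`maxval` play in Fortran) return identical results, because min and max only select existing values and do no arithmetic.

```python
from itertools import chain


def range_loop_nest(q):
    """Min/max of a non-empty 3-D field q[k][j][i] using an explicit loop nest."""
    qmin = qmax = q[0][0][0]
    for plane in q:            # k
        for row in plane:      # j
            for value in row:  # i
                if value < qmin:
                    qmin = value
                if value > qmax:
                    qmax = value
    return qmin, qmax


def range_intrinsic(q):
    """Same result via whole-array reductions, analogous to minval(q)/maxval(q)."""
    flat = list(chain.from_iterable(chain.from_iterable(q)))
    return min(flat), max(flat)


def range_check(q, q_low, q_hi):
    """True if every value of q lies within [q_low, q_hi] (hypothetical bounds)."""
    qmin, qmax = range_intrinsic(q)
    return q_low <= qmin and qmax <= q_hi


field = [[[1.5, -2.0], [3.25, 0.0]], [[-7.125, 4.0], [2.0, 9.5]]]
print(range_loop_nest(field), range_intrinsic(field))  # (-7.125, 9.5) (-7.125, 9.5)
```

Since both forms return bit-identical extrema for finite input, the intrinsic change by itself should not alter any diagnostic values; the speedup reported for it presumably comes from the compiler generating better code for the intrinsics, which this sketch does not attempt to demonstrate.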
How Has This Been Tested?
The modifications have been tested on WCOSS2 systems Acorn and Dogwood using a HAFS case as well as on Cactus and Dogwood by running the UFS (develop branch cloned on 17 April) regression suite.
Scenarios:
HAFS case regional simulation with one nest
The case was run on 26 nodes of the Acorn system for a 126-hour simulation.
Performance metric:
Add up the phase1 and phase2 timings printed in the output listing:
grep PASS stdout | awk '{t+=$10;print t}' | tail -1
The units are seconds.
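For readers less familiar with awk, the metric can be expressed as a short Python sketch. The sample lines are made up, because the actual layout of the PASS lines in stdout is not shown in this PR; only the "substring PASS, sum the 10th whitespace-separated field" rule comes from the command above.

```python
def total_phase_time(lines, field: int = 10) -> float:
    """Sum awk's $10 over lines containing "PASS" and return the final total.

    Mirrors: grep PASS stdout | awk '{t+=$10;print t}' | tail -1
    Differences from awk: a missing or non-numeric field raises an error here,
    whereas awk coerces it to a number (typically 0), and no matching lines
    gives 0.0 here, whereas the shell pipeline prints nothing.
    """
    total = 0.0
    for line in lines:
        if "PASS" in line:                      # grep PASS: substring match anywhere
            fields = line.split()               # awk default splitting on whitespace runs
            total += float(fields[field - 1])   # awk fields are 1-based
    return total


# Hypothetical lines; the real layout of the PASS lines is not shown in the PR.
sample = [
    "f1 f2 f3 f4 f5 f6 f7 f8 f9 1.5 PASS",
    "f1 f2 f3 f4 f5 f6 f7 f8 f9 99.0 FAIL",
    "f1 f2 f3 f4 f5 f6 f7 f8 f9 2.5 PASS",
]
print(total_phase_time(sample))  # 4.0
```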
Validation:
Using scenario four, the UFS regression suite revealed that numerous diagnostic variables changed numerically. A review of all files declared "NOT OK" by the pass/fail comparison revealed that all of those variables are calculated using routines found in fv_diagnostics.F90 (see attached file variables.txt).
Some of these variables show up in files that are part of the UFS regression suite pass/fail comparisons so a new baseline will be needed.
A comparison of stdout from scenario1 and scenario4 revealed that certain variables related to tropical cyclone (TC) tracking changed in the 7th significant digit. A code review revealed that those variables are also calculated using routines from fv_diagnostics.F90.
For example, from time step 6299:
u700 g2 max = 14.35157 min = -9.423827 | u700 g2 max = 14.35157 min = -9.423828
u850 g2 max = 7.198199 min = -18.35876 | u850 g2 max = 7.198200 min = -18.35876
v700 g2 max = 8.673572 min = -15.10008 | v700 g2 max = 8.673571 min = -15.10008
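To put "changed in the 7th significant digit" in numbers, a small sketch computing the relative difference for each pair of values that differ in the excerpt above (values copied from the excerpt; the helper name is illustrative):

```python
def rel_diff(a: float, b: float) -> float:
    """Relative difference |a - b| / max(|a|, |b|); assumes a and b are not both zero."""
    return abs(a - b) / max(abs(a), abs(b))


# (left, right) values from the time-step 6299 excerpt
pairs = {
    "u700 g2 min": (-9.423827, -9.423828),
    "u850 g2 max": (7.198199, 7.198200),
    "v700 g2 max": (8.673572, 8.673571),
}

for name, (left, right) in pairs.items():
    print(f"{name}: relative difference {rel_diff(left, right):.2e}")
```

All three come out between about 1.1e-7 and 1.4e-7. For comparison, IEEE single-precision machine epsilon is 2**-23, about 1.19e-7, so if these diagnostics are held in 32-bit reals (the PR does not say), the changes are on the order of last-bit rounding differences.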
Comparing output from the TC tracker printed to stdout revealed no differences between scenario1 and scenario4.
For example, the last tracker output from the 126-hour simulation:
==> Baseline_5Day_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294
==> RANGECHECKnD-GlobalminvalmaxvalOnly_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294
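The tracker comparison amounts to a field-by-field check of the tracker lines. A sketch of such a check, assuming the `tracker key= value ...` layout seen in the two excerpts above (the function names are illustrative, not part of the HAFS tools):

```python
def parse_tracker_line(line: str) -> dict[str, float]:
    """Parse 'tracker key= value key= value ...' into {key: value}."""
    tokens = line.split()
    if not tokens or tokens[0] != "tracker":
        raise ValueError(f"not a tracker line: {line!r}")
    rest = tokens[1:]
    # Tokens alternate between 'key=' and its numeric value.
    return {key.rstrip("="): float(value) for key, value in zip(rest[0::2], rest[1::2])}


def tracker_diffs(line_a: str, line_b: str, tol: float = 0.0) -> dict[str, tuple]:
    """Fields missing from either line or differing by more than tol."""
    a, b = parse_tracker_line(line_a), parse_tracker_line(line_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if key not in a or key not in b or abs(a[key] - b[key]) > tol
    }


baseline = ("tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 "
            "pmin= 100795.047 vmax= 16.500 rmw= 119.294")
print(tracker_diffs(baseline, baseline))  # {}
```

An empty result corresponds to "no differences"; a nonzero `tol` would allow small numerical changes to be tolerated if bitwise agreement were not expected.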