Revert 2c8363e057dde026e65ddcec1b62c18d5e260017 to allow compiler opt… by dkokron · Pull Request #267 · NOAA-GFDL/GFDL_atmos_cubed_sphere

dkokron · 2023-04-21T17:10:39Z

…imization. Also use minval/maxval intrinsic functions instead of a loop nest in range_check_3d()

Description
Performance profiling of a HAFS case on NOAA systems revealed that a significant amount of time was spent in subroutine range_check_3d(). This commit effectively reverts a commit from Oct 2021 (see below), so that fv_diagnostics.F90 is again compiled with optimization enabled. This commit also changes the code to use the minval() and maxval() Fortran intrinsics.

commit 2c8363e
Author: Xiaqiong Zhou Xiaqiong.Zhou@noaa.gov
Date: Thu Oct 21 17:51:10 2021 +0000
Revise back the range definition form. The compiling issue on DELL can be fixed by using -O0 instead of -O2 to compile fv_diagnostics.F90

I requested more details from Xiaqiong Zhou and got the following responses.

It is a very strange error when compiling fv_diagnostics.F90 on DELL (OK on HERA, Orion et al).
vsrange = (/ -200., 200. /) was not accepted but it is OK to use
vsrange(1) = -200. ; vsrange(2) = 200.
In order to keep the original form as vsrange = (/ -200., 200. /), -O0 is used instead of -O2 to compile fv_diagnostics.F90 in dynamics.

DELL was retired. It should be an Intel compiler but I do not remember the version.

I don't see any compile time issues using ifort-19.1.3.304 at -O2 on the WCOSS2 systems.

How Has This Been Tested?
The modifications have been tested on WCOSS2 systems Acorn and Dogwood using a HAFS case as well as on Cactus and Dogwood by running the UFS (develop branch cloned on 17 April) regression suite.

Scenarios:

1. Unmodified code and compiler flags (Baseline).
2. Delete the line in FV3/atmos_cubed_sphere/CMakeLists.txt that adds "-O0" to the compile flags. Thus, this file gets compiled with the global defaults.
3. Same as scenario two, plus replace the nested loop calculation in range_check_3d() (not in range_check_2d) with calls to the minval() and maxval() intrinsic functions.
4. Same as scenario three with the addition of minval and maxval intrinsics in range_check_2d.
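The code change in scenarios three and four can be sketched roughly as follows. The real routine is Fortran. This is only a Python analogue, and the function names, the nested-list layout of the field, and the sample values are hypothetical. Only the vsrange bounds (-200., 200.) are taken from the discussion above.

```python
# Hypothetical stand-in for a 3-D field stored as q[k][j][i].
def range_loop_nest(q):
    """Find the minimum and maximum with an explicit loop nest."""
    qmin = float("inf")
    qmax = float("-inf")
    for plane in q:
        for row in plane:
            for value in row:
                if value < qmin:
                    qmin = value
                if value > qmax:
                    qmax = value
    return qmin, qmax


def range_reduction(q):
    """Same result via whole-array reductions (analogue of minval/maxval)."""
    flat = [value for plane in q for row in plane for value in row]
    return min(flat), max(flat)


def out_of_range(q, bounds):
    """Report whether any value falls outside the (low, high) bounds."""
    low, high = bounds
    qmin, qmax = range_reduction(q)
    return qmin < low or qmax > high


# Made-up sample data; vsrange mirrors the bounds quoted in the thread.
q = [[[1.0, -3.5], [2.0, 7.25]], [[0.0, 4.0], [-1.0, 6.0]]]
vsrange = (-200.0, 200.0)
print(range_loop_nest(q), range_reduction(q), out_of_range(q, vsrange))
```

Both functions return the same pair of values, since a minimum or maximum does not depend on the order in which elements are visited.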

HAFS case regional simulation with one nest
The case was run on 26 nodes of the Acorn system for a 126 hour simulation.

Parent grid:
  layout = 24,20
  npx = 1321
  npy = 1201
  ntiles = 1
  npz = 81

Nest grid:
  layout = 20,12
  npx = 601
  npy = 601
  ntiles = 1
  npz = 81

The 26 nodes are allocated as follows.
ATM_petlist_bounds: 000 735
OCN_petlist_bounds: 736 855
MED_petlist_bounds: 736 855

Performance metric:
Add up the phase1 and phase2 timings printed in the output listing
grep PASS stdout | awk '{t+=$10;print t}' | tail -1
The units are seconds.
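For anyone who wants to reproduce the metric without awk, the one-liner above amounts to summing the 10th whitespace-separated field of every line containing PASS. A minimal Python sketch is below. The function name and the sample lines are hypothetical placeholders, not the real output listing format, and the sketch assumes the 10th field is always numeric.

```python
def total_pass_seconds(lines):
    """Sum the 10th whitespace-separated field of lines containing 'PASS'."""
    total = 0.0
    for line in lines:
        if "PASS" not in line:
            continue
        fields = line.split()
        if len(fields) >= 10:
            # awk's $10 is fields[9] with zero-based indexing.
            total += float(fields[9])
    return total


# Made-up lines with placeholder fields f2..f9; only the last field matters here.
sample = [
    "PASS f2 f3 f4 f5 f6 f7 f8 f9 1.25",
    "some other line that is ignored",
    "PASS f2 f3 f4 f5 f6 f7 f8 f9 2.5",
]
print(total_pass_seconds(sample))
```

To apply it to a real run, pass an open file object, for example `total_pass_seconds(open("stdout"))`.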

Scenario	Trial1	Trial2
1	7881	7866.69
2	7206.2	7200.81
3	7179.37	Not run
4	7180.67	7163.4

Validation:
Using scenario four, the UFS regression suite revealed numerous diagnostic variables changed numerically. A review of all files declared as "NOT OK" by the pass/fail comparison revealed that all of those variables are calculated using routines found in fv_diagnostics.F90 (see attached file variables.txt)
variables.txt

Some of these variables show up in files that are part of the UFS regression suite pass/fail comparisons so a new baseline will be needed.

Comparison between stdout from scenario1 and scenario4 revealed certain variables related to tropical cyclone (TC) tracking changed at the 7th significant digit. A code review revealed that those variables are also calculated using routines from fv_diagnostics.F90

E.g. from time step 6299
u700 g2 max = 14.35157 min = -9.423827 | u700 g2 max = 14.35157 min = -9.423828
u850 g2 max = 7.198199 min = -18.35876 | u850 g2 max = 7.198200 min = -18.35876
v700 g2 max = 8.673572 min = -15.10008 | v700 g2 max = 8.673571 min = -15.10008
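To put "the 7th significant digit" in numbers, the differing values above can be compared directly. This is only an illustrative sketch: the helper name is hypothetical, and the float32 epsilon is included purely as a reference scale, since the thread does not state the floating-point precision used for these diagnostics.

```python
# Pairs copied from the time step 6299 listing above: (scenario 1, scenario 4).
pairs = {
    "u700 min": (-9.423827, -9.423828),
    "u850 max": (7.198199, 7.198200),
    "v700 max": (8.673572, 8.673571),
}


def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


rel = {name: relative_difference(a, b) for name, (a, b) in pairs.items()}

# Reference scale only: spacing of 32-bit floats near 1.0.
FLOAT32_EPS = 2.0 ** -23

for name, r in rel.items():
    print(f"{name}: relative difference {r:.2e} "
          f"({r / FLOAT32_EPS:.2f} x float32 epsilon)")
```

All three relative differences come out on the order of 1e-7.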

Comparing output from the TC tracker printed to stdout revealed no differences between scenario1 and scenario4.

E.g. the last output from a 126 hour simulation.
==> Baseline_5Day_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294

==> RANGECHECKnD-GlobalminvalmaxvalOnly_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294

Checklist:

Please check all whether they apply or not

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

bensonr · 2023-04-21T17:37:53Z

@dkokron - if you are suggesting changing it for range_check_3d, you should also make the same change for range_check_2d.

dkokron · 2023-04-21T18:32:27Z

range_check_2d doesn't show up in the performance profile. Altering range_check_2d would also require redoing all the testing and analysis. As it stands, the PR results in ~9% speedup for the case I tested. Do you really want me to go through all that testing for no added benefit?
Dan

bensonr · 2023-04-21T18:41:45Z

I am asking for consistency and adhering to our requests for coding standards as code owners.
Does the 9% improvement come from the change to using an intrinsic in range_check_3d or would it have come from leaving fv_diagnostics.F90 intact and compiling correctly (not with -O0)?

dkokron · 2023-04-21T19:14:51Z

Most (8.6%) of the speedup is from enabling optimization (getting rid of -O0). The remaining 0.3% is from using intrinsics.
Scenario 1: 7881 (-O0)
Scenario 2: 7206.2 (-O2)
Scenario 3: 7179.37 (-O2 + intrinsics)
I will redo the testing with the suggested changes to range_check_2d().
Dan
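The split quoted above can be checked against the Trial1 timings. A small sketch of the arithmetic follows. The variable and function names are illustrative, and "speedup" is taken here to mean the relative reduction in wall time against scenario 1, which is an assumption about how the percentages were derived.

```python
# Trial1 wall-clock totals in seconds, from the scenario table in the description.
timings = {1: 7881.0, 2: 7206.2, 3: 7179.37}


def reduction_pct(baseline, t):
    """Percent reduction in run time relative to the baseline."""
    return 100.0 * (baseline - t) / baseline


from_optimization = reduction_pct(timings[1], timings[2])  # -O0 removed
total = reduction_pct(timings[1], timings[3])              # -O2 + intrinsics
from_intrinsics = total - from_optimization

print(f"optimization: {from_optimization:.1f}%")
print(f"total:        {total:.1f}%")
print(f"intrinsics:   {from_intrinsics:.1f}%")
```

Rounded to one decimal place, this gives 8.6% from enabling optimization and 0.3% from the intrinsics, for 8.9% in total.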

bensonr · 2023-04-21T19:35:49Z

Thank you. I would suggest the 0.3% from the intrinsics might simply be run-to-run variation. As these range-checks should only be used for diagnostic purposes, I don't see the reason for a full set of tests, though that's probably a UFS governance requirement.

dkokron · 2023-04-21T19:46:42Z

Rusty,
The speedup from using the intrinsics shows up in numerous trials, so I'd like to keep it.
Dan

bensonr · 2023-04-21T19:49:08Z

That's great information and thanks for confirming it is not just an artifact. We'll get reviews done once you signal you are ready.

dkokron · 2023-04-23T14:47:34Z

Rusty,
I reran the UFS regression suite and several timing runs with range_check_2d() converted to use minval/maxval intrinsics. This modification didn't change the validation results at all. I pushed the changes to my fork and updated the PR.
Dan

junwang-noaa · 2023-04-24T14:19:35Z

@dkokron May I ask on which platforms you ran UFS RT? I'd suggest running the full UFS RT on the operational platform wcoss2 to confirm that the compiling issue Kate met before is no longer there. Also, I am curious: does the HAFSv1 operational run need to output the diagnostic field U700/U850/V700 range check during integration? My understanding is that we only output the model prognostic fields from the dycore, not those diag fields, which could slow down the model run. We compute them using the asynchronous inline post. I assume the change you made will only impact the debug runs, for which I thought timing might not be a critical issue.

dkokron · 2023-04-24T16:10:36Z

Jun,
I ran the UFS RT suite on Dogwood (last week before it was switched to ops) and Cactus (22 April). I did not encounter any compiler issues. Please provide specifics of the issue Kate encountered. Regarding the question of debug output during the operational runs, please reach out to Zhan Zhang as I am not qualified to answer your question.
Dan

dkokron · 2023-04-25T14:07:41Z

Jun,
Is there anything else I can provide to help move this PR forward?
Dan

junwang-noaa · 2023-04-25T18:59:57Z

I talked to Bin who said that U700/V850/V700 fields are used by HAFS TC tracking diagnostic code in dycore. There is no impact on the model history files.

FernandoAndrade-NOAA · 2023-06-05T19:55:33Z

Testing on PR #1743 is complete, this PR is ready for merge

jkbk2004 · 2023-06-06T00:14:33Z

@laurenchilutti can you merge in this pr?

Revert 2c8363e to allow compiler optimization. Also use minval/maxval…

69568eb

… intrinsic functions instead of a loop nest in range_check_3d()

Switch range_check_2d to use minval/maxval intrinsics as requested

f45f8f6

bensonr requested review from bensonr and junwang-noaa April 24, 2023 12:54

bensonr approved these changes Apr 24, 2023

dkokron mentioned this pull request Apr 24, 2023

range_check_3d optimization NOAA-EMC/ufsatm#649

Merged

junwang-noaa approved these changes Apr 25, 2023

dkokron mentioned this pull request May 9, 2023

Hafs range check3d ufs-community/ufs-weather-model#1743

Merged

Merge remote-tracking branch 'upstream/dev/emc' into hafs-rangeCheck3d

b012ecf

Update to latest in order to merge hafs-rangeCheck3d into mainline.

laurenchilutti merged commit 49f15ec into NOAA-GFDL:dev/emc Jun 6, 2023
