Skip to content

Revert 2c8363e057dde026e65ddcec1b62c18d5e260017 to allow compiler opt…#267

Merged
laurenchilutti merged 3 commits into
NOAA-GFDL:dev/emcfrom
dkokron:hafs-rangeCheck3d
Jun 6, 2023
Merged

Revert 2c8363e057dde026e65ddcec1b62c18d5e260017 to allow compiler opt…#267
laurenchilutti merged 3 commits into
NOAA-GFDL:dev/emcfrom
dkokron:hafs-rangeCheck3d

Conversation

@dkokron
Copy link
Copy Markdown

@dkokron dkokron commented Apr 21, 2023

…imization. Also use minval/maxval intrinsic functions instead of a loop nest in range_check_3d()

Description
Performance profiling of a HAFS case on NOAA systems revealed significant of time was spent in subroutine range_check_3d(). This commit effectively reverts a commit from Oct 2021 (see below). This commit also changes the code to use the minval() and maxval() fortran intrinsics.

commit 2c8363e
Author: Xiaqiong Zhou Xiaqiong.Zhou@noaa.gov
Date: Thu Oct 21 17:51:10 2021 +0000
Revise back the range definition form. The compiling issue on DELL can be fixed by using -O0 instead of -O2 to compile fv_diagnostics.F90

I requested more details from Xiaqiong Zhou and got the following responses.

It is a very strange error when compiling fv_diagostics.F90 on DELL (OK on HERA, Orion et al).
vsrange = (/ -200., 200. /) was not accepted but it is OK to use
vsrange(1) = -200. ; vsrange(2) = 200.
In order to keep the original form as vsrange = (/ -200., 200. /), -O0 instead of -O2 to compile fv_diagnostics.F90 in dynamics.

DELL was retired. It should be an Intel compiler but I do not remember the version.

I don't see any compile time issues using ifort-19.1.3.304 at -O2 on the WCOSS2 systems.

How Has This Been Tested?
The modifications have been tested on WCOSS2 systems Acorn and Dogwood using a HAFS case as well as on Cactus and Dogwood by running the UFS (develop branch cloned on 17 April) regression suite.

Scenarios:

  1. Unmodified code and compiler flags (Baseline)
  2. Delete the line in FV3/atmos_cubed_sphere/CMakeLists.txt that adds "-O0" to the compile flags. Thus, this file gets compiled with the global defaults
  3. Replace the nested loop calculation in range_check_3d() (not in range_check_2d) with calls to the minval() and maxval() intrinsic functions.
  4. Same as scenario three with the addition of minval and maxval intrinsics in range_check_2d.

HAFS case regional simulation with one nest
The case was run on 26 nodes of the Acorn system for a 126 hour simulation.

Parent grid:
  layout = 24,20
  npx = 1321
  npy = 1201
  ntiles = 1
  npz = 81

Nest grid:
  layout = 20,12
  npx = 601
  npy = 601
  ntiles = 1
  npz = 81

The 26 nodes are allocated as follows.
ATM_petlist_bounds: 000 735
OCN_petlist_bounds: 736 855
MED_petlist_bounds: 736 855

Performance metric:
Add up the phase1 and phase2 timings printed in the output listing
grep PASS stdout | awk '{t+=$10;print t}' | tail -1
The units are seconds.

Scenario Trial1 Trial2
1 7881 7866.69
2 7206.2 7200.81
3 7179.37 Not run
4 7180.67 7163.4

Validation:
Using scenario four, the UFS regression suite revealed numerous diagnostic variables changed numerically. A review of all files declared as "NOT OK" by the pass/fail comparison revealed that all of those variables are calculated using routines found in fv_diagnostics.F90 (see attached file variables.txt)
variables.txt

Some of these variables show up in files that are part of the UFS regression suite pass/fail comparisons so a new baseline will be needed.

Comparison between stdout from scenario1 and scenario4 revealed certain variables related to tropical cyclone (TC) tracking changed at the 7th significant digit. A code review revealed that those variables are also calculated using routines from fv_diagnostics.F90

E.g. from time step 6299
u700 g2 max = 14.35157 min = -9.423827 | u700 g2 max = 14.35157 min = -9.423828
u850 g2 max = 7.198199 min = -18.35876 | u850 g2 max = 7.198200 min = -18.35876
v700 g2 max = 8.673572 min = -15.10008 | v700 g2 max = 8.673571 min = -15.10008

Comparing output from the TC tracker printed to stdout revealed no differences between scenario1 and scenario4.

E.g. the last output from a 126 hour simulation.
==> Baseline_5Day_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294

==> RANGECHECKnD-GlobalminvalmaxvalOnly_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294

Checklist:

Please check all whether they apply or not

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

… intrinsic functions instead of a loop nest in range_check_3d()
@bensonr
Copy link
Copy Markdown
Contributor

bensonr commented Apr 21, 2023

@dkokron - if you are suggesting changing it for check_range_3d, you should also make the same change for check_range_2d.

@dkokron
Copy link
Copy Markdown
Author

dkokron commented Apr 21, 2023 via email

@bensonr
Copy link
Copy Markdown
Contributor

bensonr commented Apr 21, 2023

I am asking for consistency and adhering to our requests for coding standards as code owners.
Does the 9% improvement come from the change to using an intrinsic in range_check_3d or would it have come from leaving fv_diagnostics.F90 intact and compiling correctly (not with -O0)?

@dkokron
Copy link
Copy Markdown
Author

dkokron commented Apr 21, 2023 via email

@bensonr
Copy link
Copy Markdown
Contributor

bensonr commented Apr 21, 2023

Thank you. I would suggest the 0.3% from the intrinsics might simply be run-to-run variation. As these range-checks should only be used for diagnostic purposes, I don't see the reason for a full set of tests, though that's probably a UFS governance requirement.

@dkokron
Copy link
Copy Markdown
Author

dkokron commented Apr 21, 2023 via email

@bensonr
Copy link
Copy Markdown
Contributor

bensonr commented Apr 21, 2023

That's great information and thanks for confirming it is not just an artifact. We'll get reviews done once you signal you are ready.

@dkokron
Copy link
Copy Markdown
Author

dkokron commented Apr 23, 2023 via email

@bensonr bensonr requested review from bensonr and junwang-noaa April 24, 2023 12:54
@junwang-noaa
Copy link
Copy Markdown
Collaborator

@dkokron May I ask on which platforms you ran UFS RT? I'd suggest of running full UFS RT on the operational platform wcoss2 to confirm the compiling issue Kate met before is no longer there any more. Also, I am curious, does HAFSv1 operational run need to output the diagnostic field U700/U850/V700 range check during integration? My understanding is that we only output the model prognostic fields from dycore, not those diag fields, which could slow down the model run. We compute them using asynchronized inline post. I assume the change you made will only impact the debug runs. for which I thought timing might not be an critical issue.

@dkokron
Copy link
Copy Markdown
Author

dkokron commented Apr 24, 2023 via email

@dkokron
Copy link
Copy Markdown
Author

dkokron commented Apr 25, 2023

Jun,
Is there anything else I can provide to help move this PR forward?
Dan

@junwang-noaa
Copy link
Copy Markdown
Collaborator

I talked to Bin who said that U700/V850/V700 fields are used by HAFS TC tracking diagnostic code in dycore. There is no impact on the model history files.

Update to latest in order to merge hafs-rangeCheck3d into mainline.
@FernandoAndrade-NOAA
Copy link
Copy Markdown

Testing on PR #1743 is complete, this PR is ready for merge

@jkbk2004
Copy link
Copy Markdown

jkbk2004 commented Jun 6, 2023

@laurenchilutti can you merge in this pr?

@laurenchilutti laurenchilutti merged commit 49f15ec into NOAA-GFDL:dev/emc Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants