Hafs range check3d#1743

Merged
FernandoAndrade-NOAA merged 18 commits into ufs-community:develop from dkokron:hafs-rangeCheck3d on Jun 6, 2023

@dkokron
Contributor

@dkokron dkokron commented May 9, 2023

Description

Performance profiling of a HAFS case on NOAA systems revealed that a significant amount of time was spent in subroutine range_check_3d(). This commit effectively reverts a commit from Oct 2021 (see below). This commit also changes the code to use the minval() and maxval() Fortran intrinsics.

commit 2c8363e057dde026e65ddcec1b62c18d5e260017
Author: Xiaqiong Zhou Xiaqiong.Zhou@noaa.gov
Date: Thu Oct 21 17:51:10 2021 +0000
Revise back the range definition form. The compiling issue on DELL can be fixed by using -O0 instead of -O2 to compile fv_diagnostics.F90

I requested more details from Xiaqiong Zhou and got the following responses.

It is a very strange error when compiling fv_diagnostics.F90 on DELL (it is OK on HERA, Orion, etc.).
vsrange = (/ -200., 200. /) was not accepted, but it is OK to use
vsrange(1) = -200. ; vsrange(2) = 200.
In order to keep the original form, vsrange = (/ -200., 200. /), -O0 is used instead of -O2 to compile fv_diagnostics.F90 in the dynamics.

DELL was retired. It should be an Intel compiler but I do not remember the version.

I don't see any compile time issues using ifort-19.1.3.304 at -O2 on the WCOSS2 systems.

How Has This Been Tested?
The modifications have been tested on WCOSS2 systems Acorn and Dogwood using a HAFS case as well as on Cactus and Dogwood by running the UFS (develop branch cloned on 17 April) regression suite.

Scenarios:

  1. Unmodified code and compiler flags (Baseline)
  2. Delete the line in FV3/atmos_cubed_sphere/CMakeLists.txt that adds "-O0" to the compile flags, so that fv_diagnostics.F90 is compiled with the global defaults.
  3. Replace the nested loop calculation in range_check_3d() (not in range_check_2d) with calls to the minval() and maxval() intrinsic functions.
  4. Same as scenario three with the addition of minval and maxval intrinsics in range_check_2d.
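
Scenarios 3 and 4 swap a hand-written nested loop for whole-array reductions. The Python sketch below is only an analogue of that change, not the Fortran in fv_diagnostics.F90: the function names and the sample array are made up for illustration, and Python's built-in min() and max() stand in for the Fortran minval() and maxval() intrinsics.

```python
def range_nested_loops(field):
    """Track the running minimum and maximum with explicit loops."""
    lo = float("inf")
    hi = float("-inf")
    for level in field:
        for row in level:
            for value in row:
                if value < lo:
                    lo = value
                if value > hi:
                    hi = value
    return lo, hi


def range_reductions(field):
    """Same result using whole-array style reductions."""
    flat = [value for level in field for row in level for value in row]
    return min(flat), max(flat)


# Hypothetical 2 x 2 x 3 field (levels x rows x columns).
sample = [
    [[1.5, -3.0, 7.25], [0.0, 2.0, -1.0]],
    [[4.0, 9.5, -8.75], [6.0, -2.5, 3.0]],
]

print(range_nested_loops(sample))
print(range_reductions(sample))
```

Both calls return the same extrema. The sketch says nothing about performance, which in this PR was measured on the Fortran code.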

HAFS case regional simulation with one nest
The case was run on 26 nodes of the Acorn system for a 126 hour simulation.

Parent grid:
  layout = 24,20
  npx = 1321
  npy = 1201
  ntiles = 1
  npz = 81

Nest grid:
  layout = 20,12
  npx = 601
  npy = 601
  ntiles = 1
  npz = 81

The 26 nodes are allocated as follows.
ATM_petlist_bounds: 000 735
OCN_petlist_bounds: 736 855
MED_petlist_bounds: 736 855

Performance metric:
Add up the phase1 and phase2 timings printed in the output listing
grep PASS stdout | awk '{t+=$10;print t}' | tail -1
The units are seconds.
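
For readers less familiar with awk, the pipeline above keeps the lines containing PASS, accumulates the 10th whitespace-separated field, and reports the final running total. A rough Python equivalent is sketched below; the function name and the sample lines are invented for illustration and do not reproduce the real layout of the model's stdout.

```python
def total_pass_time(lines, pattern="PASS", field=10):
    """Sum one whitespace-separated field over the lines that contain pattern."""
    total = 0.0
    for line in lines:
        if pattern not in line:
            continue  # grep PASS drops these lines
        parts = line.split()
        if len(parts) < field:
            continue  # awk would add 0 for a missing field
        try:
            total += float(parts[field - 1])  # awk fields are 1-based
        except ValueError:
            pass  # simplification: treat non-numeric text as 0
    return total


# Synthetic lines with a number in the 10th column; not real model output.
sample_stdout = [
    "PASS phase1 a b c d e f g 12.5",
    "some unrelated line",
    "PASS phase2 a b c d e f g 7.25",
]

print(total_pass_time(sample_stdout))
```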

Scenario   Trial 1 (s)   Trial 2 (s)
1          7881          7866.69
2          7206.2        7200.81
3          7179.37       Not run
4          7180.67       7163.4
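
To put the table in relative terms, the short Python calculation below computes the time saved by each scenario against the Scenario 1 baseline, using the Trial 1 timings copied from the table. The helper name is made up for illustration.

```python
# Trial 1 timings in seconds, copied from the table above.
trial1 = {1: 7881.0, 2: 7206.2, 3: 7179.37, 4: 7180.67}


def percent_saved(baseline, value):
    """Return the reduction relative to baseline, as a percentage."""
    return 100.0 * (baseline - value) / baseline


for scenario in (2, 3, 4):
    saved = percent_saved(trial1[1], trial1[scenario])
    print(f"Scenario {scenario}: {saved:.1f}% less time than Scenario 1")
```

With these Trial 1 numbers, removing "-O0" (Scenario 2) gives a reduction of roughly 8.6%, and adding the intrinsics (Scenarios 3 and 4) brings it to about 8.9%.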

Validation:
Using scenario four, the UFS regression suite revealed numerous diagnostic variables changed numerically. A review of all files declared as "NOT OK" by the pass/fail comparison revealed that all of those variables are calculated using routines found in fv_diagnostics.F90 (see attached file variables.txt)
variables.txt

Some of these variables show up in files that are part of the UFS regression suite pass/fail comparisons so a new baseline will be needed.

Comparison between stdout from scenario1 and scenario4 revealed certain variables related to tropical cyclone (TC) tracking changed at the 7th significant digit. A code review revealed that those variables are also calculated using routines from fv_diagnostics.F90

E.g. from time step 6299
u700 g2 max = 14.35157 min = -9.423827 | u700 g2 max = 14.35157 min = -9.423828
u850 g2 max = 7.198199 min = -18.35876 | u850 g2 max = 7.198200 min = -18.35876
v700 g2 max = 8.673572 min = -15.10008 | v700 g2 max = 8.673571 min = -15.10008
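
The size of these differences can be checked with a few lines of Python. The sketch below takes the three value pairs that differ in the output above and computes their relative difference. Single-precision machine epsilon is included only as a familiar yardstick; whether these diagnostics are computed in 32-bit reals is an assumption here, and the helper name is made up for illustration.

```python
# (baseline, modified) pairs copied from the step 6299 output above.
pairs = {
    "u700 min": (-9.423827, -9.423828),
    "u850 max": (7.198199, 7.198200),
    "v700 max": (8.673572, 8.673571),
}

# Machine epsilon for IEEE 754 single precision, used only as a yardstick.
EPS32 = 2.0 ** -23


def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


for name, (base, new) in pairs.items():
    rel = relative_difference(base, new)
    print(f"{name}: relative difference {rel:.2e} ({rel / EPS32:.1f} x eps32)")
```

Each pair differs by roughly 1e-7 in relative terms, which is on the order of single-precision machine epsilon.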

Comparing output from the TC tracker printed to stdout revealed no differences between scenario1 and scenario4.

E.g. the last output from a 126 hour simulation.
==> Baseline_5Day_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294

==> RANGECHECKnD-GlobalminvalmaxvalOnly_736p/tracker.txt <==
tracker fixlon= 350.647 fixlat= 30.150 ifix= 302 jfix= 302 pmin= 100795.047 vmax= 16.500 rmw= 119.294

Top of commit queue on: TBD

Input data additions/changes

  • No changes are expected to input data.
  • There will be new input data.
  • Input data will be updated.

Anticipated changes to regression tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:

New baselines are needed for the following tests due to changes in the variables noted above.
RegressionTests_wcoss2.intel.log

regional_control 041 failed in check_result
regional_control 041 failed in run_test
regional_control_qr 043 failed in check_result
regional_control_qr 043 failed in run_test
regional_decomp 045 failed in check_result
regional_decomp 045 failed in run_test
regional_2threads 046 failed in check_result
regional_2threads 046 failed in run_test
regional_noquilt 047 failed in check_result
regional_noquilt 047 failed in run_test
regional_netcdf_parallel 048 failed in check_result
regional_netcdf_parallel 048 failed in run_test
regional_2dwrtdecomp 049 failed in check_result
regional_2dwrtdecomp 049 failed in run_test
regional_wofs 050 failed in check_result
regional_wofs 050 failed in run_test
regional_spp_sppt_shum_skeb 052 failed in check_result
regional_spp_sppt_shum_skeb 052 failed in run_test
rrfs_smoke_conus13km_hrrr_warm 066 failed in check_result
rrfs_smoke_conus13km_hrrr_warm 066 failed in run_test
rrfs_smoke_conus13km_hrrr_warm_2threads 067 failed in check_result
rrfs_smoke_conus13km_hrrr_warm_2threads 067 failed in run_test
rrfs_conus13km_hrrr_warm 068 failed in check_result
rrfs_conus13km_hrrr_warm 068 failed in run_test
rrfs_smoke_conus13km_radar_tten_warm 069 failed in check_result
rrfs_smoke_conus13km_radar_tten_warm 069 failed in run_test
regional_control_faster 076 failed in check_result
regional_control_faster 076 failed in run_test
regional_spp_sppt_shum_skeb_dyn32_phy32 104 failed in check_result
regional_spp_sppt_shum_skeb_dyn32_phy32 104 failed in run_test
hafs_regional_atm 116 failed in check_result
hafs_regional_atm 116 failed in run_test
hafs_regional_atm_thompson_gfdlsf 117 failed in check_result
hafs_regional_atm_thompson_gfdlsf 117 failed in run_test
hafs_regional_atm_ocn 118 failed in check_result
hafs_regional_atm_ocn 118 failed in run_test
hafs_regional_atm_wav 119 failed in check_result
hafs_regional_atm_wav 119 failed in run_test
hafs_regional_atm_ocn_wav 120 failed in check_result
hafs_regional_atm_ocn_wav 120 failed in run_test
hafs_regional_1nest_atm 121 failed in check_result
hafs_regional_1nest_atm 121 failed in run_test
hafs_regional_telescopic_2nests_atm 122 failed in check_result
hafs_regional_telescopic_2nests_atm 122 failed in run_test
hafs_global_1nest_atm 123 failed in check_result
hafs_global_1nest_atm 123 failed in run_test
hafs_global_multiple_4nests_atm 124 failed in check_result
hafs_global_multiple_4nests_atm 124 failed in run_test
hafs_regional_specified_moving_1nest_atm 125 failed in check_result
hafs_regional_specified_moving_1nest_atm 125 failed in run_test
hafs_regional_storm_following_1nest_atm 126 failed in check_result
hafs_regional_storm_following_1nest_atm 126 failed in run_test
hafs_regional_storm_following_1nest_atm_ocn 127 failed in check_result
hafs_regional_storm_following_1nest_atm_ocn 127 failed in run_test
hafs_global_storm_following_1nest_atm 128 failed in check_result
hafs_global_storm_following_1nest_atm 128 failed in run_test
hafs_regional_storm_following_1nest_atm_ocn_wav 130 failed in check_result
hafs_regional_storm_following_1nest_atm_ocn_wav 130 failed in run_test
atmaero_control_p8 132 failed in run_test

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Combined with PR's (If Applicable):

Commit Queue Checklist:

  • Link PR's from all sub-components involved
  • Confirm reviews completed in sub-component PR's
  • Add all appropriate labels to this PR.
  • Run full RT suite on either Hera/Cheyenne with both Intel/GNU compilers
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Sam Trahan completed testing on Hera and reported the results in the comments below.

Linked PR's and Issues:

Testing Day Checklist:

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CM's):

  • RDHPCS
    • Hera
    • Orion
    • Jet
    • Gaea
    • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@SamuelTrahanNOAA
Collaborator

SamuelTrahanNOAA commented May 10, 2023

I have tested this on Hera with the GNU and Intel compilers.

All GNU tests pass. This is expected, since the only change is to the Intel compilation options.

As for Intel, there are two tests whose results change and which are not listed in the PR description: hafs_regional_docn_oisst and hafs_regional_docn. That's because @dkokron ran on WCOSS2, where those tests are disabled.

I can confirm all Intel tests pass if I regenerate just the baselines of these tests:

redo.conf

(Expand for test list.)
COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16,FV3_GFS_v15_thompson_mynn,FV3_GFS_v17_p8,FV3_GFS_v17_p8_rrtmgp,FV3_GFS_v15_thompson_mynn_lam3km,FV3_WoFS_v0,FV3_GFS_v17_p8_mynn -D32BIT=ON |     | fv3 |

RUN     | regional_control                                                                                                        |                                         | fv3 |
RUN     | regional_noquilt                                                                                                        | - jet.intel                             | fv3 |
RUN     | regional_netcdf_parallel                                                                                                | - acorn.intel                           | fv3 |
RUN     | regional_wofs                                                                                                           | - jet.intel                             | fv3 |

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_RAP_sfcdiff,FV3_HRRR,FV3_RRFS_v1beta,FV3_RRFS_v1nssl -D32BIT=ON      |                                         | fv3 |
RUN     | regional_spp_sppt_shum_skeb                                                                                             |                                         | fv3 |

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_RAP_sfcdiff,FV3_HRRR,FV3_RRFS_v1beta,FV3_RRFS_v1nssl -D32BIT=ON      |                                         | fv3 |
RUN     | rrfs_smoke_conus13km_hrrr_warm                                                                                          |                                         | fv3 |
RUN     | rrfs_conus13km_hrrr_warm                                                                                                |                                         | fv3 |
RUN     | rrfs_smoke_conus13km_radar_tten_warm                                                                                    |                                         | fv3 |
RUN     | rrfs_conus13km_hrrr_warm_restart_mismatch                                                                               |                                         | fv3 | rrfs_conus13km_hrrr_warm

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v16,FV3_GFS_v15_thompson_mynn,FV3_GFS_v17_p8,FV3_GFS_v17_p8_rrtmgp,FV3_GFS_v15_thompson_mynn_lam3km,FV3_WoFS_v0 -D32BIT=ON -DFASTER=ON |                  | fv3 |
RUN     | regional_control_faster                                                                                                 |                                         | fv3 |

COMPILE | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_HRRR -D32BIT=ON -DCCPP_32BIT=ON                                                     |                | fv3 |
RUN     | regional_spp_sppt_shum_skeb_dyn32_phy32                                                                                 |                | fv3 |

COMPILE | -DAPP=HAFSW -DMOVING_NEST=ON -DCCPP_SUITES=FV3_HAFS_v1_gfdlmp_tedmf,FV3_HAFS_v1_gfdlmp_tedmf_nonsst,FV3_HAFS_v1_thompson_tedmf_gfdlsf -D32BIT=ON |                | fv3 |
RUN     | hafs_regional_atm                                                                                                       |                                         | fv3 |
RUN     | hafs_regional_atm_thompson_gfdlsf                                                                                       |                                         | fv3 |
RUN     | hafs_regional_atm_ocn                                                                                                   |                                         | fv3 |
RUN     | hafs_regional_atm_wav                                                                                                   |                                         | fv3 |
RUN     | hafs_regional_atm_ocn_wav                                                                                               |                                         | fv3 |
RUN     | hafs_regional_1nest_atm                                                                                                 | - jet.intel                             | fv3 |
RUN     | hafs_regional_telescopic_2nests_atm                                                                                     | - jet.intel                             | fv3 |
RUN     | hafs_global_1nest_atm                                                                                                   | - jet.intel                             | fv3 |
RUN     | hafs_global_multiple_4nests_atm                                                                                         | - jet.intel                             | fv3 |
RUN     | hafs_regional_specified_moving_1nest_atm                                                                                | - jet.intel                             | fv3 |
RUN     | hafs_regional_storm_following_1nest_atm                                                                                 | - jet.intel                             | fv3 |
RUN     | hafs_regional_storm_following_1nest_atm_ocn                                                                             | - jet.intel                             | fv3 |
RUN     | hafs_global_storm_following_1nest_atm                                                                                   | - jet.intel                             | fv3 |

COMPILE | -DAPP=HAFSW -DMOVING_NEST=ON -DCCPP_SUITES=FV3_HAFS_v1_thompson_noahmp_nonsst,FV3_HAFS_v1_thompson_noahmp,FV3_HAFS_v1_thompson_nonsst,FV3_HAFS_v1_thompson,FV3_HAFS_v1_gfdlmp_tedmf_nonsst,FV3_HAFS_v1_gfdlmp_tedmf,FV3_HAFS_v1_thompson_tedmf_gfdlsf -D32BIT=ON -DFASTER=ON |                | fv3 |
RUN     | hafs_regional_storm_following_1nest_atm_ocn_wav                                                                | - jet.intel                             | fv3 |

COMPILE | -DAPP=HAFS-ALL -DCCPP_SUITES=FV3_HAFS_v1_gfdlmp_tedmf,FV3_HAFS_v1_gfdlmp_tedmf_nonsst -D32BIT=ON                        | - wcoss2.intel                          | fv3 |
RUN     | hafs_regional_docn                                                                                                      | - wcoss2.intel                          | fv3 |
RUN     | hafs_regional_docn_oisst                                                                                                | - wcoss2.intel                          | fv3 |

@dkokron
Contributor Author

dkokron commented May 25, 2023

Is there anything I need to do to get this PR approved?

@SamuelTrahanNOAA
Collaborator

SamuelTrahanNOAA commented May 25, 2023

According to the commit queue, your PR is scheduled for merge today. You should hear something soon from one of the code managers.

danblchange

@dkokron
Contributor Author

dkokron commented May 25, 2023 via email

@SamuelTrahanNOAA
Collaborator

SamuelTrahanNOAA commented May 25, 2023

For your PR, what matters is the components' pull requests, fv3atm and GFDL_atmos_cubed_sphere, which have already been approved. At the ufs-weather-model level (this PR) you'll need to wait for final testing before it can be approved and merged.

@jkbk2004
Collaborator

Thanks for your patience! #1718 is a pretty big one and is still ongoing. We expect it will be merged tomorrow morning. Then we can start working on this PR. The dependencies have approval, and enough pre-testing has already been done on WCOSS2 and Hera. All looks good to schedule this PR by tomorrow.

@jkbk2004
Collaborator

@dkokron #1718 was merged. Once you sync up, we can start working on this pr.

@dkokron
Contributor Author

dkokron commented May 31, 2023 via email

Jong, this is my first time doing this. I don't know what you mean by "sync up". Can you give me the list of commands? Dan

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

You need to sync your fork branches to stay up to date, since the authoritative repository has moved forward with new commits. Example steps to merge the changes into your fork branch of the weather model are:

  1. Clone your fork and check out the branch that needs syncing:
    git clone https://github.com/JoeSmith-NOAA/ufs-weather-model.git ./fork
    cd fork
    git checkout feature/my_new_thing
  2. Add upstream info to your clone so it knows where to merge from. The term “upstream” refers to the authoritative repository from which the fork was created.
    git remote add upstream https://github.com/ufs-community/ufs-weather-model.git
  3. Fetch upstream information into the clone:
    git fetch upstream
  4. Later on, you can update your fork's remote information with the following command:
    git remote update
  5. Merge upstream feature/other_new_thing into your branch:
    git merge upstream/feature/other_new_thing
  6. Resolve any conflicts and perform any needed “add”s or “commit”s for conflict resolution.
  7. Push the merged copy back up to your fork (origin):
    git push origin feature/my_new_thing

You need to sync up at the submodule level as well.

@dkokron
Contributor Author

dkokron commented Jun 1, 2023 via email

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

  1. feature/my_new_thing

This is your fork's feature branch. For this PR, it must be hafs-rangeCheck3d. Basically, the process is about merging the changes at the head of the authoritative develop branch into your fork branch.

@dkokron
Contributor Author

dkokron commented Jun 1, 2023 via email

I understand what "feature/my_new_thing" means. I was asking about "feature/other_new_thing". Dan

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

Sorry! It means the upstream branch you want to merge into your fork. In our case, develop.

@dkokron
Contributor Author

dkokron commented Jun 1, 2023 via email

@dkokron
Contributor Author

dkokron commented Jun 1, 2023 via email

@DeniseWorthen
Collaborator

@dkokron The conflicts in the test logs can be resolved just by picking 'theirs'. Supposing you did (while on your feature branch)

git merge upstream/develop
git checkout --theirs tests/*.log

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

Right! It's not necessary to commit your test log files. Attaching them to the PR description or conversation is good enough.

@dkokron
Contributor Author

dkokron commented Jun 1, 2023 via email

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

@BrianCurtis-NOAA
Collaborator

@jkbk2004 We don't have separate hera gnu/intel logs any more. What's going on?

@jkbk2004
Collaborator

jkbk2004 commented Jun 1, 2023

@jkbk2004 We don't have separate hera gnu/intel logs any more. What's going on?

Those are what Sam had run for this pr a while ago.

@SamuelTrahanNOAA
Collaborator

Those are what Sam had run for this pr a while ago.

This PR does not add those two log files, according to the list of files changed.

https://github.com/ufs-community/ufs-weather-model/pull/1743/files

@BrianCurtis-NOAA BrianCurtis-NOAA removed the hera-BL Run Hera baseline creation label Jun 4, 2023
@BrianCurtis-NOAA
Collaborator

All testing completed.

@FernandoAndrade-NOAA
Collaborator

@dkokron, the fv3atm sub-PR was merged in. Could you go ahead and update the hash, along with reverting the change in .gitmodules? The correct hash is NOAA-EMC/ufsatm@86ba901

@dkokron
Contributor Author

dkokron commented Jun 6, 2023 via email

@zach1221
Collaborator

zach1221 commented Jun 6, 2023

@dkokron @DeniseWorthen
Similar to the fv3atm sub-PR, we just need the fv3atm branch updated to the authoritative name and URL; example below.
[image]

I believe the below steps will update to the latest FV3 hash.
cd FV3
git checkout develop
cd ..
git add FV3
git commit -m "Update FV3 hash to latest version."
git push origin hafs-rangeCheck3d

@DeniseWorthen
Collaborator

Please check the hashes. FV3 should be at 86ba901

@zach1221
Collaborator

zach1221 commented Jun 6, 2023

Thanks, again @DeniseWorthen . It looks correct to me.

@zach1221 zach1221 requested a review from sadeghitabas June 6, 2023 18:17
@FernandoAndrade-NOAA FernandoAndrade-NOAA merged commit 571a561 into ufs-community:develop Jun 6, 2023
@FernandoAndrade-NOAA FernandoAndrade-NOAA mentioned this pull request Jun 6, 2023
37 tasks

Labels

Baseline Updates: Current baselines will be updated.
jenkins-ci: Jenkins CI: ORT build/test on docker container
Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.

9 participants