Enable running regional nesting configurations and add four regional and global nesting RTs as well as two HAFS WW3 coupling RTs#846

Merged
junwang-noaa merged 31 commits into ufs-community:develop from hafs-community:feature/regional_nest
Nov 2, 2021

@BinLiu-NOAA
Contributor

@BinLiu-NOAA BinLiu-NOAA commented Oct 4, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

Provide a detailed description of what this PR does. What bug does it fix, or what feature does it add? Is a change of answers expected from this PR? Are any library updates included in this PR (modulefiles etc.)?

  • This PR updates submodule FV3 that fixes and enables running the regional nesting configuration under the ufs-weather-model framework.
  • It also includes some bug fixes (from @JosephMouallem) in FV3/atmos_cubed_sphere for running the regional-nesting configuration (which has already been merged).
  • The following four HAFS regional and global nesting regression tests were added:
    • hafs_regional_1nest_atm
    • hafs_regional_telescopic_2nests_atm
    • hafs_global_1nest_atm
    • hafs_global_multiple_4nests_atm
  • Use atparse to parse ww3_multi.inp.IN and add the following two HAFS WW3 coupling regression tests (by @BinLiu-NOAA, @JessicaMeixner-NOAA, @danrosen25, @uturuncoglu, @aliabdolali):
    • hafs_regional_atm_wav
    • hafs_regional_atm_ocn_wav
  • This PR also unifies and simplifies the existing HAFS RTs, for both scripts and input data, and changes fhcyc from 24 to 0 for the existing HAFS RTs. As a result, it also changes the results of these existing HAFS RTs.
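
For readers unfamiliar with atparse, the kind of substitution it performs on templates such as ww3_multi.inp.IN can be sketched roughly as below. This is a minimal Python illustration, not the bash helper used by the regression test scripts. It assumes placeholders are written as @[NAME]; the function name, the template line and the variable names are all hypothetical.

```python
import re

# Placeholders are assumed to look like @[NAME]. This mirrors the idea of
# atparse, not its actual bash implementation.
_PLACEHOLDER = re.compile(r"@\[([A-Za-z_][A-Za-z0-9_]*)\]")


def render_template(text, values):
    """Replace every @[NAME] in text with values[NAME].

    Unknown names are left untouched so that a missing setting stays
    visible in the rendered file instead of becoming an empty string.
    """
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return _PLACEHOLDER.sub(substitute, text)


# Hypothetical template line and variable names, for illustration only.
template = "start: @[RUN_BEG]  end: @[RUN_END]  dt: @[DT_WAV]"
rendered = render_template(
    template,
    {"RUN_BEG": "20190829 000000", "RUN_END": "20190829 060000"},
)
print(rendered)
```

Leaving unknown placeholders in place is a design choice of this sketch, made so that an unset variable is easy to spot in the generated input file.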

Notes:

  • As for the input data change, please stage the following two input data dirs on all ufs-weather-model supported platforms:
    /work/noaa/nems/bliu/RT/NEMSfv3gfs/input-data-20210930/FV3_hafs_input_data
    /work/noaa/nems/bliu/RT/NEMSfv3gfs/input-data-20210930/WW3_input_data_20211101
    into the corresponding input-data-20210930 dir, e.g.
    /work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210930/
  • When preparing the next input-data dir, please delete the FV3_hafs_regional_input_data dir as well as the FV3_hafs_input_data/field_table file, which are no longer needed.
  • New baselines need to be created, since this PR changes existing HAFS RT results and adds new RTs.
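
The staging step described in the notes above could be scripted along the following lines. This is only a sketch under stated assumptions: the helper name stage_input_dirs is hypothetical, it assumes source and destination are visible from the same machine, and it deliberately skips directories that already exist at the destination.

```python
import shutil
from pathlib import Path


def stage_input_dirs(source_dirs, dest_root):
    """Copy each source directory into dest_root, keeping its base name.

    Existing destinations are skipped rather than overwritten, so an
    already staged directory is never modified.
    """
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, source_dirs):
        target = dest_root / src.name
        if target.exists():
            continue  # already staged, leave it alone
        shutil.copytree(src, target)
        staged.append(target)
    return staged


# Example call with the paths quoted in the notes (not executed here):
# stage_input_dirs(
#     ["/work/noaa/nems/bliu/RT/NEMSfv3gfs/input-data-20210930/FV3_hafs_input_data",
#      "/work/noaa/nems/bliu/RT/NEMSfv3gfs/input-data-20210930/WW3_input_data_20211101"],
#     "/work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210930",
# )
```

Transfers between different platforms would need a remote copy tool instead, which is outside the scope of this sketch.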

Issue(s) addressed

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

Regression tests, including both existing and new ones, went through successfully on Orion.

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • CI

Dependencies

@BinLiu-NOAA BinLiu-NOAA added the enhancement, Baseline Updates, and input data change labels Oct 4, 2021
  - hafs_regional_1nest_atm
  - hafs_regional_telescopic_2nests_atm
  - hafs_global_1nest_atm
  - hafs_global_multiple_4nests_atm
* Unify and simplify the existing HAFS regression tests for both scripts and input data.
…c files

for the hafs_regional_1nest_atm RT to speed up the write grid component.
…_hafs.nml.IN.

* Change FHCYC from 24 to 0 for HAFS RTs.
@BinLiu-NOAA BinLiu-NOAA changed the title Enable running regional nesting and add regional and global nesting regression tests Enable running regional nesting configurations and add regional and global nesting regression tests Oct 4, 2021
@BinLiu-NOAA BinLiu-NOAA changed the title Enable running regional nesting configurations and add regional and global nesting regression tests Enable running regional nesting configurations and add four regional and global nesting RTs as well as two HAFS WW3 coupling RTs Oct 30, 2021
@BinLiu-NOAA BinLiu-NOAA force-pushed the feature/regional_nest branch from 9d22bc9 to 5738871 Compare October 30, 2021 04:01
@BinLiu-NOAA
Contributor Author

I have synced up with the latest develop branch. I would appreciate it if someone could help stage (transfer) the following updated input data directory from Orion:
/work/noaa/nems/bliu/RT/NEMSfv3gfs/input-data-20211101
to the proper input data directories, for example,
/work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20211101
on all the ufs-weather-model regression test platforms.
After that, this PR is ready to generate the new baseline, and then to conduct regular RTs to compare against the new baseline. Thanks!

@BinLiu-NOAA BinLiu-NOAA added the Changes Existing Input Data label Oct 30, 2021
@BinLiu-NOAA
Contributor Author

@BinLiu-NOAA The RT failed on cray:

baseline dir = /gpfs/hps3/emc/nems/noscrub/emc.nemspara/RT/NEMSfv3gfs/develop-20211101/hafs_global_multiple_4nests_atm
working dir = /gpfs/hps3/stmp/Jun.Wang/FV3_RT/rt_1849/hafs_global_multiple_4nests_atm
Checking test 064 hafs_global_multiple_4nests_atm results ....
Comparing atmf006.nc ............ALT CHECK......NOT OK
Comparing sfcf006.nc ............ALT CHECK......OK

[Jun.Wang@v71a3 hafs_global_multiple_4nests_atm]$ pwd
/gpfs/hps3/stmp/Jun.Wang/FV3_RT/rt_1849/hafs_global_multiple_4nests_atm
[Jun.Wang@v71a3 hafs_global_multiple_4nests_atm]$ /gpfs/hps3/emc/global/noscrub/Jun.Wang/nems/vlab/20211031/test/ufs-weather-model/tests/compare_ncfile.py atmf006.nc /gpfs/hps3/emc/nems/noscrub/emc.nemspara/RT/NEMSfv3gfs/develop-20211101/hafs_global_multiple_4nests_atm/atmf006.nc
pressfc is different nan

Would you please take a look at the run on cray?

@junwang-noaa, I looked at this hafs_global_multiple_4nests_atm RT in your wcoss_cray dir. The different nan values are probably partly related to the fact that the output grid specified in this test case does not overlap with the first nested domain. This will no longer be an issue once we can write output grids for all the nested grids. So I would suggest simply turning off this hafs_global_multiple_4nests_atm RT on wcoss_cray for now. I can try to fix this in a future PR when adding the HAFS debug/threading RTs, if that sounds good to you as well.
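
As background on why a field containing nan can show up as "different" with a nan difference, here is a small pure-Python sketch. It is not the logic of compare_ncfile.py, which is not reproduced here. The function names and sample values are hypothetical, and both sequences are assumed to have the same length.

```python
import math


def compare_field(new, baseline):
    """Return (identical, max_abs_diff) for two equal-length float sequences.

    NaN never compares equal to anything, itself included, so a NaN in
    either field makes the fields "different" and the max difference NaN.
    """
    identical = all(a == b for a, b in zip(new, baseline))
    diffs = [abs(a - b) for a, b in zip(new, baseline)]
    if any(math.isnan(d) for d in diffs):
        return identical, math.nan
    return identical, max(diffs, default=0.0)


def compare_field_nan_aware(new, baseline):
    """Like compare_field, but a NaN at the same index in both is a match."""
    pairs = [(a, b) for a, b in zip(new, baseline)
             if not (math.isnan(a) and math.isnan(b))]
    return compare_field([a for a, _ in pairs], [b for _, b in pairs])


# Hypothetical values: the second point is undefined in both files.
new = [1.0, math.nan]
baseline = [1.0, math.nan]
print(compare_field(new, baseline))
print(compare_field_nan_aware(new, baseline))
```

With a plain element-wise check, two files that hold nan at the same points still fail the comparison, whereas the nan-aware variant treats those points as matching.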

@junwang-noaa
Collaborator

@BinLiu-NOAA I checked with the code managers. Since you will work on fixing the issues, it is OK to turn off the 4 HAFS tests on jet and the hafs_global_multiple_4nests_atm test on cray. Please create an issue in the ufs-weather-model repo for fixing those tests. Meanwhile, please update your branch; we will finish the RT on jet and cray with the updated rt.conf, then commit the code after review. Thanks

…ce they

run out of wallclock limit.
* Turn off the hafs_global_multiple_4nests_atm RT on wcoss_cray, since the rerun
cannot bitwisely reproduce the baseline for the atmf006.nc file. This is partly
due to the fact the output grid domain specified for the write grid component
is not overlapped by nest02 (inside tile 2), but by nest03-05 inside tile 6.

Note: The above issues will be worked on and fixed in a follow-up PR.
@BinLiu-NOAA
Contributor Author

@BinLiu-NOAA I checked with the code managers. Since you will work on fixing the issues, it is OK to turn off the 4 HAFS tests on jet and the hafs_global_multiple_4nests_atm test on cray. Please create an issue in the ufs-weather-model repo for fixing those tests. Meanwhile, please update your branch; we will finish the RT on jet and cray with the updated rt.conf, then commit the code after review. Thanks

Thanks, @junwang-noaa! I have made the following changes accordingly:

  • Turn off the four HAFS regional/global nesting RTs on jet.intel, since they exceed the wallclock limit.
  • Turn off the hafs_global_multiple_4nests_atm RT on wcoss_cray, since the rerun cannot reproduce the baseline bit for bit for the atmf006.nc file. This is partly due to the fact that the output grid domain specified for the write grid component is not overlapped by nest02 (inside tile 2), but by nest03-05 inside tile 6.
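
To illustrate how a per-platform switch like this can be read from rt.conf, here is a hedged Python sketch. It assumes a pipe-delimited RUN line whose third field is a platform filter, where a leading "-" excludes the listed platforms and a leading "+" restricts the test to them. The function name and the sample line are hypothetical, and the sketch matches whole platform names, which may differ from how the regression test script itself matches them.

```python
def is_enabled(rt_conf_line, platform):
    """Decide whether a RUN line applies to platform (e.g. "jet.intel").

    Assumed layout: RUN | test name | platform filter | ...
    An empty filter means every platform; "- a b" excludes a and b;
    "+ a b" restricts the test to a and b.
    """
    fields = [f.strip() for f in rt_conf_line.split("|")]
    if len(fields) < 3 or fields[0] != "RUN":
        raise ValueError("not a RUN line: %r" % rt_conf_line)
    spec = fields[2]
    if not spec:
        return True
    listed = platform in spec[1:].split()
    if spec[0] == "-":
        return not listed
    if spec[0] == "+":
        return listed
    raise ValueError("platform filter must start with '+' or '-': %r" % spec)


# Hypothetical line, written to match the changes listed above.
line = "RUN | hafs_global_multiple_4nests_atm | - wcoss_cray jet.intel | fv3 |"
print(is_enabled(line, "wcoss_cray"))   # False
print(is_enabled(line, "orion.intel"))  # True
```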

And I will create the corresponding issues for these and will fix them in a follow-up PR when adding the HAFS related debug RTs.

Thanks!

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/748495333/20211101203007/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_290361
Test cpld_control_c192_p7 005 failed failed
Test cpld_control_c192_p7 005 failed in run_test failed
Test cpld_bmark_p7 009 failed failed
Test cpld_bmark_p7 009 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

@BinLiu-NOAA
Contributor Author

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/748495333/20211101203007/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_290361
Test cpld_control_c192_p7 005 failed failed
Test cpld_control_c192_p7 005 failed in run_test failed
Test cpld_bmark_p7 009 failed failed
Test cpld_bmark_p7 009 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

@junwang-noaa or @BrianCurtis-NOAA, I could not see any useful out/err info for these two failed tests. I copied over the two failed test run dirs and tried them on my side by submitting the job_cards. The two tests are currently running fine on Jet:
/lfs4/HFIP/hwrfv3/Bin.Liu/FV3_RT/rt_290361/cpld_control_c192_p7/out
/lfs4/HFIP/hwrfv3/Bin.Liu/FV3_RT/rt_290361/cpld_bmark_p7/out

Bin

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/748495333/20211102024511/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_9367
Cannot upload jet.intel RT Log
It is blocked by PR owner
Please obtain logs from /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/748495333/20211102024511/ufs-weather-model

@junwang-noaa
Collaborator

@BinLiu-NOAA the Cray log file is at:
/scratch1/NCEPDEV/stmp2/Jun.Wang/wcosslog

@BinLiu-NOAA
Contributor Author

@BinLiu-NOAA the Cray log file is at:
/scratch1/NCEPDEV/stmp2/Jun.Wang/wcosslog

@junwang-noaa, Hera is down for maintenance now. Could you please commit the wcoss_cray log directly from wcoss_cray back to this feature/regional_nest branch (you should already have write access to this feature branch)? Or you can point me directly to your wcoss_cray dir.

Thanks!

Bin

@junwang-noaa
Collaborator

junwang-noaa commented Nov 2, 2021

@BinLiu-NOAA I still can't commit to your branch. The cray log file is at:
/gpfs/hps3/emc/global/noscrub/Jun.Wang/nems/vlab/20211031/test/ufs-weather-model/tests/RegressionTests_wcoss_cray.log

@BinLiu-NOAA
Contributor Author

@BinLiu-NOAA I still can't commit to your branch. The cray log file is at:
/gpfs/hps3/emc/global/noscrub/Jun.Wang/nems/vlab/20211031/test/ufs-weather-model/tests/RegressionTests_wcoss_cray.log

The RT log for wcoss_cray is now also updated.

Collaborator

@DeniseWorthen DeniseWorthen left a comment

Thanks for removing the edit_inputs and switching everything to atparse.

@BinLiu-NOAA
Contributor Author

Thanks for removing the edit_inputs and switching everything to atparse.

@JessicaMeixner-NOAA helped remove edit_inputs and unify the use of atparse for ww3_multi.inp.IN across all WW3-related RTs. Thanks!

@junwang-noaa
Collaborator

Thanks, @JessicaMeixner-NOAA !
@BinLiu-NOAA The fv3atm PR is merged. Would you please update your branch to point to the fv3 noaa-emc repo?

@BinLiu-NOAA
Contributor Author

@BinLiu-NOAA The fv3atm PR is merged. Would you please update your branch to point to the fv3 noaa-emc repo?

Done! Thanks!

@junwang-noaa junwang-noaa merged commit 5a548b9 into ufs-community:develop Nov 2, 2021
@BinLiu-NOAA BinLiu-NOAA deleted the feature/regional_nest branch July 3, 2024 18:58
Labels

  • Baseline Updates: Current baselines will be updated.
  • Changes Existing Input Data: Existing input data will be changed. A new input-data-YYYYMMDD directory must be created.
  • enhancement: New feature or request
  • Waiting for Reviews: The PR is waiting for reviews from associated component PR's.

7 participants