[develop] Source task specific portions of var_defns.sh file in ex- and j-job s…#465

Merged
MichaelLueken merged 14 commits into ufs-community:develop from danielabdi-noaa:feature/partial_var_defns on Nov 16, 2022

@danielabdi-noaa
Collaborator

@danielabdi-noaa danielabdi-noaa commented Nov 10, 2022

DESCRIPTION OF CHANGES:

Adds the capability to source task-specific sections of the var_defns.sh file in ex-scripts and j-jobs. Each task sources a subset of the variables in var_defns.sh: one or more task_* sections, along with the common variables defined in non-task-specific sections such as workflow: or global. This makes it possible, for example, to define the same variable name (e.g. NNODES) in each task section instead of NNODES_FOR_FCST, NNODES_FOR_POST, etc.

Most tasks require only the definitions in their own sections:

source_config_for_task "task_make_ics" ${GLOBAL_VAR_DEFNS_FP}

Some, like the make_orog task, also use variables from make_grid (usually the previous task):

source_config_for_task "task_make_orog|task_make_grid" ${GLOBAL_VAR_DEFNS_FP}
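
The section selection described above can be pictured with a small Python sketch. It is only an illustration: select_sections, the nested-dictionary layout, and the sample values are assumptions made here, not the PR's source_config_for_task implementation. The idea is that variables from every non-task section are always kept, while task_* sections are kept only when their name matches the pattern passed in.

```python
import re


def select_sections(config, task_pattern):
    """Merge all non-task sections with the task_* sections matching task_pattern.

    Hypothetical helper for illustration only. `config` maps section names to
    dictionaries of variables, and `task_pattern` is a "|"-separated list of
    section names such as "task_make_orog|task_make_grid".
    """
    wanted = re.compile(rf"(?:{task_pattern})")
    merged = {}
    for section, variables in config.items():
        # Keep common sections unconditionally; keep task sections on a full-name match.
        if not section.startswith("task_") or wanted.fullmatch(section):
            merged.update(variables)
    return merged


# Made-up sample configuration (all values are placeholders).
config = {
    "workflow": {"EXPT_SUBDIR": "my_expt"},
    "task_make_grid": {"GRID_DIR": "/path/to/grid"},
    "task_make_orog": {"NNODES": 1},
    "task_run_fcst": {"NNODES": 4},
}

orog_vars = select_sections(config, "task_make_orog|task_make_grid")
fcst_vars = select_sections(config, "task_run_fcst")
print(orog_vars["NNODES"], fcst_vars["NNODES"])
```

Because each task only sees its own task_* sections, the same name (NNODES here) can carry a different value per task without a task-specific suffix.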

Additional changes:

  • Complete set of task modulefiles for tier-1 platforms, and python-3 loading on all systems just to be safe. This PR requires python3 for all tasks.

  • Rearrange config entries, rename some such as TOPO_DIR->FIXorg and SFC_CLIMO_INPUT_DIR->FIXsfc

  • Machine files now consist entirely of a platform: section

  • Removes the unused NOMADS script and logic, relying instead on retrieve_data.py's NOMADS support

  • Option to set the current date as DATE_FIRST/LAST_CYCL using the date utility. This is useful, for example, for the NOMADS test case, which only has data for the last 7 or so days

  • Forward arguments from setup_WE2E_tests.sh to run_WE2E_tests.sh. This makes invocations like the following possible, which
    they were not before this PR. It essentially turns the setup_WE2E script into a convenience wrapper.

    Setup we2e run with debug/verbose turned off (by default it is on)
    ./setup_WE2E_tests.sh hera zrtrr intel custom debug=FALSE verbose=FALSE

    Setup we2e run without using cron (i.e. just generate the directories)
    ./setup_WE2E_tests.sh hera zrtrr intel custom use_cron_to_relaunch=FALSE

    Setup we2e run using exes from some directory
    ./setup_WE2E_tests.sh hera zrtrr intel custom exec_subdir=old_install/exec

    etc

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@christinaholtNOAA

@danielabdi-noaa danielabdi-noaa added the labels ci-hera-intel-WE (kicks off automated workflow test on hera with intel) and ci-jet-intel-WE (kicks off automated workflow test on jet with intel) Nov 10, 2022
@venitahagerty venitahagerty removed the labels ci-hera-intel-WE and ci-jet-intel-WE Nov 10, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Nov 11, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Nov 11, 2022
@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch 2 times, most recently from 913de18 to 1692184 Compare November 11, 2022 10:49
@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch from 1692184 to c0d6ee1 Compare November 11, 2022 11:22
@danielabdi-noaa danielabdi-noaa added the labels ci-hera-intel-WE and ci-jet-intel-WE Nov 11, 2022
@venitahagerty venitahagerty removed the labels ci-jet-intel-WE and ci-hera-intel-WE Nov 11, 2022
@venitahagerty
Collaborator

venitahagerty commented Nov 11, 2022

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1118438314/20221111113516/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Succeeded on hera: community_ensemble_2mems_stoch
Experiment Succeeded on hera: pregen_grid_orog_sfc_climo
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
Experiment Succeeded on hera: MET_ensemble_verification
All experiments completed

@venitahagerty
Collaborator

venitahagerty commented Nov 11, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1118438314/20221111113515/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: custom_ESGgrid
Experiment Succeeded on jet: specify_RESTART_INTERVAL
Experiment Succeeded on jet: custom_GFDLgrid
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: specify_DOT_OR_USCORE
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
All experiments completed

@JeffBeck-NOAA JeffBeck-NOAA changed the title Source task specific portions of var_defns.sh file in ex- and j-job s… [develop] Source task specific portions of var_defns.sh file in ex- and j-job s… Nov 11, 2022
Collaborator

@MichaelLueken MichaelLueken left a comment

@danielabdi-noaa These changes look good to me! Moved SYMLINK_FIX_FILES from task_run_fcst: to workflow: in config_defaults.yaml, nice!

I manually tested your changes on Cheyenne. All of the Intel tests successfully passed.
However, only a single GNU test passed: grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16. The rest failed in make_grid with the following error message:

At line 65 of file /glade/scratch/mlueken/ufs-srweather-app/sorc/UFS_UTILS/sorc/grid_tools.fd/regional_esg_grid.fd/regional_esg_grid.f90 (unit = 10, file = '/glade/scratch/mlueken/expt_dirs/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR/grid/tmp/regional_grid.nml')
Fortran runtime error: Cannot match namelist object name .0

Have you encountered this with your testing? My experiment directories on Cheyenne for GNU are /glade/scratch/mlueken/expt_dirs.

@danielabdi-noaa
Collaborator Author

@MichaelLueken Thanks for testing! So far I have only run it on Hera and Jet. Unfortunately, I do not have access to Cheyenne, but it is indeed odd that the Intel tests passed but not the GNU ones. I will re-run the Cheyenne GNU tests on other systems and see whether the failure can be reproduced.

@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch from cadbc21 to f89f648 Compare November 11, 2022 23:56
@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch from 14f4b1f to cbf822e Compare November 13, 2022 21:13
@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch from 59500c7 to ce4330c Compare November 14, 2022 13:37
@MichaelLueken
Collaborator

@danielabdi-noaa I was able to run the GNU tests on Cheyenne using the Intel executables. All of the tests successfully passed. To double check, I cloned the authoritative repo, built it using GNU and submitted the fundamental tests. All tests successfully passed.

I created a new GNU modulefile on Hera. It can be found at:

/scratch1/NCEPDEV/nems/Michael.Lueken/ufs-srweather-app/modulefiles/build_hera_gnu.lua

I'm seeing the same behavior that I saw on Cheyenne with the GNU compilers:

At line 65 of file /scratch1/NCEPDEV/nems/Michael.Lueken/ufs-srweather-app/sorc/UFS_UTILS/sorc/grid_tools.fd/regional_esg_grid.fd/regional_esg_grid.f90 (unit = 10, file = '/scratch1/NCEPDEV/nems/Michael.Lueken/expt_dirs/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR/grid/tmp/regional_grid.nml')
Fortran runtime error: Cannot match namelist object name .0

Please feel free to compile using the GNU modulefile on Hera to see if you can tell what might be happening with the GNU executables.

Also, I've been encountering the following error while attempting to run the WE2E tests since hash ce4330c:

  Reading in the test description for primary WE2E test:  "get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS"
  In category (subdirectory):  "wflow_features"

date: invalid date ‘PDYm1 ’
End run_WE2E_tests.sh at Mon Nov 14 18:19:39 UTC 2022 with error code 1 (time elapsed: 00:01:07)

Are you also encountering these failures?

@danielabdi-noaa
Collaborator Author

@MichaelLueken Thanks a lot for testing. I will try to compile on Hera with GNU and debug it. I also ran the Cheyenne GNU tests on Hera/Jet and found no problem. Btw, you may want to add the GNU modulefiles for Hera, since they could be useful in the future for those of us who don't have access to Cheyenne.

I will fix the PDYm1 issue. Thanks!

@danielabdi-noaa
Collaborator Author

@MichaelLueken I am able to reproduce the problem on Hera with GNU, so I should be able to figure out the cause.

@danielabdi-noaa
Collaborator Author

danielabdi-noaa commented Nov 15, 2022

@MichaelLueken The bug is fixed and I was able to run the Cheyenne GNU tests on Hera with a GNU build to completion. The problem was that the regional_grid.nml file had

lx = -231.0
ly = -143.0

instead of

lx = -231
ly = -143

In the code I assumed that str.isnumeric() would work for negative numbers, but it returns False for strings with a leading minus sign. The Intel build either was able to work with the floats just fine, or the regional_grid.nml file produced integer entries somehow. In any case, it seems to work now.
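
The str.isnumeric() pitfall mentioned above is easy to reproduce. The sketch below is an illustration only, not the PR's code: naive_convert and robust_convert are hypothetical names showing how a check based on isnumeric() can turn a negative integer string into a float, and one common way to avoid that.

```python
def naive_convert(text):
    """Buggy pattern: anything that fails isnumeric() is treated as a float."""
    return int(text) if text.isnumeric() else float(text)


def robust_convert(text):
    """Try int first, then fall back to float, so signed integers stay integers."""
    try:
        return int(text)
    except ValueError:
        return float(text)


# isnumeric() only accepts numeric characters, and "-" is not one of them.
print("231".isnumeric(), "-231".isnumeric())          # True False
print(naive_convert("-231"), robust_convert("-231"))  # -231.0 -231
```

With the naive check, "-231" is written out as -231.0, which matches the lx and ly values quoted above.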

@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch 2 times, most recently from 3de3627 to 4580fec Compare November 15, 2022 02:28
@danielabdi-noaa danielabdi-noaa force-pushed the feature/partial_var_defns branch from 4580fec to 7211af4 Compare November 15, 2022 04:28
Collaborator

@MichaelLueken MichaelLueken left a comment

@danielabdi-noaa I was able to run the GNU tests on Cheyenne and they successfully ran. I will go ahead and give my approval to your changes and submit the Jenkins CI tests at this time (the Jet tests will launch once Jet has been returned to service).

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Nov 15, 2022
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Very nice additions here, @danielabdi-noaa !!

# Set NCO mode OPSROOT
#
OPSROOT=\"${OPSROOT}\""
OPSROOT=\"${opsroot}\""
Collaborator

How does OPSROOT get set now?

Collaborator Author

Both expt_basedir and OPSROOT have default values specified in setup.py so that they work when generate_FV3LAM_wflow.py is used directly. I just wanted to be consistent with expt_basedir.

DATE_LAST_CYCL: '2020082600'
PREDEF_GRID_NAME: RRFS_CONUS_25km
DATE_FIRST_CYCL: date --utc --date="2 days ago" +%Y%m%d00
DATE_LAST_CYCL: date --utc --date="2 days ago" +%Y%m%d00
Collaborator

NICE!

Collaborator Author

I originally wanted to handle this in the python-workflow, but it ended up being unnecessarily complicated, so I left it to the WE2E scripts as a temporary solution. I think that once the run_WE2E scripts are converted to python, templating the WE2E config files themselves with jinja2 may have some benefits.
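
For reference, the value that the date --utc --date="2 days ago" +%Y%m%d00 command in the config lines above produces can also be computed in Python. This is a minimal sketch under stated assumptions, not code from the PR: cycle_date is a hypothetical helper, and the optional now argument (expected to be a UTC-aware datetime) exists only so the result can be checked against a fixed time.

```python
from datetime import datetime, timedelta, timezone


def cycle_date(days_ago, now=None):
    """Return the 00Z cycle string (YYYYMMDD00) for `days_ago` days before `now`.

    Hypothetical helper; `now` defaults to the current time in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # The trailing "00" is literal text, fixing the cycle hour at 00Z.
    return (now - timedelta(days=days_ago)).strftime("%Y%m%d00")


# Fixed reference time so the output is reproducible.
print(cycle_date(2, now=datetime(2022, 11, 16, 5, 30, tzinfo=timezone.utc)))  # 2022111400
```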

Collaborator

I am working on that solution now!! (Templating with jinja2 in the config files)

Labels

run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests

5 participants