Convert machine scripts to yaml format. #388

Merged
danielabdi-noaa merged 11 commits into ufs-community:develop from danielabdi-noaa:feature/machine_cfg
Oct 4, 2022

@danielabdi-noaa
Collaborator

@danielabdi-noaa danielabdi-noaa commented Sep 30, 2022

DESCRIPTION OF CHANGES:

This PR converts the machine files to YAML format and then uses them to update config_defaults.yaml once, for the reasons explained in issue #386. The final result is written to var_defns.sh, so the machine file no longer needs to be sourced in the many ex- scripts and other scripts.
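
As a rough illustration of the flow described here (not the code in this PR), the sketch below overlays machine-specific settings on the defaults once and renders the merged result as shell variable assignments. It assumes the YAML files have already been parsed into flat dictionaries; the function names `update_defaults` and `to_shell_definitions` and all keys and values are hypothetical, and the real var_defns.sh layout may differ.

```python
import shlex


def update_defaults(defaults, machine_cfg):
    """Return a copy of defaults with machine-specific values applied on top."""
    merged = dict(defaults)
    merged.update(machine_cfg)
    return merged


def to_shell_definitions(cfg):
    """Render a flat dict as one shell variable assignment per line."""
    lines = []
    for key, value in cfg.items():
        # Treat a missing value as an empty string; quote everything for the shell.
        text = "" if value is None else str(value)
        lines.append(f"{key}={shlex.quote(text)}")
    return "\n".join(lines) + "\n"


# Hypothetical values standing in for parsed YAML content.
defaults = {"MACHINE": "", "WORKFLOW_MANAGER": "none", "NCORES_PER_NODE": None}
machine_cfg = {"WORKFLOW_MANAGER": "rocoto", "NCORES_PER_NODE": 40}

merged = update_defaults(defaults, machine_cfg)
shell_text = to_shell_definitions(merged)
print(shell_text)
```

Doing the merge once and writing out the result means later scripts only need to read a single generated file.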

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

None

ISSUE:

#386

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@christinaholtNOAA

@danielabdi-noaa danielabdi-noaa changed the title from "Make machine files yaml." to "Convert machine scripts to yaml format." Sep 30, 2022
@danielabdi-noaa danielabdi-noaa added the ci-hera-intel-WE (kicks off automated workflow test on hera with intel) and ci-jet-intel-WE (kicks off automated workflow test on jet with intel) labels Sep 30, 2022
@venitahagerty venitahagerty removed the ci-hera-intel-WE and ci-jet-intel-WE labels Sep 30, 2022
@venitahagerty
Collaborator

venitahagerty commented Sep 30, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1073258097/20220930222016/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 9 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR

@ufs-community ufs-community deleted a comment from venitahagerty Sep 30, 2022
@danielabdi-noaa danielabdi-noaa added the ci-hera-intel-WE label Sep 30, 2022
@venitahagerty venitahagerty removed the ci-hera-intel-WE label Oct 1, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 1, 2022
@danielabdi-noaa danielabdi-noaa added the ci-hera-intel-WE label Oct 1, 2022
@venitahagerty venitahagerty removed the ci-hera-intel-WE label Oct 1, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 1, 2022
@danielabdi-noaa danielabdi-noaa added the ci-hera-intel-WE label Oct 1, 2022
@venitahagerty venitahagerty removed the ci-hera-intel-WE label Oct 1, 2022
@ufs-community ufs-community deleted a comment from venitahagerty Oct 1, 2022
@MichaelLueken
Collaborator

@danielabdi-noaa While attempting to run the manual tests on Hera, I'm encountering the following failure:

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/nems/Michael.Lueken/ufs-srweather-app/ush/calculate_cost.py", line 91, in <module>
    params = calculate_cost(args.cfg)
  File "/scratch1/NCEPDEV/nems/Michael.Lueken/ufs-srweather-app/ush/calculate_cost.py", line 38, in calculate_cost
    grid_params = set_gridparams_GFDLgrid(
  File "/scratch1/NCEPDEV/nems/Michael.Lueken/ufs-srweather-app/ush/set_gridparams_GFDLgrid.py", line 255, in set_gridparams_GFDLgrid
    halo_width_on_t7g = NH4 + 1
TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Have you seen similar issues while running the WE2E tests on this machine?
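
The `TypeError` in the traceback is what Python raises when `None` is used in integer arithmetic, which suggests `NH4` was never given a value before `NH4 + 1` ran. A minimal sketch of one way to surface that earlier with a clearer message is shown here; the helper names `require_int` and `halo_width_plus_one` are hypothetical and are not part of set_gridparams_GFDLgrid.py.

```python
def require_int(name, value):
    """Return value as an int, or raise a descriptive error if it is missing."""
    if value is None:
        raise ValueError(f"{name} is not set; check the experiment configuration")
    return int(value)


def halo_width_plus_one(nh4):
    # Mirrors the failing expression `NH4 + 1`, but validates the input first.
    return require_int("NH4", nh4) + 1


print(halo_width_plus_one(4))

try:
    halo_width_plus_one(None)
except ValueError as err:
    # The message names the missing variable instead of reporting a bare TypeError.
    print(err)
```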

@dshawul
Contributor

dshawul commented Oct 3, 2022

@MichaelLueken Sorry, I introduced a bug in the GFDL grid subroutine in my last commit; I have fixed it now.

Collaborator

@MichaelLueken MichaelLueken left a comment

@danielabdi-noaa Thank you very much for addressing my concern about running the fundamental WE2E tests manually. I ran another test on Hera, and it completed successfully. I approve of these changes.

@panll
Collaborator

panll commented Oct 3, 2022

I hit an error when testing on Hera with the default config.yaml (config.community.yaml): the code does not recognize "hera". Here is the error message from running ./generate_FV3LAM_wflow.py:
Generating the global experiment variable definitions file specified by
GLOBAL_VAR_DEFNS_FN:
GLOBAL_VAR_DEFNS_FN = "var_defns.sh"
Full path to this file is:
GLOBAL_VAR_DEFNS_FP = "/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/expt_dirs/test_community/var_defns.sh"
For more detailed information, set DEBUG to "TRUE" in the experiment
configuration file ("config.yaml").
File "./generate_FV3LAM_wflow.py", line 1118, in <module>
p.start()
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/context.py", line 276, in _Popen
return Popen(process_obj)
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
code = process_obj._bootstrap(parent_sentinel=child_r)
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
self.run()
File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "./generate_FV3LAM_wflow.py", line 1106, in workflow_func
generate_FV3LAM_wflow()
File "./generate_FV3LAM_wflow.py", line 114, in generate_FV3LAM_wflow
setup()
File "/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/setup.py", line 2031, in setup
print_err_msg_exit(
File "/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/python_utils/print_msg.py", line 19, in print_err_msg_exit
traceback.print_stack(file=sys.stderr)
FATAL ERROR:
The variable MACHINE=hera in config_defaults.yaml or config.yaml does not have
a valid value. Possible values are:
MACHINE = ['HERA', 'ORION', 'JET', 'ODIN', 'CHEYENNE', 'STAMPEDE', 'LINUX', 'MACOS', 'NOAACLOUD', 'SINGULARITY', 'GAEA']
Exiting with nonzero status.
rm: cannot remove ‘/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/tmp’: No such file or directory
File "./generate_FV3LAM_wflow.py", line 1135, in <module>
rm_vrfy(tmp_fp)
File "/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/python_utils/filesys_cmds_vrfy.py", line 33, in rm_vrfy
return cmd_vrfy("rm", *args)
File "/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/python_utils/filesys_cmds_vrfy.py", line 20, in cmd_vrfy
print_err_msg_exit(f'System call "{cmd}" failed.')
File "/scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/python_utils/print_msg.py", line 19, in print_err_msg_exit
traceback.print_stack(file=sys.stderr)
FATAL ERROR: System call "rm /scratch2/BMC/fv3lam/lpan/test/review/10032022/test/ufs-srweather-app/ush/tmp" failed.
Exiting with nonzero status.
@danielabdi-noaa

@danielabdi-noaa
Collaborator Author

@panll There was a bug in the handling of upper- and lower-case machine names that does not show up in the WE2E tests. I have fixed it now, but note that if you rebuilt the Hera binaries in the last couple of days, the run may fail during the forecast.
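
For readers following along, here is a minimal sketch of case-insensitive machine-name validation; it is an illustration, not the fix committed in this PR. The list of valid names is copied from the error message above, while the function name `normalize_machine` is hypothetical.

```python
# Valid names as listed in the FATAL ERROR message above.
VALID_MACHINES = [
    "HERA", "ORION", "JET", "ODIN", "CHEYENNE", "STAMPEDE",
    "LINUX", "MACOS", "NOAACLOUD", "SINGULARITY", "GAEA",
]


def normalize_machine(name):
    """Upper-case the machine name, then check it against the valid list."""
    machine = name.strip().upper()
    if machine not in VALID_MACHINES:
        raise ValueError(
            f"MACHINE={name} is not valid. Possible values are: {VALID_MACHINES}"
        )
    return machine


print(normalize_machine("hera"))
```

Normalizing before the comparison lets a config file that says `hera` and one that says `HERA` take the same path.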

@panll
Collaborator

panll commented Oct 3, 2022

@danielabdi-noaa it works now, thanks!

@danielabdi-noaa
Collaborator Author

@panll Thanks for testing. The unit test should have caught this bug, but it was not returning an exit code, so the run was wrongly treated as passing. I would like to test this PR on all machines using Jenkins, which is now close to being ready after the Cheyenne issues are fixed.
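
A minimal sketch of how a test runner can report failures through its exit status, using only the standard `unittest` module. The test case and the names `MachineNameTest`, `run_tests`, and `main` are hypothetical and stand in for the project's own unit tests.

```python
import sys
import unittest


class MachineNameTest(unittest.TestCase):
    """Hypothetical test case standing in for the project's unit tests."""

    def test_lowercase_name_is_normalized(self):
        self.assertEqual("hera".upper(), "HERA")


def run_tests():
    """Run the suite and return 0 on success, 1 on any failure or error."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MachineNameTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1


def main():
    # Propagate the result so a CI job sees a nonzero status when tests fail.
    sys.exit(run_tests())
```

A script would call `main()` from its entry point; without the `sys.exit` call, the process exits with status 0 even when a test fails, which matches the symptom described above.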

@danielabdi-noaa danielabdi-noaa merged commit cc96304 into ufs-community:develop Oct 4, 2022
Comment thread ush/config_defaults.yaml
DOMAIN_PREGEN_BASEDIR: ""
#
#-----------------------------------------------------------------------
# Scritps and commands needed by workflow and tasks
Collaborator

typo "Scripts"

Collaborator Author

Thanks, I will fix it in the other PR.

6 participants