update#2

Merged
panll merged 1428 commits into panll:develop from ufs-community:develop
Oct 17, 2022


@panll panll commented Oct 17, 2022

  • Use this template to give a detailed message describing the change you want to make to the code.
    update

  • You may delete any sections labeled "optional".

  • If you are unclear on what should be written here, see https://github.com/wrf-model/WRF/wiki/Making-a-good-pull-request-message for some guidance.

  • The title of this pull request should be a brief summary (ideally less than 100 characters) of the changes included in this PR. Please also include the branch to which this PR is being issued.

  • Use the "Preview" tab to see what your PR will look like when you hit "Create pull request"

--- Delete this line and those above before hitting "Create pull request" ---

DESCRIPTION OF CHANGES:

One or more paragraphs describing the problem, solution, and required changes.

TESTS CONDUCTED:

Explicitly state what tests were run on these changes, or if any are still pending (for README or other text-only changes, just put "None required"). Make note of the compilers used, the platform/machine, and other relevant details as necessary. For more complicated changes, or those resulting in scientific changes, please be explicit!

DEPENDENCIES:

Add any links to external PRs (e.g. regional_workflow and/or UFS PRs). For example:

  • NOAA-EMC/regional_workflow/pull/<pr_number>
  • NOAA-EMC/UFS_UTILS/pull/<pr_number>
  • ufs-community/ufs-weather-model/pull/<pr_number>

DOCUMENTATION:

If this PR is contributing new capabilities that need to be documented, please also include updates to the RST files (docs/UsersGuide/source) as supporting material.

ISSUE (optional):

If this PR is resolving or referencing one or more issues, in this repository or elsewhere, list them here. For example, "Fixes issue mentioned in ufs-community#123" or "Related to bug in https://github.com/NOAA-EMC/other_repository/pull/63"

CONTRIBUTORS (optional):

If others have contributed to this work aside from the PR author, list them here

JeffBeck-NOAA and others added 30 commits July 9, 2021 14:35
…#533)

* Changes required to produce the diag_table file at forecast run time.

* Minor syntax changes.

* Syntax bug fix and update to header.

* Update create_diag_table_file.sh

* Update setup.sh

* Update setup.sh
* Remove echo from script

* Add path to result files

* Add a new WE2E test for inline post in nco mode
This reverts commit 45ae32e.

Commit was made to develop branch of authoritative repository in error;
settings have now been changed to disallow direct commits to develop
and release branches to avoid this problem in the future.
## DESCRIPTION OF CHANGES:
This PR (1) helps reduce code duplication and (2) removes the necessity to check for the machine the workflow is running on before attempting to create relative symlinks by introducing the new function `create_symlink_to_file` for creating symlinks to files (but not directories).  This function:
* Checks for the existence of a symlink's target before creating it.  If the target doesn't exist, it prints out an error message and exits.  If it does, it creates the symlink to the target file.  Currently, checking for the existence of a target is done in several places in the scripts.  This PR changes the code so that instead of performing this check themselves, the scripts call this new function to perform this test and, if the test succeeds, create the symlink. 
* Creates a relative symlink (i.e. a symlink whose target is specified via a relative path) only if requested via an input argument AND if the machine/OS/shell the workflow is running on supports relative symlinks.  The second half of this test (after the AND) is done implicitly via the use of the new experiment variable `RELATIVE_LINK_FLAG`.  This variable gets set during experiment generation to the flag that needs to be passed to the link command on the machine in order to create relative links.  For example, on Hera, `RELATIVE_LINK_FLAG` gets set to "--relative".  On machines that don't support creation of relative links, `RELATIVE_LINK_FLAG` gets set to a null string.
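
The behavior described above can be sketched in Python. This is only an illustrative analogue, not the shell function added by the PR: the signature, the `relative` and `relative_link_flag` parameter names, and the use of an exception in place of "print an error message and exit" are all assumptions. Python has no equivalent of a link-command flag, so the relative path is computed with `os.path.relpath` instead.

```python
import os
from pathlib import Path


def create_symlink_to_file(target, symlink, relative=False,
                           relative_link_flag="--relative"):
    """Hypothetical Python analogue of the shell function described above.

    relative_link_flag stands in for the RELATIVE_LINK_FLAG experiment
    variable: an empty string means the machine cannot create relative links.
    """
    target = Path(target)
    symlink = Path(symlink)

    # Only link to existing files (not directories, not missing paths).
    if not target.is_file():
        raise FileNotFoundError(f"Symlink target is not an existing file: {target}")

    # Relative link only if requested AND supported on this machine.
    if relative and relative_link_flag:
        link_target = os.path.relpath(target.resolve(),
                                      start=symlink.parent.resolve())
    else:
        link_target = str(target.resolve())

    # Replace a stale link left over from an earlier run (an assumption;
    # the original description does not say how existing links are handled).
    if symlink.is_symlink():
        symlink.unlink()
    os.symlink(link_target, symlink)
    return link_target
```

Calling `create_symlink_to_file("data/file.txt", "run/file.txt", relative=True)` would create a link whose stored target is `../data/file.txt`, while passing `relative_link_flag=""` falls back to an absolute target even when a relative link was requested.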

## TESTS CONDUCTED:
The following WE2E tests were conducted:
```
DOT_OR_USCORE
nco_ensemble
nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR
pregen_grid_orog_sfc_climo
suite_FV3_GFS_v15p2
suite_FV3_GFS_v16
```
All completed successfully except suite_FV3_GFS_v16.  The latter failed in the run_fcst task after hour 2, i.e. after all the symlinks, etc. had been created properly for the forecast to start, so the failure was preexisting and not due to changes in this PR (confirmed by running the test with the develop branch).
* WE2E test for MET validation.

* Add a header to the config file.

* Update config.MET_validation.sh

* Change 'validation' to 'verification'

Co-authored-by: JeffBeck-NOAA <55201531+JeffBeck-NOAA@users.noreply.github.com>
## DESCRIPTION OF CHANGES:
This PR addresses an error that occurs at run time: `ModuleNotFoundError: No module named 'f90nml'`.
A similar error may appear in any of the tasks that use a conda environment, and could potentially list
`jinja2` instead.

The version of lmod on Hera was updated. The behavior of the `system` command has changed and now runs
only in a subshell. As an lmod-version-independent fix, the module will no longer activate a conda
environment, but will set the conda environment that should be activated through an environment
variable, then the script will activate any required conda environment.

Note: Users who do not run with Rocoto, but still use these modules, will have an extra step of
activating the conda environment manually.

## TESTS CONDUCTED:
Built and ran a single WE2E test (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2)
successfully on Hera with these modifications, as well as a WE2E test (DOT_OR_USCORE) on Orion as a
sanity check.
## DESCRIPTION OF CHANGES: 
Remove "refine_ratio = ${refine_ratio}" in scripts/exregional_make_orog.sh to fix filter_topo failure

## TESTS CONDUCTED: 

regional ESG grids/topography

## DEPENDENCIES:


## DOCUMENTATION:


## ISSUE (optional): 
Fixes issue mentioned in #546
Copy and link NEMS field dictionary file:

* Copy field dictionary file from tests/parm into exptdir
* Link field dictionary file into run_dir during fcst
## DESCRIPTION OF CHANGES: 
1. Add an if condition in tests/run_experiments.sh to give an informative error message when the MET and MET+ paths are not available on a machine.
2. Remove MET/MET+ paths in tests/baseline_configs/config.verification.sh

## TESTS CONDUCTED: 
A test run was conducted on Hera and finished successfully.  Another test was run on Jet, and it failed in the expected way.

## DEPENDENCIES:
To have MET verification run successfully, the observational data (e.g., CCPA, MRMS, NDAS) must be available.

## DOCUMENTATION:
N/A

## ISSUE (optional): 
This is a follow up PR to complete the previous one in https://github.com/NOAA-EMC/regional_workflow/pull/537

## CONTRIBUTORS (optional): 
@gsketefian contributed the revision.
…n Hera (#526)

* Modifications to scripts to allow the workflow to run to completion
with the GNU build on Hera.  The python/miniconda3 module must be
unloaded prior to running the executable.

* "module" is not the best name for a variable, changed to a_module

* Address comments:

- Clarify message about why modules are being unloaded
- Remove print message if nothing is unloaded to reduce clutter in log file
- Rename "module_to_unload" to "modules_to_unload"
- Change "a_module" to "module_to_unload" for clarity

* Clarify purpose of the unload_python.sh function.

* Add the "why"
…etcdf format from NOAA HPSS (#555)

## DESCRIPTION OF CHANGES: 
This PR adds the capability to fetch FV3GFS external model data in netcdf format from NOAA HPSS.  Main changes:
1) Modify workflow scripts where necessary to include stanzas for FV3GFS_FILE_FMT_ICS and FV3GFS_FMT_LBCS being set to "netcdf" (currently, only "nemsio" and "grib2" are valid values for these parameters).
2) Add a WE2E test for this capability (config.get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000.sh).

## TESTS CONDUCTED:
Successfully ran the following 3 WE2E tests (including the new one introduced here) on Hera:
1) get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2021010100
2) get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021010100
3) get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000

## ISSUE (optional):
This resolves Issue #470.
## DESCRIPTION OF CHANGES: 
This PR reorganizes the WE2E testing system so that it is easier to comprehend, use, and modify.  The major changes are as follows:

* Move the WE2E testing system from the `regional_workflow/tests` directory to `regional_workflow/tests/WE2E`.  This is because we anticipate other types of tests in the testing system, e.g. unit tests, which would go under `regional_workflow/tests/unit`, etc.
* Move the WE2E test configuration files that were in `regional_workflow/tests/baseline_configs` to category subdirectories under `regional_workflow/tests/WE2E/test_configs`.  The category subdirectories thus far and the types of tests they contain are: 
  * `grids_extrn_mdls_suites_community`
    Tests in community mode of various combinations of the grid, external models for ICs and LBCs, and physics suites.
  * `grids_extrn_mdls_suites_nco`
    Tests in NCO mode of various combinations of the grid, external models for ICs and LBCs, and physics suites.
  * `release_SRW_v1`
    The graduate student test (GST) used for the UFS SRW App version 1 release.
  * `wflow_features`
    Test of workflow features, e.g. ability to set various parameters to user-specified values instead of using the defaults, ability to fetch external model files from different models and on different dates from NOAA-HPSS, etc.
* Rename some of the WE2E test configuration files to adhere to the naming convention used in each category subdirectory.
* Remove some of the WE2E test configuration files since they are almost-duplicates, e.g. they differ with respect to another test only in the cycle date/time used or the LBC specification interval.
* Changes to contents of test configuration files:
  * Rearrange the order in which experiment variables are specified in the test configuration files so that the predefined grid name and the physics suite are set first, then the external model info is set, then the cycle dates, then the forecast length and LBC update interval, finally followed by other parameters.
  * Remove redundant variable specifications in the test configuration files, e.g. remove `QUILTING="TRUE"` since `QUILTING` is already set to `"TRUE"` by default, remove `GRID_GEN_METHOD="ESGgrid"` since a predefined grid is already specified that in turn implies a value for `GRID_GEN_METHOD`.
  * Include a test purpose/description in each test configuration file.
  * Perform bug fixes in the WE2E configuration files, e.g. a test that was supposed to run on the `RRFS_CONUS_3km` grid was actually running on the `RRFS_CONUS_13km` grid.
* Remove the file `baselines_list.txt` since it is not used.
* Add new function in `get_WE2Etest_names_subdirs_descs.sh` that:
  * Searches subdirectories under the base directory in which the WE2E test configuration files are located (`regional_workflow/tests/WE2E/test_configs`) and returns a list of all available test names, the category subdirectories under the base directory in which they are located, the unique test IDs, and the test descriptions.
  * Creates (if requested) a comma-separated value (CSV) file containing this WE2E information that can be opened as a spreadsheet in Google Sheets.
* Rename `run_experiments.sh` to `run_WE2E_tests.sh`.
* In `run_WE2E_tests.sh`:
  * Include a detailed usage message.
  * Make sure that required arguments are provided on the command line.
  * Call the new function `get_WE2Etest_names_subdirs_descs` to get a full list of all available tests.  Then check the list of test names that the user wants to run to make sure all exist in the full list.
  * Run sanity checks on the user-specified list of tests to run, e.g. that a test is not repeated (either under the same name or under an alternate name).
* In `ush/bash_utils/filesys_cmds_vrfy.sh`, put the `local` attribute in front of variables that are supposed to be local.
* In `ush/bash_utils/is_element_of.sh`, add the `:-` at the end of `array_name_at` so that the function still works when the array passed in is empty.
* In `ush/config_defaults.sh`, edit comments and move groups of variables to more appropriate location in file.
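
The test discovery, CSV export, and sanity checks described above can be sketched as follows. This is a hedged Python illustration, not the shell code in `get_WE2Etest_names_subdirs_descs.sh` or `run_WE2E_tests.sh`: the function names and CSV column names are hypothetical, it assumes configuration files are named `config.<test_name>.sh`, and it leaves out test IDs, descriptions, and alternate test names.

```python
import csv
from pathlib import Path


def find_we2e_tests(base_dir):
    """Map each test name to the category subdirectory that holds its config.

    Assumes files are laid out as <base_dir>/<category>/config.<test_name>.sh.
    """
    tests = {}
    for cfg in sorted(Path(base_dir).glob("*/config.*.sh")):
        test_name = cfg.name[len("config."):-len(".sh")]
        if test_name in tests:
            raise ValueError(f"Test defined in more than one category: {test_name}")
        tests[test_name] = cfg.parent.name
    return tests


def write_tests_csv(tests, csv_path):
    """Write the discovered tests to a CSV file that a spreadsheet can open."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["test_name", "category_subdir"])  # hypothetical headers
        for name, category in sorted(tests.items()):
            writer.writerow([name, category])


def check_requested_tests(requested, tests):
    """Reject repeated test names and names that are not in the full list."""
    repeated = sorted({t for t in requested if requested.count(t) > 1})
    if repeated:
        raise ValueError(f"Tests listed more than once: {repeated}")
    missing = [t for t in requested if t not in tests]
    if missing:
        raise ValueError(f"Unknown tests: {missing}")
```

A driver script would typically call `find_we2e_tests` on the `test_configs` directory once, pass the result to `check_requested_tests` together with the user's list, and only then start generating experiments.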

## TESTS CONDUCTED: 
On Hera, ran the following tests so far:
* get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000
* inline_post
* specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS
* specify_RESTART_INTERVAL

All were successful.  Since it is expensive to run all the new WE2E tests, the remainder will be tested gradually and the results recorded in a google spreadsheet (for now; this will have to be automated at some point).

## DEPENDENCIES:
There will likely be a companion PR into `ufs-weather-app` that updates the hash of `regional_workflow`, but that can only be created after this PR is merged.

## DOCUMENTATION:
Detailed documentation for how the new WE2E testing system works is in the comments in the files `run_WE2E_tests.sh` and `get_WE2Etest_names_subdirs_descs.sh`.  Transferring these to RST files will take time and is likely better to do as part of a separate PR.
…etup.sh (#563)

* remove dummy variables and rearrange variables

* Add a default value to ppn_run_fcst

* Remove conditional statement
…d namelist; set pazi=-13.0 for the RRFS_NA_3km domain (#567)
* Update WE2E script for wcoss

* Modify path to external data on hera
* Moved mrms_pull_topofhour.py to ush/ rather than scripts/

* Updates to get the mrms_pull_topofhour.py script to run with the proper environment loaded. Fixed a tab issue in the py script as well.

* Move "set +u" to the place it is absolutely needed, add a "set -u" at the end to restore the unset variable flag

Co-authored-by: Michael Kavulich, Jr <kavulich@ucar.edu>
* Modify post files names.

* Change post executable name in ex-script.
* Update EMC_post hash to top of develop; way overdue!

* Use BUILD_ALWAYS flags in CMakeLists.txt so that code changes result in a rebuild

* Load rocoto in wflow_hera.env, since it may have been unloaded at the build step

* Point to required regional_workflow branch for testing; will update to proper hash once PR is merged

* Point to latest regional_workflow hash
michelleharrold and others added 29 commits August 25, 2022 13:57
* Mods to METplus conf files: TCDC specifications, correction to level specifications in point-stat mean and prob files, and added functionality to make METplus output dirs in ex scripts.

* Updated comments in MET ex-scripts for creating output directories.

* Fixed minor formatting issue in exregional_run_gridstatvx.sh
* add paths to recent MET/METplus installations on Gaea

* change data staging directory to ncep_shared
* Build to exec directory instead of bin.

* Rename src to sorc

* Add separate building of rrfs components.

* Temporarily change bin directory from "exec" to "bin"

* Remove --rrfs option from test/build, print packages to build.

* Update hash of regional_workflow.

* Minor changes.

* Set default list for direct cmake build path.

* Separate clean, build and install steps.

* Bug fix remove vs clean

* Remove custom install step because it is the same as the all target anyway.

* Update for EMC_EXEC_DIR removal from UPP.

* Use modulefiles of component instead of top-level modulefile.

* Make modulefile name search flexible.

* Remove GSI dependency of rrfs_utl assuming gsi is already built.

* Fall back to SRW modules when we can't find modules of sub-component.
* Construct var_defns components from dictionary.

* Bring back config_defaults.yaml

* Add support for sourcing yaml file into shell script.

* Remove newline for printing config, json config fix.

* Make QUILTING a sub-dictionary in predef_grids

* Reorganize config_defaults.yaml by task and feature.

* Bug fix with QUILTING=true.

* Structure a dictionary based on a template dictionary.

* Convert all WE2E config files to yaml.

* Take care of problematic chars when converting to shell string.

* Process only selected keys of config.

* Add symlinked yaml config files.

* Actually use yaml config files for WE2E tests.

* Delete all shell WE2E configs.

* Don't check for single quotes in test description.

* Make WE2E work with yaml configs.

* Make yaml default config format.

* Bug fix in run_WE2E script.

* Add utility to check validity of yaml config file.

* Add config utility interface in ush directory.

* Remove unused check_expt_config_vars script.

* Add description to default config.

* Reorganize source_config.

* Add XML as one of the config formats.

* Update custom_ESGgrid config.

* Bug fix due to update.

* Change ensemble seed.

* Change POST_OUTPUT group due to merge.

* Make xml and ini configs work.

* Maintain config structure down to var_defns.

* Add function to load structured shell config, put description under metadata

* Flatten dicts before importing env now that shell config is structured.

* Support python regex for selecting dict keys.

* Add capability of sourcing task specific portion of config file.

* Access var_defns via env variable.

* Make names of tasks consistent with ex- and j- job script names.

* Append pid to temp file.

* Prettify user config, don't use &quot; in xml texts.

* Compare timestamp of csv vs all files instead of directory.

* Fixes for some pylint suggestions.

* Convert new configs to yaml.

* Format python files with black (no functional change).

* More readable yaml/json formats by using more data types.
Only datetime type is now in quotes.

* More readable yaml config files for WE2E and default configs.

* Make config_defaults itself more readable.

* Correct pyyaml list indentation issue.

* Fix indentation in all config files.

* Use unquoted WTIME in config_defaults

* Cosmetic changes.

* Fix due to merge.

* Make __init__.py clearer.

* Fixes due to merge.

* Minor edits of comments.

* Remove wcoss_dell_p3 from workflow (#810)

* remove wcoss_dell_p3

* remove block for tide and gyre

* Replace deprecated NCAR python environment with conda on Cheyenne (#812)

* Fix issue on get_extrn_lbcs when FCST_LEN_HRS>=40 with netcdf (#814)

* activate b file on hpss for >40h

* add a new we2e test for fcst_len_hrs>40

* reduce fcst time for we2e

* Convert new test case to yaml.

* Fix formatting due to merge.

* Convert new test case to yaml.

* Fix unittest.

* Merge develop

* Remove exception logic from __init__.py

* Minor change to cmd concat.

* Make grid gen methods return dictionary, simplifies code a lot.

* Add a comment why we are suppressing yaml import exception.

* Minor change to beautify unittest output.

* Add status badge for functional tests.

* Reorder tasks in config_default and we2e test cases to match order in FV3LAM.xml

* Keep single quotes and newlines in we2e test description.

* Revert back to not rounding to 10 digits

Co-authored-by: Chan-Hoo.Jeon-NOAA <60152248+chan-hoo@users.noreply.github.com>
Co-authored-by: Michael Kavulich <kavulich@ucar.edu>
This commit represents the merger of the history of the old regional_workflow repository (https://github.com/ufs-community/regional_workflow)

See this repository's wiki for further information.
)

## DESCRIPTION OF CHANGES: 
 - Removes regional_workflow from Externals.cfg
 - Replaces uses of "HOMErrfs" with "SR_WX_APP_TOP_DIR"
 - Remove/replace other references to regional_workflow

### Type of change
- [ ] Bug fix

## TESTS CONDUCTED: 
All build tests passed on Cheyenne (gnu), Hera (intel), Jet (intel), Orion (intel)

WE2E tests pending but all run so far have succeeded; see PR on GitHub for full list of tests

## DEPENDENCIES:
None

## DOCUMENTATION:
In-line documentation has been updated; user documentation will need to be updated in a future PR.
* update hashes and module list

* update docs

* update hash of ufs weather model

* remove emc_exec_dir

* back crtm to 2.3.0

* back g2tmpl to 1.10.0

* edit comment for bin-dir
* updated container instructions

* GSI/RRFS formatting fix

* another GSI/RRFS formatting fix

* add info on binding file dirs

* minor edits

* update Container Run section

* fix headings

* fix typo

* minor updates

* change to release branch version

* update binding, sandbox

* add NOAACloud/miniconda3 instructions

* add table of .img locations

* update pre-installed container locations

* update .img locations & other minor changes

* add troubleshooting section, minor updates

* formatting, remove comments

* add JET info

* format JET info

* wording/minor edits

* update container name to develop

* switch to python workflow/yaml

* fix typos, ICS/LBCS file info

* minor fix

* add :orphan: tag to rst tables

* edit sandbox name

* wording

* typos

* add example stage-srw command

* remove regional_workflow reference, minor updates

* add orion/cron note

* fix orion/cron note

* fix orion/cron note

Co-authored-by: gspetro <gillian.s.petro@gmail.com>
* Add preamble script from global workflow.

* Call preamble script in j-jobs and ex-scripts

* Call preamble in other scripts.

* Make names of j-jobs and ex-scripts consistent.

* Working towards nco vars in table 1.

* Change default bin directory to exec

* Append FATAL ERROR to print_err_msg_exit.

* Replace some cp, cd, mkdir calls with their corresponding _vrfy versions

* Add job and jobid to the job-card.

* Add cyc and subcyc to rocoto xml

* Add a j-job preamble script for setpdy.

* Add a j-job postamble as well.

* Define some Table 1 vars in setup.

* Remove unused SRC_DIR, and rename others

* Rename CYCLE_BASEDIR to COMIN_BASEDIR

* Create the NCO root directories in setup.

* Remove source machine file wrapper.

* Bug fix in job_preamble.

* Make make_ics/lbcs use DATA directory properly.

* Make run_fcst use DATA directory properly.

* Made run_post use DATA directory properly.

* Make make_grid use DATA properly (untested).

* Make make_sfc_climo use DATA properly (untested).

* Make make_orog use DATA properly (untested).

* Bug fix for non-NCO mode.

* Don't pass arguments from j-jobs to ex-scripts.

* Make forecast and post-output go to COMOUT.

* Remove CYCLE_DIR and use COMIN instead.

* Bug fix for community mode.

* Append cyc to COMIN in NCO mode.

* Fix rocoto run_post dependency with run_fcst issue.

* Use OPSROOT instead of PTMP and STMP.

* Move nco vars in config_defaults.

* Move logdir location to COMROOT.

* Set all root directories to EXPTDIR in community mode.

* Use pgmout and pgmerr.

* Fix inline post.

* Make pgmout/err redirection work with community mode.

* Use print_err in get_obs_mrms.

* Add prep_step.

* Add post_step.

* Add dbn_alert to post-processed grib2 output.

* Download extrn files directly to COMIN.

* Make make_ics/lbcs directly output to COMIN.

* Change names of extrn_mdl_var_defns files.

* Name fixes for DO_ENSEMBLE=false, dyn/phy

* Don't create symlinks to grib2 files in NCO mode.

* Append rrfs to make_ics/lbcs output.

* Modify extrn_mdl_var_defns names.

* Move forecast output to DATA/RUN.PDY. This location
can be used to store output of other tasks as well.

* Move templates to parm.

* Fix for new parm location.

* Move metplus one level up.

* Fixes for community mode.

* Rename SCRIPTSDIR and JOBSDIR.

* Move all FIX** directories in to a fix/ directory.

* Make FIXrrfs be EXPTDIR for community mode.

* Symlink upp and ufs_utils parm files to top level parm directory.

* Remove UPP_DIR and UFS_UTILS_DIR.

* Define cycle with subcyc when it is non-zero.

* Don't delete COMIN_BASEDIR if it already exists.

* Disassociate NCO mode from pre-generated grid.

* Don't choose fix location based on RUN_ENVIR.

* Bug fix in make_lbcs.

* Add flag to symlink or copy fix files.

* Change slurm log file locations

* Minor fix for inline post in nco mode.

* Add unique workflow ID to avoid clashes between different runs, while
keeping the relation between different tasks, which PID can not do.

* Make verification tasks NCO compliant.

* Pass RUN_ENVIR to we2e script.

* Fixes for merge conflicts.

* Add versions for wcoss2.

* Fix symlinks.

* Minor changes.

* Move grid/orog/sfcc completion files to EXPTDIR/grid/orog etc.

* Output modified namelist file with seeds in current directory.

* Fixes for unittests.

* Bugfix wrf_io version

* Fix CI issue with bin locations.

* Allow NCO root directories to be set individually.

* Don't append workflow id in community mode.

* Add helper script to rename model e.g. rrfs->aqm

* Bug fixes and naming changes for consistency.

* Replace instances of USHrrfs etc with a generic USHdir etc.

* Add unittest for whole workflow now that the merge made it possible.

* Remove unused process_args utility.

* Remove hard coded paths from configs.

* Don't replace existing var value with None.

* Add config.nco to unittest.

* Fix for Orion issue.

* Fix default OPSROOT location in run_we2e.

* Modify setup_we2e script to run fundamental tests on all machines.

* Fix conflicting ics/lbcs temp location by moving to DATA.

* Bug fix in load_modules taken from PR #353.

* Specify default shell instead of symlinking.

* Turn off grid/orog/sfc_climo tasks for NCO test cases.

* Use PDY and cyc in ex-scripts.

* Remove CDATE from xml and define in job_preamble.

* Use machine specific list of tests if available.

* Run all tests in community mode so that the last NCO test case
gets reported as finished.

* Minor changes

* Avoid using preamble in functions.

* Use preamble in function too.

* Turn on debugging for utility functions.

* Turn on debug & verbose in CI.

* Turn off set -e for init_env
* update lmod

* update lmod

* update hpc-stack and miniconda

* fix lmod-setup.sh bug for Gaea

* update files to run with new miniconda and MET VX

* fix typo

* fixed typo

* update vx task

* Update build_gaea_intel

The list of modules to be loaded needs updates.

* Update load_modules_run_task.sh

Fixed a typo

* Update load_modules_run_task.sh

* updated vx task

Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00061.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00062.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00063.pw-noaa-us-east-1.pw.local>
Co-authored-by: Parallel Works app-run user <Edward.Snyder@mgmt-edwardsnyder-pclusternoaav2-00064.pw-noaa-us-east-1.pw.local>
Co-authored-by: Natalie Perlin <68030316+natalie-perlin@users.noreply.github.com>
* update stochastic physics link

* add HPC-Stack Intersphinx links

* remove hpc-stack submodule in favor of intersphinx links

* move tables to 'tables' directory

* remove duplicate Glossary terms; update links

* update Quickstart

* add M. Leukin to mgmt team list; change expt gen command

* fix typo

* add stochastic physics link & troubleshooting tips

Co-authored-by: gspetro <gillian.s.petro@gmail.com>
…ests (#333)

* Normalize Parallel Works cluster platform value

Set the value of platform to 'noaacloud' when SRW_PLATFORM matches a
Parallel Works cluster name.

* Enable the AWS Parallel Works platform

* Move agent declaration to stages

This change allows the platform filter to work correctly. Otherwise, the
Parallel Works clusters would block indefinitely waiting to execute the
matrix on an agent/node that was not started.

* Ensure PROJ_LIB is set on Parallel Works platforms

* Fix final exit status of srw_test script

Some platforms do not recognize quoted variables within an arithmetic
expression. This change removes the quotes.

* Add comprehensive end-to-end tests option

* Add a parameter to the Jenkins pipeline that allows the comprehensive
set of workflow and end-to-end tests to be executed during the test
stage.
* Add logic to the Jenkins pipeline that checks for a specific Pull
Request label, then overrides the comprehensive end-to-end test
parameter's value if set.

* Clean up the workspace after a we2e test run

The experiments directory uses a lot of disk space. Removing it after
the end-to-end tests complete will allow us to keep the workspaces
longer. However, the test logs should be preserved. This change creates
a tarball containing the test logs in the workspace, which is archived,
then removes the experiments directory.

* Disable concurrent builds for branches and PRs

Prevent Jenkins from executing multiple pipelines at the same time for a
given branch or change request.

* Disable branch indexing triggers for pipeline

* Update we2e fundamental tests for srw_test script

* Log and archive the output of the srw build

* Update we2e comprehensive tests in srw_test script

* Update we2e fundamental/comprehensive tests

* Update we2e comprehensive tests

* Remove invalid tests from comprehensive list

* Update Parallel Works cluster names

* Remove regional_workflow from we2e_test_dir path
* Make machine files yaml.

* Remove redundant SR_WX dir

* Remove some duplicate derived types.

* Convert constants to yaml.

* Bug fix GFDL grid.

* Bug fix machine lower/upper case.

* Fix unittest to capture exit code.

* Gaea lmod setup fix with tcsh.

* Add missing gaea commands.

* Remove obsolete module-setup scripts.

* Fix linux modulefile name.
* Temporary fix for Hera netcdf issue.

* Cheyenne fix.
)

* Fix @ issue on LOGDIR.

* Get rid of RUN_CMD_* specification in deactivate_tasks.

* Add TEST_ALT_* directories to all machines.

* Enforce config sourcing order in setup.

* Also fix DYN/PHY dir @ situation.
* Combine CI infrastructure, use get_expts_status etc to simplify Jenkins.

* Unload python in wflow_cheyenne just to be safe.

* Don't do launch_wflow when checking experiment status.

* Debug post test tar failures on Cheyenne

* Disable PW AWS cluster for pipeline

* Fake un-launched jobs as if they are in progress.

* Undo last commit.

* Increase cron interval to 5 mins.

* Also increase initial delay to 6 min.

* Fix variable expansion for the srw_test.sh call

* Updates for build artifact handling

* Update tar command to create cleaner artifact
* Update path of build log for s3 upload

* Remove lines used for debugging

* Remove custom test list.

* Add custom_ensemble_2mems_stoch.

* Undo unloading of python for Cheyenne.

* debug cheyenne python failures

* Add more debugging code for cheyenne.

* Decrease re-launch interval temporarily.

* Turn off check_var_valid.

* Add Lmod_init for Gaea.

* Clean up now that everything seems to work.

* Fix comment about the need for conda activation for Cheyenne.

* Print crontab content before/after deletion.

* Improve experiments directory clean up

Only remove the data directories to allow we2e cron jobs to complete
and clean up themselves correctly.

* Ensure we2e cron jobs execute again before cleanup

* Bug fix in build.sh.

Co-authored-by: Jesse McFarland <jesse@mcfarland.sh>
* Fix quoting/escaping in Jenkinsfile

* Export we2e comprehensive tests var
The SRW_WE2E_COMPREHENSIVE_TESTS variable was incorrectly set by a sh
block that preceded the call to srw_test.sh. This issue was hidden
because Jenkins sets the parameters to environment variables, which are
accessible from sh blocks. However, the parameters do not seem to be set
as environment variables the first time a multi-branch pipeline is
initialized. This resulted in an unbound variable error when the
SRW_WE2E_COMPREHENSIVE_TESTS variable was accessed by the srw_test.sh
script. This update sets SRW_WE2E_COMPREHENSIVE_TESTS in the same sh
block as the srw_test.sh script call, while ensuring the WORKSPACE
variable isn't evaluated until the sh block.
* Add wcoss2

* move python to wflow_wcoss2

* Add version read for wcoss2 to devbuild.sh

* add run env files

* Add nprocs to run_cmd

* Add tests/WE2E to gitignore

* update run_cmd_fcst

* fix typo in load_modules_run_task

* update run_fcst script

* update met version

* add modulefiles for verification on wcoss2

* update script

* rename slurm_native_cmd to sched_native_cmd

* enable post_output_domain_name to use numbers only

* Add run.ver.machine file

* add run.ver.fn to devbuild.sh

Co-authored-by: chan-hoo jeon <chan-hoo.jeon@clogin02.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: chan-hoo jeon <chan-hoo.jeon@clogin01.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: chan-hoo <chan-hoo.jeon@clogin05.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: chan-hoo <chan-hoo.jeon@clogin03.cactus.wcoss2.ncep.noaa.gov>
* Bug fix for gaea modulefiles.

* Specify version number of miniconda3.

* Do the same for noaacloud modulefiles.

* Use same miniconda in gaea task modulefiles.
* Remove ENV_INIT_SCRIPT/init_env and source /etc/profile in lmod-setup.

* Move TOPO_DIR and SFC_CLIMO_INPUT_DIR to appropriate sections.

* Bug fix for orion and wcoss2 machine files.

* Remove ENV_INIT from wcoss2 machine file.

* Modify setting of +u/-u.

* Do the same for set +e/-e

* Minor modification.

* Hack for gaea python3 loading.
…nce again. (#417)

* Update modulefiles/build_hera_intel and modulefiles/srw_common to allow the SRW to build and run on Hera following update to HPC-stack.

* Update modulefiles/build_jet_intel and modulefiles/build_orion_intel so that NetCDF will be loaded before nccmp.
* Deprecating CYCL_HRS

Changes were made to all config files and scripts to use FIRST and LAST
cycle definitions to accept the cycle HH, and frequency will start from
those for all relevant computation.

* Updating docs to remove ref to CYCL_HRS

* Remove CYCL_HRS from workflow.

* Add option to run all tests.

* Fixes needed to run WE2E tests.

* Fix the failed test.

* Make specification of groups of test more flexible.

* Addressing Mike's review comments.

* Addressing Gerard's comments.
…#412)

* Only run tests on specific branches.

* Removing deprecated build.yml
@panll panll merged commit ecc19f5 into panll:develop Oct 17, 2022