Enable ensemble forecasts #245

Merged
JeffBeck-NOAA merged 16 commits into ufs-community:develop from gsketefian:feature/ensemble on Jul 22, 2020
Conversation

@gsketefian
Collaborator

@gsketefian gsketefian commented Jun 29, 2020

Description of changes:

  • Introduce the new workflow variables DO_ENSEMBLE and NUM_ENS_MEMBERS. The user can enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE" and NUM_ENS_MEMBERS to the number of ensemble members to use. Note that NUM_ENS_MEMBERS also specifies the number of digits to use in the names of the ensemble member directories, e.g. whether to use mem1, mem2, ..., mem8 or mem01, mem02, ..., mem08. For example, if NUM_ENS_MEMBERS is set to "8", then the member directory names will be mem1, mem2, ..., mem8, whereas if NUM_ENS_MEMBERS is set to "08", then the member directory names will be mem01, mem02, ..., mem08.
  • During the experiment generation step, generate the full list of cycle dates/times to run and create a directory for each cycle. Previously, the cycle directory for each cycle was created during the make_ics or make_lbcs task of that cycle (whichever ran first).
  • When running ensemble forecasts, create a set of ensemble member directories under each cycle directory and use those ensemble directories as the FV3 run directories. Note that these ensemble directories are created when the make_ics or make_lbcs task for that cycle and ensemble member runs (whichever of those two tasks happens to run first); they are not created during experiment generation, although that could be done as well.
  • Modify the ush/generate_FV3SAR_wflow.sh script and the ush/templates/FV3SAR_wflow.xml jinja2 template to add the capability to have more than one cycle within an experiment. This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
  • Add three new WE2E tests for running ensemble forecasts: two in community mode (community_ensemble_2mems and community_ensemble_008mems) and another in NCO mode (nco_ensemble).
    • The two community ensemble WE2E tests differ in that one of them uses NUM_ENS_MEMBERS="2" while the other uses NUM_ENS_MEMBERS="008". The first is the simplest case of two ensemble members with member directories named mem1 and mem2, while the second is meant to test the ability to add leading zeros to the digits in the member directory names (should get mem001, mem002, ..., mem008) and to check that the workflow doesn't run into problems with integers being interpreted in octal because they start with leading 0's (as can happen in bash).
    • In order to test the (restored) capability of the workflow to run multiple cycles (possibly on different days), set up the community ensemble tests to have two cycle hours per day (instead of just one) and different starting and ending cycle days (ending is one day later than starting). For the same reason, set up the nco ensemble test to have multiple cycle hours per day (but not different starting and ending days because only one day of data is available for this nco-mode test).
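The octal concern mentioned above can be illustrated with a minimal bash sketch. The helper name `to_decimal` is hypothetical and does not come from this PR; it only shows one common way to force base-10 interpretation of a zero-padded string such as "008".

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from the PR): interpret a zero-padded string as a
# base-10 integer. Without the "10#" prefix, bash arithmetic treats a leading
# zero as an octal prefix, and "008" is not a valid octal number.
to_decimal() {
  echo $(( 10#$1 ))
}

NUM_ENS_MEMBERS="008"

# Run the naive arithmetic in a subshell so its error cannot affect this script.
if ( : $(( NUM_ENS_MEMBERS + 0 )) ) 2>/dev/null; then
  naive_result="ok"
else
  naive_result="error"
fi

num_members=$(to_decimal "${NUM_ENS_MEMBERS}")
echo "naive arithmetic: ${naive_result}; base-10 value: ${num_members}"
```

Wrapping the conversion in a small function keeps the `10#` prefix in one place, so every loop bound derived from NUM_ENS_MEMBERS goes through the same base-10 path.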

Modifications common to more than one file (used below in listing of file-by-file modifications):

(A) To clarify that the contents of the subdirectory in which the external model files are placed are not the ICs or LBCs (on the native grid) but external model files that will be used to generate the ICs or LBCs, change the name of that subdirectory from "ICS" or "LBCS" to "for_ICS" or "for_LBCS", respectively.
(B) Insert the new environment variable SLASH_ENSMEM_SUBDIR anywhere CYCLE_DIR appears. This variable is passed in by the rocoto XML. If not running ensemble forecasts, it is simply set to an empty string; if running ensembles, it is set to the string "/${ensmem_subdir}" where ${ensmem_subdir} is the subdirectory of the current ensemble member under the current cycle directory. This allows the subdirectories containing ICS, LBCS, and RESTART files to be placed directly under the current cycle directory when NOT running ensembles and for them to be placed under the current ensemble member directory (which is one level down from the current cycle directory) when running ensembles.
(C) For clarity, add new local variable run_dir that gets set to the run directory based on the current cycle and, if applicable, the ensemble member. Note that if not running ensembles, run_dir is identical to the cycle directory (cycle_dir or CYCLE_DIR).
(D) To follow convention used in which the arguments to a function are in lowercase (because they are local variables within the function itself), change the argument CYCLE_DIR of exregional_run_fcst() function to cycle_dir.
(E) If the call to the function set_FV3nml_sfc_climo_filenames() fails, call print_err_msg_exit to print out an error message and exit.
(F) If running ensemble forecasts, call the new function set_FV3nml_stoch_params() that takes a base FV3 namelist file and generates from it a new FV3 namelist file for each ensemble member that contains a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of the experiment directory.
(G) Call the new function create_diag_table_files (in the new file ush/create_diag_table_files.sh) to create the diagnostics table files.
(H) Rename the variable FV3_NML_BASE_FN to FV3_NML_BASE_SUITE_FN to clarify that it specifies the name of the FV3 namelist file for the base physics suite (which is used to generate the namelist file specific to the user-specified physics suite). This is done to better distinguish this base namelist file from the base namelist file used to generate namelist files for the various ensemble members. (The name of the latter is specified in the new workflow variable FV3_NML_BASE_ENS_FN.)
(I) Introduce the new workflow variable FV3_NML_BASE_ENS_FN that specifies the name to use for the base FV3 namelist file from which to generate the namelist file for each ensemble member. This variable is not used if not running ensemble forecasts (i.e. if DO_ENSEMBLE is not set to "TRUE").
(J) Add the local variable dummy_run_dir that specifies the (dummy) directory with respect to which to set the relative paths of the fixed files (i.e. those in the FIXam directory) in the FV3 namelist file. When running ensembles, this path is two levels up from the run directory; without ensembles, it is only one level up (as was originally the case).
(K) Edit informational and/or error messages to the user.
(L) Remove commented-out code.
(M) Edit comments.
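
As a rough illustration of (B) and (C), the run directory can be formed by appending SLASH_ENSMEM_SUBDIR to the cycle directory. The function name `get_run_dir` and the example path are assumptions made for this sketch, not code from the PR.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from the PR) that mirrors the description in (B)
# and (C): the run directory is the cycle directory plus an optional
# "/<member>" suffix.
get_run_dir() {
  local cycle_dir="$1"
  local slash_ensmem_subdir="$2"   # "" without ensembles, "/mem1" etc. with
  echo "${cycle_dir}${slash_ensmem_subdir}"
}

CYCLE_DIR="/path/to/expt/2019090112"   # illustrative path only

# Without ensembles, the rocoto XML passes an empty string.
run_dir_no_ens=$(get_run_dir "${CYCLE_DIR}" "")

# With ensembles, it passes "/${ensmem_subdir}".
ensmem_subdir="mem1"
run_dir_ens=$(get_run_dir "${CYCLE_DIR}" "/${ensmem_subdir}")

echo "${run_dir_no_ens}"
echo "${run_dir_ens}"
```

Carrying the leading slash inside the variable means the same concatenation works in both modes, with no conditional needed at each place where CYCLE_DIR is used.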

File-by-file description of modifications:

jobs/JREGIONAL_GET_EXTRN_MDL_FILES: (A)

jobs/JREGIONAL_MAKE_ICS: (B)

jobs/JREGIONAL_MAKE_LBCS: (B)

jobs/JREGIONAL_RUN_FCST: (B), (C), (D), (M)

  • Pass in the environment variables ENSMEM_INDX and SLASH_ENSMEM_SUBDIR as arguments to the function exregional_run_fcst(). These are set by the rocoto XML.

jobs/JREGIONAL_RUN_POST: (B), (C), (M)

  • In NCO mode, change location where ensemble directories are created to be under the cycle directory instead of above it. Do this using the new environment variable SLASH_ENSMEM_SUBDIR. This change is analogous to the change to community mode described in (B).
  • In community mode, place the "postprd" subdirectory under the run directory instead of under CYCLE_DIR (since now, cycle directories are one level up if running ensembles and thus would be the incorrect place to create the "postprd" subdirectory).
  • Change the argument cycle_dir of the function exregional_run_post to run_dir since that's what we really want in this function. This is instead of passing in cycle_dir and then forming run_dir.
  • Create the new argument cdate to the function exregional_run_post() and pass in the environment variable CDATE for its value (this is instead of using CDATE directly in exregional_run_post()).

scripts/exregional_make_grid.sh: (E), (F), (G), (M)

scripts/exregional_make_ics.sh: (A)

scripts/exregional_make_lbcs.sh: (A)

scripts/exregional_run_fcst.sh: (C), (D), (K), (M)

  • Introduce new input arguments ensmem_indx and slash_ensmem_subdir that get set to the rocoto-specified environment variables ENSMEM_INDX and SLASH_ENSMEM_SUBDIR, respectively, in the call to this function in jobs/JREGIONAL_RUN_FCST.
  • Change cycle_dir to run_dir in most places to make the directory name more general. This is because the run directory will be the cycle directory only when not running ensembles. When running ensembles, the run directory will be one of the ensemble member directories, which will be one level down from the current cycle directory.
  • Fix typo where there is an extra "}" printed after $target in error messages.
  • If running ensemble forecasts, use the new workflow array variable FV3_NML_ENSMEM_FPS (which contains the full paths to the FV3 namelist files for each ensemble member) when creating a link in the run directory to the FV3 namelist file for the current ensemble member. Note that these namelist files are cycle-independent and thus are created only once (during the experiment generation step).
  • Move the creation of diagnostics table files to a new function (in ush/create_diag_table_files.sh), and call that function during experiment generation (in ush/generate_FV3SAR_wflow.sh) instead of here in exregional_run_fcst.sh. We do this because the diagnostics table files depend only on the cycle, not the ensemble member. Thus, since we know the cycles to run at experiment generation time, we generate the diagnostics file for each cycle at that time and place each in its corresponding cycle directory.
  • If running ensembles, create symlinks in the run directory to the diagnostics table and model configure files in the cycle directory (which will be one level up from the run directory). We don't do this when NOT running ensembles because in that case, the run directory is the cycle directory (and these two files already exist in that directory; they are created during experiment generation time).
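
The symlink step in the last bullet might look roughly like the sketch below. The directory layout is built in a temporary location, the file names `diag_table` and `model_configure` are the usual FV3 names and are assumed here, and the use of relative link targets is a choice made for this sketch rather than something the PR states.

```bash
#!/usr/bin/env bash

# Illustrative layout in a temporary directory; all paths are hypothetical.
expt_dir=$(mktemp -d)
cycle_dir="${expt_dir}/2019090112"
run_dir="${cycle_dir}/mem1"
mkdir -p "${run_dir}"

# Stand-ins for the cycle-dependent files created at experiment generation.
echo "diag_table contents" > "${cycle_dir}/diag_table"
echo "model_configure contents" > "${cycle_dir}/model_configure"

DO_ENSEMBLE="TRUE"

if [ "${DO_ENSEMBLE}" = "TRUE" ]; then
  # The cycle directory is one level up from the member run directory, so a
  # relative target "../<file>" resolves to the file in the cycle directory.
  for fn in diag_table model_configure; do
    ln -sf "../${fn}" "${run_dir}/${fn}"
  done
fi

ls -l "${run_dir}"
```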

scripts/exregional_run_post.sh: (C), (L), (M)

  • Create the new input argument cdate [which gets set to the global variable CDATE in the call to this function, i.e. exregional_run_post(), in jobs/JREGIONAL_RUN_POST] and use it instead of the global variable CDATE.
  • Change the argument cycle_dir to run_dir since that's more useful in this function. Do this instead of passing in cycle_dir and then forming run_dir.
  • Make the local variables "POST_..." lowercase to follow the convention that local variables be in lower case.

tests/baseline_configs/config.community_ensemble_008mems.sh:

  • New workflow configuration file to perform WE2E test of ensemble forecasts in community mode with NUM_ENS_MEMBERS set to "008". This should result in eight workflow members with leading zeros in the digits in the names of the member directories, i.e. directories should be named mem001, mem002, ..., mem008. To test the newly (re)added workflow capability to run multiple cycles, this test also uses first and last cycle dates (DATE_FIRST_CYCL and DATE_LAST_CYCL) that are one day apart (instead of both being the same day as in most other WE2E tests) and two cycle hours per day.

tests/baseline_configs/config.community_ensemble_2mems.sh:

  • New workflow configuration file to perform WE2E test of ensemble forecasts in community mode with NUM_ENS_MEMBERS set to "2". This should result in two workflow members without leading zeros in the digits in the names of the member directories, i.e. directories should be named mem1 and mem2. To test the newly (re)added workflow capability to run multiple cycles, this test also uses first and last cycle dates (DATE_FIRST_CYCL and DATE_LAST_CYCL) that are one day apart (instead of both being the same day as in most other WE2E tests) and two cycle hours per day.

tests/baseline_configs/config.nco_ensemble.sh:

  • New workflow configuration file to perform WE2E test of ensemble forecasts in NCO mode with NUM_ENS_MEMBERS set to "2". This should result in two workflow members without leading zeros in the digits in the names of the member directories, i.e. directories should be named mem1 and mem2. To test the newly (re)added workflow capability to run multiple cycles, this test also uses two cycle hours per day ("12" and "18"). However, unlike the ensemble tests in community mode, it uses the same first and last cycle dates (DATE_FIRST_CYCL and DATE_LAST_CYCL) because currently, the external model data for the next day (which would be 20190902) is not staged on hera.
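
For orientation, the ensemble-related part of such a configuration file might look like the fragment below. Only DO_ENSEMBLE, NUM_ENS_MEMBERS, DATE_FIRST_CYCL, and DATE_LAST_CYCL are named in this PR; `CYCL_HRS` is an assumed name for the cycle-hours array, and the date is inferred from the statement that the next day would be 20190902.

```bash
# Sketch of the ensemble-related settings of an NCO-mode test configuration.
# CYCL_HRS is an assumed variable name; the other names appear in the PR text.
DO_ENSEMBLE="TRUE"
NUM_ENS_MEMBERS="2"          # no leading zeros, so directories are mem1, mem2

DATE_FIRST_CYCL="20190901"
DATE_LAST_CYCL="20190901"    # same day: data for 20190902 is not staged
CYCL_HRS=( "12" "18" )       # two cycle hours per day
```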

tests/baselines_list.txt:

  • Add the new WE2E tests for running ensemble forecasts described above.

ush/config_defaults.sh: (H), (I)

  • Move the section for user-staged external model files parameters up to after the section that defines external model parameters.
  • Move the section on stochastic parameters to after the new section that specifies whether or not to run ensemble forecasts (i.e. the variables DO_ENSEMBLE and NUM_ENS_MEMBERS; see below).
  • Introduce the new workflow variable DO_ENSEMBLE that specifies whether or not to run ensemble forecasts. Enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE".
  • Introduce the new workflow variable NUM_ENS_MEMBERS that specifies the number of ensemble members. This variable also specifies the number of digits to use in the names of the ensemble member directories, e.g. whether to use mem1, mem2, ..., mem8 or mem01, mem02, ..., mem08. For example, if NUM_ENS_MEMBERS is set to "8", then the member directory names will be mem1, mem2, ..., mem8; and if NUM_ENS_MEMBERS is set to "08", then the member directory names will be mem01, mem02, ..., mem08. This variable is not used if DO_ENSEMBLE is not set to "TRUE".

ush/create_diag_table_files.sh:

  • New file that defines a function [create_diag_table_files()] that creates a diagnostics table file for each cycle date and places it in the corresponding cycle directory.

ush/create_model_config_files.sh:

  • New file that defines a function [create_model_config_files()] that creates a model configuration file for each cycle date and places it in the corresponding cycle directory.

ush/generate_FV3SAR_wflow.sh: (H), (J), (L), (M)

  • Add new ensemble-related parameters to the "settings" variable that is used to customize the jinja2 template for the rocoto XML file. These new parameters allow the resulting XML to loop over ensemble members, to name rocoto tasks and log files such that they contain the ensemble member name (and are thus unique within an experiment), and to pass to the j-jobs the subdirectory of the current ensemble member under the current cycle directory.
  • Change "settings" variable used to set parameters in the jinja template for the rocoto XML to add capability to have more than one cycle in the experiment. This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
  • Use the new workflow array variable ALL_CDATES (containing all the cycle dates/times to run) to create all the cycle directories. Previously, the cycle directories were created during the make_ics or make_lbcs task. Now, this must be done during the experiment generation step because now, the model configuration file(s) and possibly also the diagnostics table file(s) (if the MAKE_GRID_TN step is being skipped), which are cycle-dependent but ensemble-member-independent, are created and placed in the cycle directories during experiment generation.
  • Call the new function create_model_config_files() to create a model configuration file within each cycle directory.
  • If not running the MAKE_GRID_TN task, then do (E), (F), and (G).
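
A minimal sketch of creating the cycle directories from ALL_CDATES at experiment generation time is shown below. The value of `EXPTDIR`, the YYYYMMDDHH directory naming, and the placeholder function `create_placeholder_config` are assumptions for illustration; the real create_model_config_files() is not reproduced here.

```bash
#!/usr/bin/env bash

# Illustrative values; EXPTDIR and the cycle list are assumptions.
EXPTDIR=$(mktemp -d)
ALL_CDATES=( "2019090112" "2019090118" "2019090212" "2019090218" )

# Hypothetical stand-in for create_model_config_files(): write one placeholder
# file into the given cycle directory.
create_placeholder_config() {
  local cdate="$1"
  local cycle_dir="$2"
  printf 'start date/hour: %s\n' "${cdate}" > "${cycle_dir}/model_configure"
}

for cdate in "${ALL_CDATES[@]}"; do
  cycle_dir="${EXPTDIR}/${cdate}"
  mkdir -p "${cycle_dir}"
  create_placeholder_config "${cdate}" "${cycle_dir}"
done

# Arithmetic expansion strips any padding that "wc -l" may add.
num_cycle_dirs=$(( $(find "${EXPTDIR}" -mindepth 1 -maxdepth 1 -type d | wc -l) ))
echo "created ${num_cycle_dirs} cycle directories"
```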

ush/set_FV3nml_sfc_climo_filenames.sh: (J)
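
The effect of (J) on the relative paths can be sketched as follows, assuming GNU coreutils `realpath` is available. The experiment path, the `fix_am` directory name, the file name, and the helper `get_rel_path` are all hypothetical; only the idea of a dummy run directory one or two levels below the experiment directory comes from the description above.

```bash
#!/usr/bin/env bash

# All paths below are hypothetical; none of them needs to exist because
# "realpath -m" (GNU coreutils) does not require the components to be present.
EXPTDIR="/path/to/expt"
FIXam="${EXPTDIR}/fix_am"
fixed_file="${FIXam}/some_fixed_file.grb"

get_rel_path() {
  local do_ensemble="$1"
  # Any cycle name works, since only the directory depth matters.
  local dummy_run_dir="${EXPTDIR}/any_cyc"
  if [ "${do_ensemble}" = "TRUE" ]; then
    dummy_run_dir="${dummy_run_dir}/any_ensmem"
  fi
  realpath -m --relative-to="${dummy_run_dir}" "${fixed_file}"
}

rel_no_ens=$(get_rel_path "FALSE")
rel_ens=$(get_rel_path "TRUE")
echo "${rel_no_ens}"
echo "${rel_ens}"
```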

ush/set_FV3nml_stoch_params.sh:

  • New file that defines a function [set_FV3nml_stoch_params()] that, for each ensemble member, takes the base FV3 namelist file and generates from it a new FV3 namelist file containing a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of the experiment directory.
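
The general idea of deriving one namelist per member from a base file can be sketched as below. This is not the PR's implementation: the file names, the single `iseed_sppt` entry, the seed formula, and the use of `sed` are all assumptions chosen to keep the example short.

```bash
#!/usr/bin/env bash

# Everything here is illustrative: the file names, the single "iseed_sppt"
# entry, and the seed formula are assumptions, not the PR's implementation.
EXPTDIR=$(mktemp -d)
base_nml="${EXPTDIR}/input.nml.base_ens"
cat > "${base_nml}" <<'EOF'
&nam_stochy
  iseed_sppt = 0
/
EOF

ENSMEM_NAMES=( "mem1" "mem2" )
FV3_NML_ENSMEM_FPS=()

for i in "${!ENSMEM_NAMES[@]}"; do
  member_nml="${EXPTDIR}/input.nml_${ENSMEM_NAMES[$i]}"
  seed=$(( i + 1 ))   # hypothetical: one distinct seed per member
  # Replace the value on the iseed_sppt line and keep everything else as is.
  sed "s/^\( *iseed_sppt *= *\).*/\1${seed}/" "${base_nml}" > "${member_nml}"
  FV3_NML_ENSMEM_FPS+=( "${member_nml}" )
done

grep -H "iseed_sppt" "${FV3_NML_ENSMEM_FPS[@]}"
```

Because the member namelists live at the top level of the experiment directory and do not depend on the cycle, they only need to be generated once.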

ush/set_cycle_dates.sh:

  • New file that defines a function [set_cycle_dates()] that sets all the cycle dates/times to run in the experiment.
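
One way such a function could enumerate the cycles is sketched below. The function name `set_cycle_dates_sketch` and its argument order are hypothetical, and the day increment relies on GNU `date -d`; the actual set_cycle_dates() may be written differently.

```bash
#!/usr/bin/env bash

# Hypothetical re-implementation for illustration; relies on GNU date ("-d").
# Arguments: first day (YYYYMMDD), last day (YYYYMMDD), then the cycle hours.
set_cycle_dates_sketch() {
  local day="$1" last_day="$2"
  shift 2
  local cycl_hrs=( "$@" )
  local hh
  ALL_CDATES=()
  while [ "${day}" -le "${last_day}" ]; do
    for hh in "${cycl_hrs[@]}"; do
      ALL_CDATES+=( "${day}${hh}" )
    done
    # Advance by one calendar day.
    day=$(date -u -d "${day} +1 day" +%Y%m%d)
  done
}

set_cycle_dates_sketch "20190901" "20190902" "12" "18"
NUM_CYCLES="${#ALL_CDATES[@]}"
echo "NUM_CYCLES=${NUM_CYCLES}: ${ALL_CDATES[*]}"
```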

ush/setup.sh: (H), (I), (V), (M)

  • Remove unneeded local variables YYYY_FIRST_CYCL, MM_FIRST_CYCL, DD_FIRST_CYCL, and HH_FIRST_CYCL.
  • Call the new function set_cycle_dates() to set the new workflow array variable ALL_CDATES containing all the cycle days/times to be run as part of the experiment.
  • Set the new workflow variable NUM_CYCLES (defined as the number of forecasts to run as part of the experiment) to the number of elements in ALL_CDATES.
  • Rename FCST_LEN_HRS_MAX to fcst_len_hrs_max since it is a local variable and thus (by convention) should be lower case.
  • Make sure that the new workflow variable DO_ENSEMBLE is set to a valid value.
  • Set the new workflow variable NDIGITS_ENSMEM_NAMES that specifies the number of digits to use in the names of the ensemble member directories, e.g. whether to use mem1, mem2, ..., mem8 or mem01, mem02, ..., mem08. Note that this is not a user-specifiable variable; it is obtained by counting the number of characters in NUM_ENS_MEMBERS. For example, if NUM_ENS_MEMBERS is set to "8", then NDIGITS_ENSMEM_NAMES will get set to "1" and the member directory names will be mem1, mem2, ..., mem8; and if NUM_ENS_MEMBERS is set to "08", then NDIGITS_ENSMEM_NAMES will get set to "2" and the member directory names will be mem01, mem02, ..., mem08.
  • Set the new workflow array variable ENSMEM_NAMES containing the names of the ensemble members. These are used to set the ensemble member subdirectory names.
  • Set the new workflow array variable FV3_NML_ENSMEM_FPS containing the full paths to the FV3 namelist files of the ensemble members.
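
The derivation of NDIGITS_ENSMEM_NAMES and ENSMEM_NAMES described above can be sketched as follows. The loop and the `printf` format are one possible way to do it and are not copied from ush/setup.sh.

```bash
#!/usr/bin/env bash

NUM_ENS_MEMBERS="008"

# Number of digits = number of characters in the user-supplied string.
NDIGITS_ENSMEM_NAMES="${#NUM_ENS_MEMBERS}"

# Force base 10 so the leading zeros are not read as an octal prefix.
num_members=$(( 10#${NUM_ENS_MEMBERS} ))

ENSMEM_NAMES=()
for (( i = 1; i <= num_members; i++ )); do
  # "%0*d" takes the zero-padded field width from the argument list.
  ENSMEM_NAMES+=( "$(printf 'mem%0*d' "${NDIGITS_ENSMEM_NAMES}" "${i}")" )
done

echo "${ENSMEM_NAMES[*]}"
```

With NUM_ENS_MEMBERS="8" the same code would produce mem1 through mem8, since the field width then becomes 1.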

ush/templates/FV3.input.yml:

  • Remove setting of consv_te to 1.0 for the FV3_CPT_v0 suite because it generates an FV3 runtime error that states that this variable needs to be set to 0 for all regional runs.

ush/templates/FV3SAR_wflow.xml: (L)

  • Modify jinja2 code to allow for multiple cycles to be run. This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
  • Bug fix - Change file name "make_grid_task_complete.txt" to "&MAKE_GRID_TN;_task_complete.txt" to make it changeable with the task name.
  • Place a jinja2-controlled metatask around all tasks starting with MAKE_ICS_TN that loops over all ensemble members if do_ensemble is set to TRUE (note that do_ensemble gets set to the workflow variable DO_ENSEMBLE in ush/generate_FV3SAR_wflow.sh). Make the print format of the loop index dependent on ndigits_ensmem_names (which in turn gets set to the workflow variable NDIGITS_ENSMEM_NAMES in ush/generate_FV3SAR_wflow.sh) to get the correct names for the ensemble member subdirectories (in terms of the number of leading zeros to use for the number portion of the subdirectory name).
  • For tasks that are within the metatask that loops over the ensemble members, add a string (uscore_ensmem_name) that identifies the ensemble member to task names, corresponding job names, and log file names. Note that this variable is set to an empty string if not running ensemble forecasts.
  • For tasks that are within the metatask that loops over the ensemble members, create the new environment variable SLASH_ENSMEM_SUBDIR that gets set to the jinja2 variable slash_ensmem_subdir, which in turn gets set (in ush/generate_FV3SAR_wflow.sh) to a null string if not running ensembles and to the string "/${name_of_ensemble_member}" when running ensembles, where ${name_of_ensemble_member} is the name of the current ensemble member.
  • In the RUN_FCST_TN task, create a new environment variable named ENSMEM_INDX and set it to the current value of the index used in the loop over ensemble members. This is passed via the j-job jobs/JREGIONAL_RUN_FCST to the ex-script scripts/exregional_run_fcst.sh, where, if DO_ENSEMBLE is set to "TRUE", it is used to create the symlink in the run directory to the FV3 namelist file of the current ensemble member (which is in the top level of the experiment directory). Note that this variable is set to an empty string if not running ensemble forecasts (in that case, it is not used in the ex-script).
  • In the dependencies section of the RUN_POST_TN task, add slash_ensmem_subdir to the paths of the dynf*.nc and phyf*.nc files (since when running ensemble forecasts, these files will be in the ensemble member directories under the cycle directories).

ush/valid_param_vals.sh:

  • Specify valid values for the new workflow variable DO_ENSEMBLE.

Summary of modifications:
------------------------
* Introduce the new workflow variables DO_ENSEMBLE and NUM_ENS_MEMBERS.  The user can enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE" and NUM_ENS_MEMBERS to the number of ensemble members to use.
* When running ensemble forecasts, create/insert a set of ensemble member directories and create the cycle directories under these member directories.  These ensemble member directories are placed at the directory level that cycle directory levels would be placed when not running ensemble forecasts.
* Regardless of whether or not ensembles are enabled, change location where external model files are staged so that they are not in the cycle directories but instead one (without ensembles) or two (with ensembles) directory levels up.  In the case with ensembles, this needs to be done so that the external model files are not duplicated within each ensemble member directory; they do not need to be because all ensemble members use the same external model files.  This is also done for the case without ensembles in order to minimize the difference in workflow behavior between the with and without ensemble cases.  To make this change of location of external model files, the new workflow variable EXTRN_MDL_FILES_BASEDIR is introduced (it is not a user-specified variable but a secondary one).
* Add two new WE2E tests for running ensemble forecasts, one in community mode (community_ensemble) and another in NCO mode (nco_ensemble).

Modifications common to more than one file (used below in listing of file-by-file modifications):
------------------------------------------------------------------------------------------------
(A) Fix/add/delete comments and/or informational and/or error messages.
(B) Remove commented out code.
(C) Change location where external model files are staged so that they are not in the cycle directories (which are now underneath each ensemble member directory) but instead one level up.  This needs to be done so that the external model files are not duplicated within each ensemble member directory; they do not need to be because all ensemble members use the same external model files.
(D) Add a call to the new function set_FV3nml_stoch_params() that takes a base FV3 namelist file and generates from it a new FV3 namelist file for each ensemble member containing a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of that ensemble member's directory (so that all cycles in that member directory can create symlinks to it).
(E) Rename the variable FV3_NML_BASE_FN to FV3_NML_BASE_SUITE_FN to clarify that it specifies the name of the FV3 namelist file for the base physics suite (which is used to generate the namelist file specific to the user-specified physics suite).  This is done to better distinguish this base namelist file from the base namelist file used to generate namelist files for the various ensemble members.  (The name of the latter is specified in the new workflow variable FV3_NML_BASE_ENS_FN.)
(F) Introduce the new workflow variable FV3_NML_BASE_ENS_FN that specifies the name to use for the base FV3 namelist file from which to generate the namelist file for each ensemble member.  This variable is not used if not running ensemble forecasts (i.e. if DO_ENSEMBLE is not set to "TRUE").
(G) Add the local variable dummy_cyc_dir that specifies the (dummy) directory with respect to which to set the relative paths of the fixed files (i.e. those in the FIXam directory) in the FV3 namelist file.  When running ensembles, this path is two levels up from the cycle directory; without ensembles, it is only one level up (as was originally the case).

File-by-file description of modifications:
-----------------------------------------

jobs/JREGIONAL_GET_EXTRN_MDL_FILES: (C)

jobs/JREGIONAL_RUN_FCST:
* Change CYCLE_DIR to cycle_dir since it is a local variable in this context (it is an argument to the script exregional_run_fcst.sh).

jobs/JREGIONAL_RUN_POST:
* For NCO mode, change directory in which output from the RUN_POST_TN task is stored such that if running ensemble forecasts, subdirectories are created under COMOUT_BASEDIR for each ensemble member.  This is done via the variable SLASH_ENSMEM_DIR, which is set to either "/mem$NN" where $NN is the member number (if running ensemble forecasts) or to a null string (if not running ensemble forecasts).  For community mode, the output from the post task is under CYCLE_DIR, which now gets set in the rocoto XML such that it is under an ensemble member directory (see below description of modifications to ush/templates/FV3SAR_wflow.xml).

modulefiles/tasks/hera/make_ics.local:
* Add wgrib2 (must have been removed by mistake?).

modulefiles/tasks/hera/make_lbcs.local:
* Add wgrib2 (must have been removed by mistake?).

scripts/exregional_make_grid.sh: (A), (D)

scripts/exregional_make_ics.sh: (C)

scripts/exregional_make_lbcs.sh: (C)

scripts/exregional_run_fcst.sh:
* Change CYCLE_DIR to cycle_dir since it is a local variable.
* Add a check such that if running ensemble forecasts, the symlink for the FV3 namelist file that must be present in the cycle directory points to the namelist file at the top level of the ensemble directory under which that cycle directory is located.

tests/baseline_configs/config.community_ensemble.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in community mode.

tests/baseline_configs/config.nco_ensemble.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in NCO mode.

tests/baselines_list.txt:
* Add two new WE2E tests for running ensemble forecasts, one in community mode (community_ensemble) and another in NCO mode (nco_ensemble).

ush/config_defaults.sh: (A), (E), (F)
* Introduce the new workflow variable DO_ENSEMBLE that specifies whether or not to run ensemble forecasts.  Enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE".
* Introduce the new workflow variable NUM_ENS_MEMBERS that specifies the number of ensemble members.  This variable is not used if DO_ENSEMBLE is not set to "TRUE".

ush/generate_FV3SAR_wflow.sh: (A), (B), (D), (E), (G)
* Add new ensemble-related parameters to the "settings" variable that is used to customize the jinja2 template for the rocoto XML file.  These new parameters allow the resulting XML to loop over ensemble members, rename rocoto tasks and log files such that they contain the member number (and are thus unique), and modify cycle directories so that they are member-specific.

ush/set_FV3nml_sfc_climo_filenames.sh: (G)

ush/set_FV3nml_stoch_params.sh:
* File to define new function that takes a base FV3 namelist file and generates from it a new FV3 namelist file for each ensemble member containing a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of that ensemble member's directory (so that all cycles in that member directory can create symlinks to it).

ush/setup.sh: (E), (F)
* Rename FCST_LEN_HRS_MAX to fcst_len_hrs_max since it is a local variable.
* Introduce the new workflow variable EXTRN_MDL_FILES_BASEDIR that specifies the base directory under which the external model files will be staged.  Under this directory, a subdirectory will be created for each external model (one for ICs, another for LBCs if different from the one for ICs), and under these, subdirectories will be created for each cycle in which to stage the files.  Note that EXTRN_MDL_FILES_BASEDIR is a secondary variable in the sense that it is not user-specifiable.
* Rename FV3_NML_BASE_FP to FV3_NML_BASE_SUITE_FP for the same reason as renaming of FV3_NML_BASE_FN to FV3_NML_BASE_SUITE_FN (see (E) above).
* Create new workflow variable FV3_NML_BASE_ENS_FP that specifies the full path to the base FV3 namelist file from which the namelist files for the individual ensemble members are generated.
* Introduce the new workflow array variable ENS_MEMBER_DIRS.  If running ensemble forecasts, set its elements to the ensemble member directories immediately under the experiment directory.
* If running ensemble forecasts, create the ensemble directories specified in the new workflow array variable ENS_MEMBER_DIRS.
* Record new variables to the workflow variable definitions file.

ush/templates/FV3SAR_wflow.xml:
* Bug fix - Change file name "make_grid_task_complete.txt" to "&MAKE_GRID_TN;_task_complete.txt" to make it changeable with the task name.
* Remove CYCLE_BASEDIR as an environment variable from the GET_EXTRN_ICS_TN and GET_EXTRN_LBCS_TN tasks.  This variable is no longer needed because the external model files are now staged outside of the cycle directories (under EXTRN_MDL_FILES_BASEDIR).
* Place a jinja2-controlled metatask around all tasks starting with MAKE_ICS_TN that loops over all ensemble members if do_ensemble is set to TRUE.
* For tasks that are within the metatask that loops over the ensemble members, add a string (uscore_ensmem_name) that identifies the ensemble member to task names, corresponding job names, and log file names.  Note that this variable is set to an empty string if not running ensemble forecasts.
* For tasks that are within the metatask that loops over the ensemble members, add a string (slash_ensmem_dir) that inserts the ensemble member directory to the definition of CYCLE_DIR (since when running ensembles, the cycle directories are under the member directories).  Note that this variable is set to an empty string if not running ensemble forecasts.
* Set the ensemble index (ENSMEM_INDX) as an environment variable in the RUN_FCST_TN task.  This is needed in the ex-script exregional_run_fcst.sh to be able to create the symlink in the cycle directory to the FV3 namelist file in the correct ensemble member directory.  Note that this variable is set to an empty string if not running ensemble forecasts (in that case, it is not used).
* Set the ensemble member subdirectory preceded by a slash (SLASH_ENSMEM_DIR) as an environment variable in the RUN_POST_TN task.  This is needed in NCO mode when setting the directory in which to place the output of UPP.  Note that this variable is set to an empty string if not running ensemble forecasts.

ush/valid_param_vals.sh:
* Specify valid values for the new workflow variable DO_ENSEMBLE.
…e set to 0 on any regional grid. Make this change for the FV3_CPT_v0 suite (which is the only one for which consv_te had been set to a nonzero value).
…es are named in different scripts.

The workflow generation scripts create ensemble directories named, e.g., mem1, mem2, ..., mem8, but the exregional_run_fcst.sh script assumes they are mem01, mem02, ..., mem08.  Make these consistent.  Now, the naming convention used depends on whether or not leading zeros are included in NUM_ENS_MEMBERS.  For example, if NUM_ENS_MEMBERS is set to "8", then the member directory names will be mem1, mem2, ..., mem8; and if NUM_ENS_MEMBERS is set to "08", then the member directory names will be mem01, mem02, ..., mem08.
@gsketefian
Collaborator Author

I wanted to do a draft pull request until I got the test results in on hera, but for whatever reason github wouldn't let me. If anyone knows why that might be, please let me know.

@christinaholtNOAA
Contributor

I think that just means that you add a tag like "Not Ready".

Contributor

@christinaholtNOAA christinaholtNOAA left a comment

I haven't gone through this one in great detail since it's meant to be a draft, but I do want to point out a controversial design choice earlier rather than later. Essentially, I think this one might be a deal-breaker.

It is VERY uncommon to see the output directory structure set up like this. Typically everything that is done for a given cycle is, and needs to be, housed under the cycle directory. That includes input data, ensemble member sub directories, control forecasts, analysis directories, etc. I understand your reasoning for not wanting to change the current default behavior, but the change to the output directory structure for the default cold start case is needed so that the rest of the workflow components can live neatly where they should, and so that it is in line with what the collaborators on this project expect as a standard.

@gsketefian
Collaborator Author

gsketefian commented Jun 29, 2020 via email

@gsketefian
Collaborator Author

Of the 23 WE2E tests I ran, 20 passed, including the 3 new ensemble tests (community_ensemble_2mems, community_ensemble_008mems, nco_ensemble). The ones that failed are regional_003, regional_004, and regional_010. These also fail with the current develop branch with the same errors as this branch (feature/ensemble), so the failures most likely have nothing to do with this PR.

Obviously, I will have to redo the tests once I rearrange directories, but I just wanted to report here on the failed tests that also occur with the develop branch.

regional_003 and regional_004 are the only two that use the FV3_GSD_v0 suite, and they fail with the same error (surface pressure becomes NaN). The exact error message is:

  !!! (1) Error in subr radiation_aerosols: unrealistic surface pressure =
           1                     NaN

@JeffBeck-NOAA I tried reducing dt_atmos from 300 sec to 40 sec, but that made no difference. I think the crash happens before the integration even starts.

regional_010 fails with this namelist read error:

FATAL from PE     3: The global energy fixer cannot be used on a regional grid. consv_te must be set to 0.

So I tried removing the line consv_te: 1.0 from FV3.input.yml (that change is part of this PR). Now I get the following namelist read error.

forrtl: severe (19): invalid reference to variable in NAMELIST input, unit -5, file Internal Formatted NML Read, line -1, position 13

Please let me know if anyone tests the develop branch and also gets these failures and/or knows how to fix them. Thanks.

@gsketefian gsketefian added Not ready For PRs that should not be merged yet and removed testing on hera labels Jun 30, 2020
…neath the cycle directories (instead of the opposite). Details below.

Summary of modifications:
------------------------
* Place cycle directories above ensemble member directories, i.e. each cycle directory will contain a full set of ensemble member subdirectories that are used as the run directories.  Previously, it was the other way around, i.e. each member directory contained all cycle subdirectories.
* Move the external model directories into each cycle directory (instead of being in their own directory called extrn_mdl_files under the main experiment directory).
* During the experiment generation step, generate the full list of cycle dates/times to run and create a directory for each cycle.
* Add capability to have more than one cycle.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* In order to test the capability of the workflow to run multiple cycles (possibly on different days), modify the WE2E tests community_ensemble_2mems and nco_ensemble so that there are two cycle hours per day.  Also, modify community_ensemble_2mems so that the starting and ending days are different (one day later).

Modifications common to more than one file (used below in listing of file-by-file modifications):
------------------------------------------------------------------------------------------------
(A) Change the location where external model files are stored to be under each cycle directory (instead of a separate directory specified by EXTRN_MDL_FILES_BASEDIR under the main experiment directory).
(B) Insert the new environment variable SLASH_ENSMEM_SUBDIR anywhere CYCLE_DIR appears.  This variable is passed in by the rocoto XML.  If not running ensemble forecasts, it is simply set to an empty string, and if running ensembles, it is set to the string "/${ensmem_subdir}" where ensmem_subdir is the subdirectory of the current ensemble member under the current cycle directory.  This allows the subdirectories containing ICS, LBCS, and RESTART files to be placed directly under the current cycle directory when NOT running ensembles and for them to be placed under the current ensemble member directory (which is one level down from the current cycle directory) when running ensembles.
(C) For clarity, add new local variable run_dir that gets set to the run directory based on the current cycle and, if applicable, the ensemble member.
(D) Call the new function create_diag_table_files (in the new file create_diag_table_files.sh) to create diagnostics table files.
(D) For correctness, rename the local variable dummy_cyc_dir to dummy_run_dir.
(E) Remove any use of EXTRN_MDL_FILES_BASEDIR since it is no longer needed as a workflow variable.
(F) Remove any use of ENS_MEMBER_DIRS since it is no longer needed as a workflow variable.
(V) Remove unused code.
(W) Edit informational and/or error messages.
(X) Remove trailing whitespace.
(Y) Remove commented out code.
(Z) Edit comments.
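
As a rough illustration of (B) and (C), the run directory can be formed by appending SLASH_ENSMEM_SUBDIR to CYCLE_DIR. This is only a sketch: the helper name `get_run_dir` and the paths are hypothetical, and the j-jobs in this PR may build the path differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): build the run directory from the
# cycle directory and the optional "/<member>" suffix described in (B).
get_run_dir() {
  local cycle_dir="$1"
  # Empty string when not running ensembles, e.g. "/mem2" when running them.
  local slash_ensmem_subdir="${2:-}"
  printf '%s\n' "${cycle_dir}${slash_ensmem_subdir}"
}

# Illustrative path only.
CYCLE_DIR="/path/to/expt/2019090112"
get_run_dir "${CYCLE_DIR}" ""        # prints /path/to/expt/2019090112
get_run_dir "${CYCLE_DIR}" "/mem2"   # prints /path/to/expt/2019090112/mem2
```

Carrying the leading slash inside the variable lets the same concatenation work in both modes without a conditional at each use.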

File-by-file description of modifications:
-----------------------------------------

jobs/JREGIONAL_GET_EXTRN_MDL_FILES: (A)

jobs/JREGIONAL_MAKE_ICS: (B)

jobs/JREGIONAL_MAKE_LBCS: (B)

jobs/JREGIONAL_RUN_FCST: (B), (C), (Z)
* Pass in ENSMEM_INDX and SLASH_ENSMEM_SUBDIR as arguments to the function exregional_run_fcst().

jobs/JREGIONAL_RUN_POST: (B), (Z)
* Create the new local variable run_dir in which to store the path to the run directory (for the current cycle and possibly ensemble member).
* In NCO mode, change location where ensemble directories are created to be under the cycle directory instead of above it (analogous change to NCO mode as is done in (B) for community mode).
* In community mode, place the postprd subdirectory under the run directory instead of under CYCLE_DIR (since now, cycle directories are one level up if running ensembles and thus would be the incorrect place to create postprd).
* Create the new argument cdate to the function exregional_run_post() and pass in the environment variable CDATE for its value (this is instead of using CDATE directly in exregional_run_post()).
* Change the argument cycle_dir of the function exregional_run_post to run_dir since that's what we really want in that function.  This is instead of passing in cycle_dir and then forming run_dir.

scripts/exregional_make_grid.sh: (D)

scripts/exregional_make_ics.sh: (A), (X)

scripts/exregional_make_lbcs.sh: (A), (X)

scripts/exregional_run_fcst.sh: (C), (W)
* Introduce new input arguments ensmem_indx and slash_ensmem_subdir that get set to the rocoto-specified environment variables ENSMEM_INDX and SLASH_ENSMEM_SUBDIR, respectively, in the call to this function in jobs/JREGIONAL_RUN_FCST.
* Change cycle_dir to run_dir in most places to make the directory name more general.  This is because the run directory will be the cycle directory only when not running ensembles.  When running ensembles, the run directory will be one of the ensemble member directories, which will be one level down from the current cycle directory.
* Fix typo where there is an extra "}" printed after $target in error messages.
* Use the new workflow array variable FV3_NML_ENSMEM_FPS (which contains the full paths to the FV3 namelist files for each ensemble member) when creating a link in the run directory to the FV3 namelist file for the current ensemble member.  Note that these namelist files are cycle-independent and thus are created only once during the experiment generation step.
* Move the creation of diagnostics table files to a new function (in ush/create_diag_table_files.sh), and call that function during experiment generation (in ush/generate_FV3SAR_wflow.sh) instead of here in exregional_run_fcst.sh.  We do this because the diagnostics table files depend only on the cycle, not the ensemble member.  Thus, since we know the cycles to run at experiment generation time, we generate the diagnostics file for each cycle then and place each in its corresponding cycle directory.
* If running ensembles, create symlinks in the run directory to the diagnostics table and model configure files in the cycle directory (which will be one level up from the run directory).  We don't do this when NOT running ensembles because in that case, the run directory is the cycle directory (and these two files already exist in that directory; they are created during experiment generation time).

scripts/exregional_run_post.sh: (C), (Y)
* Create the new input argument cdate (which gets set to the global variable CDATE in the call to this function (exregional_run_post)) and use it instead of the global variable CDATE.
* Change the argument cycle_dir to run_dir since that's more useful in this function.  This is instead of passing in cycle_dir and then forming run_dir.
* Make the local variables "POST_..." lowercase to follow the convention that local variables be in lower case.

tests/baseline_configs/config.community_ensemble_2mems.sh:
* Modify settings in this test configuration so that the starting and ending days of the cycles are not the same and so that there are two cycle hours per day.  This is to have more thorough testing of the ensembles feature in community mode.

tests/baseline_configs/config.nco_ensemble.sh:
* Modify settings in this test configuration so that there are two cycle hours per day.  This is to have more thorough testing of the ensembles feature in NCO mode.

ush/create_diag_table_files.sh:
* New file containing a function that creates a diagnostics table file for each cycle date and places it in the corresponding cycle directory.

ush/create_model_config_files.sh:
* New file containing a function that creates a model configuration file for each cycle date and places it in the corresponding cycle directory.

ush/set_cycle_dates.sh:
* New function that sets all the cycle dates/times to run.
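
For readers unfamiliar with the idea, here is a hedged sketch of how a full list of cycle dates/times could be generated from a first day, a last day, and a set of cycle hours. The function name `list_cycle_dates` is hypothetical, the sketch assumes GNU date is available, and it is not the code in ush/set_cycle_dates.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not from the PR); requires GNU date.
# Arguments: first day (YYYYMMDD), last day (YYYYMMDD), then cycle hours (HH).
# Prints one YYYYMMDDHH string per line.
list_cycle_dates() {
  local day="$1" last_day="$2"
  shift 2
  local hh
  # YYYYMMDD values compare correctly as plain integers.
  while (( day <= last_day )); do
    for hh in "$@"; do
      printf '%s%s\n' "${day}" "${hh}"
    done
    # Use UTC so that daylight-saving changes cannot affect the increment.
    day=$(date -u -d "${day} + 1 day" +%Y%m%d)
  done
}

# Collect the result into an array, analogous to ALL_CDATES in the PR.
mapfile -t all_cdates < <(list_cycle_dates 20190901 20190902 12 18)
echo "${#all_cdates[@]} cycles: ${all_cdates[*]}"
```

With two days and two cycle hours per day, this produces four cycle entries, one per cycle directory to be created during experiment generation.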

ush/generate_FV3SAR_wflow.sh: (D)
* For clarity and consistency with other scripts, change variable name from slash_ensmem_dir to slash_ensmem_subdir.
* Change "settings" variable used to set parameters in the jinja template for the rocoto XML to add capability to have more than one cycle.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Use the new workflow array variable ALL_CDATES (containing all the cycle dates/times to run) to create all the cycle directories.  Previously, the cycle directories were created during the make_ics or make_lbcs task, but it is clearer to do it during experiment generation.  Also, it now must be done during experiment generation because now, the model configuration file(s) and possibly also the diagnostics table file(s) (if the MAKE_GRID_TN step is being skipped), which are cycle-dependent but ensemble-member-independent, are created and placed in the cycle directories during experiment generation.
* Call the new function create_model_config_files() to create a model configuration file within each cycle directory.
* If not running the MAKE_GRID_TN task, call the new function create_diag_table_files() to create a diagnostics table file within each cycle directory.

ush/set_FV3nml_sfc_climo_filenames.sh: (D)

ush/set_FV3nml_stoch_params.sh: (Z)
* For consistency with other scripts, rename the variable fv3_nml_ens_fp to fv3_nml_ensmem_fp.
* Use the new workflow array variable FV3_NML_ENSMEM_FPS (which contains the full paths to the FV3 namelist files for each ensemble member) to set the full path to the current ensemble member's FV3 namelist file.

ush/setup.sh: (E), (F), (V), (Z)
* Call the new function set_cycle_dates() to set the new workflow array variable ALL_CDATES containing all the cycle days/times to be run.
* Set the new workflow variable NUM_CYCLES to the number of elements in ALL_CDATES.
* Set the new workflow array variable ENSMEM_NAMES containing the names of the ensemble members.
* Set the new workflow array variable FV3_NML_ENSMEM_FPS containing the full paths to the FV3 namelist files of the ensemble members.
* Remove creation of ensemble member directories.  These are now created in the j-jobs of the MAKE_ICS_TN or the MAKE_LBCS_TN task (whichever runs first).

ush/templates/FV3SAR_wflow.xml: (Y)
* Modify jinja code to allow for multiple cycles to be run.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Add CYCLE_DIR as an environment variable to the GET_EXTRN_ICS_TN and GET_EXTRN_LBCS_TN tasks.
* In the MAKE_ICS_TN, MAKE_LBCS_TN, RUN_FCST_TN, and RUN_POST_TN tasks, change the way CYCLE_DIR is set so that it does not include the ensemble member subdirectory.
* In the MAKE_ICS_TN, MAKE_LBCS_TN, RUN_FCST_TN, and RUN_POST_TN tasks, create the new environment variable SLASH_ENSMEM_SUBDIR that gets set to a null string if not running ensembles and to the string "/${name_of_ensemble_member}" when running ensembles.
…e develop branch into this fork (feature/ensemble).
…ensemble forecasts in community mode, change WE2E test "community_ensemble_008" to include 2 days and 2 cycle hours per day (instead of 1 and 1, respectively).
@gsketefian
Collaborator Author

gsketefian commented Jul 10, 2020

This passes all WE2E tests on hera except regional_003, regional_004, and regional_010, which fail just as with the develop branch.

Please test on jet, cheyenne, WCOSS_CRAY, and WCOSS_DELL_P3 as you see fit.

I still need to update the PR/commit message. Will do that while it is being tested.

@gsketefian gsketefian added the Needs Cheyenne test Testing needs to be run on NCAR Cheyenne machine label Jul 10, 2020
@BenjaminBlake-NOAA
Collaborator

Ok! I just pushed my changes to your feature/ensemble branch. The workflow generation script was modified and I added run_fcst.local modulefiles. I will go ahead and approve this PR.

@gsketefian
Collaborator Author

Ok, I just updated the PR message.
@christinaholtNOAA Please take a look.
@BenjaminBlake-NOAA Just wondering which tests you ran on WCOSS*. Did you happen to run the one named "nco_ensemble"? That's the one that runs ensembles in NCO mode.

@BenjaminBlake-NOAA
Collaborator

@gsketefian I didn't do anything differently than I normally do, which was just to run a normal end to end test of the workflow jobs on WCOSS to make sure nothing was broken. How do I run the nco_ensemble WE2E test? I can certainly do that. Do I need to specify that I want to run that test when generating the workflow?

@gsketefian gsketefian removed the Needs Jet test Testing needs to be run on Jet machine label Jul 20, 2020
@gsketefian
Collaborator Author

@BenjaminBlake-NOAA Since you haven't run any WE2E tests yet, let's not worry about it for this PR since it's working for the cases you're interested in. But it would be nice to start running them, at least the ones relevant to NCO. If you like, we can set up a telecon to go over how to do that when you have time. I run them on hera all the time. There is some hard-coding in the test configurations for hera that we'd have to change. What approach to take to modify the tests for different machines is still up in the air, so we'll have to discuss that and see what works best for you.

Briefly, no, you don't specify the tests you want to run during the experiment generation step. There is a testing script that will generate a separate experiment for each test you want to run. That script is at tests/run_experiment.sh (probably should be renamed run_tests.sh). Each test has its own configuration file, and those are in tests/baseline_configs/config.*.sh, where the *s are the test names. You don't have to run all the tests, just a subset. We can start with running just the one named "nco_conus_c96", which is the most basic one in NCO mode. If you have time for a telecon, we can talk about the details then (but don't feel obliged if you're happy with the testing you're doing now).

@BenjaminBlake-NOAA
Collaborator

@gsketefian Sure, if you have time to do a telecon to go over how to run the WE2E tests that'd be great. I'd like to know how to run them. My schedule is fairly open this week. @RatkoVasic-NOAA you'd be welcome to join as well.

@JulieSchramm

JulieSchramm commented Jul 21, 2020 via email

@RatkoVasic-NOAA
Collaborator

RatkoVasic-NOAA commented Jul 21, 2020 via email

@JeffBeck-NOAA JeffBeck-NOAA merged commit 293a3ec into ufs-community:develop Jul 22, 2020
mkavulich pushed a commit that referenced this pull request Jul 22, 2020
* Introduce the new workflow variables DO_ENSEMBLE and NUM_ENS_MEMBERS.  The user can enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE" and NUM_ENS_MEMBERS to the number of ensemble members to use.  Note that NUM_ENS_MEMBERS also specifies the number of digits to use in the names of the ensemble member directories, e.g. whether to use mem1, mem2, ..., mem8 or mem01, mem02, ..., mem08.  For example, if NUM_ENS_MEMBERS is set to "8", then the member directory names will be mem1, mem2, ..., mem8, whereas if NUM_ENS_MEMBERS is set to "08", then the member directory names will be mem01, mem02, ..., mem08.
* During the experiment generation step, generate the full list of cycle dates/times to run and create a directory for each cycle.  Previously, the cycle directory for each cycle was created during the make_ics or make_lbcs task of that cycle (whichever ran first).
* When running ensemble forecasts, create a set of ensemble member directories under each cycle directory and use those ensemble directories as the FV3 run directories.  Note that these ensemble directories are created when the make_ics or make_lbcs task for that cycle and ensemble member runs (whichever of those two tasks happens to run first); they are not created during experiment generation, although that could be done as well.
* Modify the ush/generate_FV3SAR_wflow.sh script and the ush/templates/FV3SAR_wflow.xml jinja2 template to add the capability to have more than one cycle within an experiment.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Add three new WE2E tests for running ensemble forecasts:  two in community mode (community_ensemble_2mems and community_ensemble_008mems) and another in NCO mode (nco_ensemble).
  * The two community ensemble WE2E tests differ in that one of them uses NUM_ENS_MEMBERS="2" while the other uses NUM_ENS_MEMBERS="008".  The first is the simplest case of two ensemble members with member directories named mem1 and mem2, while the second is meant to test the ability to add leading zeros to the digits in the member directory names (should get mem001, mem002, ..., mem008) and to check that the workflow doesn't run into problems with integers being interpreted in octal because they start with leading 0's (as can happen in bash).
  * In order to test the (restored) capability of the workflow to run multiple cycles (possibly on different days), set up the community ensemble tests to have two cycle hours per day (instead of just one) and different starting and ending cycle days (ending is one day later than starting).  For the same reason, set up the nco ensemble test to have multiple cycle hours per day (but not different starting and ending days because only one day of data is available for this nco-mode test).

(A) To clarify that the contents of the subdirectory in which the external model files are placed are not the ICs or LBCs (on the native grid) but external model files that will be used to generate the ICs or LBCs, change the name of that subdirectory from "ICS" or "LBCS" to "for_ICS" or "for_LBCS", respectively.
(B) Insert the new environment variable SLASH_ENSMEM_SUBDIR anywhere CYCLE_DIR appears.  This variable is passed in by the rocoto XML.  If not running ensemble forecasts, it is simply set to an empty string; if running ensembles, it is set to the string "/${ensmem_subdir}" where ${ensmem_subdir} is the subdirectory of the current ensemble member under the current cycle directory.  This allows the subdirectories containing ICS, LBCS, and RESTART files to be placed directly under the current cycle directory when NOT running ensembles and for them to be placed under the current ensemble member directory (which is one level down from the current cycle directory) when running ensembles.
(C) For clarity, add new local variable run_dir that gets set to the run directory based on the current cycle and, if applicable, the ensemble member.  Note that if not running ensembles, run_dir is identical to the cycle directory (cycle_dir or CYCLE_DIR).
(D) To follow the convention that the arguments to a function are in lowercase (because they are local variables within the function itself), change the argument CYCLE_DIR of the exregional_run_fcst() function to cycle_dir.
(E) If the call to the function set_FV3nml_sfc_climo_filenames() fails, call print_err_msg_exit to print out an error message and exit.
(F) If running ensemble forecasts, call the new function set_FV3nml_stoch_params() that takes a base FV3 namelist file and generates from it a new FV3 namelist file for each ensemble member that contains a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of the experiment directory.
(G) Call the new function create_diag_table_files (in the new file ush/create_diag_table_files.sh) to create the diagnostics table files.
(H) Rename the variable FV3_NML_BASE_FN to FV3_NML_BASE_SUITE_FN to clarify that it specifies the name of the FV3 namelist file for the base physics suite (which is used to generate the namelist file specific to the user-specified physics suite).  This is done to better distinguish this base namelist file from the base namelist file used to generate namelist files for the various ensemble members.  (The name of the latter is specified in the new workflow variable FV3_NML_BASE_ENS_FN.)
(I) Introduce the new workflow variable FV3_NML_BASE_ENS_FN that specifies the name to use for the base FV3 namelist file from which to generate the namelist file for each ensemble member.  This variable is not used if not running ensemble forecasts (i.e. if DO_ENSEMBLE is not set to "TRUE").
(J) Add the local variable dummy_run_dir that specifies the (dummy) directory with respect to which to set the relative paths of the fixed files (i.e. those in the FIXam directory) in the FV3 namelist file.  When running ensembles, this path is two levels up from the run directory; without ensembles, it is only one level up (as was originally the case).
(K) Edit informational and/or error messages to the user.
(L) Remove commented-out code.
(M) Edit comments.

jobs/JREGIONAL_GET_EXTRN_MDL_FILES: (A)

jobs/JREGIONAL_MAKE_ICS: (B)

jobs/JREGIONAL_MAKE_LBCS: (B)

jobs/JREGIONAL_RUN_FCST: (B), (C), (D), (M)
* Pass in the environment variables ENSMEM_INDX and SLASH_ENSMEM_SUBDIR as arguments to the function exregional_run_fcst().  These are set by the rocoto XML.

jobs/JREGIONAL_RUN_POST: (B), (C), (M)
* In NCO mode, change location where ensemble directories are created to be under the cycle directory instead of above it.  Do this using the new environment variable SLASH_ENSMEM_SUBDIR.  This change is analogous to the change to community mode described in (B).
* In community mode, place the "postprd" subdirectory under the run directory instead of under CYCLE_DIR (since now, cycle directories are one level up if running ensembles and thus would be the incorrect place to create the "postprd" subdirectory).
* Change the argument cycle_dir of the function exregional_run_post to run_dir since that's what we really want in this function.  This is instead of passing in cycle_dir and then forming run_dir.
* Create the new argument cdate to the function exregional_run_post() and pass in the environment variable CDATE for its value (this is instead of using CDATE directly in exregional_run_post()).

scripts/exregional_make_grid.sh: (E), (F), (G), (M)

scripts/exregional_make_ics.sh: (A)

scripts/exregional_make_lbcs.sh: (A)

scripts/exregional_run_fcst.sh: (C), (D), (K), (M)
* Introduce new input arguments ensmem_indx and slash_ensmem_subdir that get set to the rocoto-specified environment variables ENSMEM_INDX and SLASH_ENSMEM_SUBDIR, respectively, in the call to this function in jobs/JREGIONAL_RUN_FCST.
* Change cycle_dir to run_dir in most places to make the directory name more general.  This is because the run directory will be the cycle directory only when not running ensembles.  When running ensembles, the run directory will be one of the ensemble member directories, which will be one level down from the current cycle directory.
* Fix typo where there is an extra "}" printed after $target in error messages.
* If running ensemble forecasts, use the new workflow array variable FV3_NML_ENSMEM_FPS (which contains the full paths to the FV3 namelist files for each ensemble member) when creating a link in the run directory to the FV3 namelist file for the current ensemble member.  Note that these namelist files are cycle-independent and thus are created only once (during the experiment generation step).
* Move the creation of diagnostics table files to a new function (in ush/create_diag_table_files.sh), and call that function during experiment generation (in ush/generate_FV3SAR_wflow.sh) instead of here in exregional_run_fcst.sh.  We do this because the diagnostics table files depend only on the cycle, not the ensemble member.  Thus, since we know the cycles to run at experiment generation time, we generate the diagnostics file for each cycle at that time and place each in its corresponding cycle directory.
* If running ensembles, create symlinks in the run directory to the diagnostics table and model configure files in the cycle directory (which will be one level up from the run directory).  We don't do this when NOT running ensembles because in that case, the run directory is the cycle directory (and these two files already exist in that directory; they are created during experiment generation time).
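
The linking step in the last bullet might look roughly like the sketch below. It is illustrative only: the helper name `link_cycle_files` is hypothetical, the file names `diag_table` and `model_configure` are assumed, and the use of relative link targets is a choice made for this example rather than something stated in the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): link cycle-level files into an
# ensemble member's run directory.  File names are assumed for illustration.
link_cycle_files() {
  local cycle_dir="$1" run_dir="$2"
  local fn
  # Without ensembles the run directory is the cycle directory, so the files
  # are already in place and there is nothing to link.
  if [[ "${run_dir}" == "${cycle_dir}" ]]; then
    return 0
  fi
  for fn in diag_table model_configure; do
    if [[ ! -f "${cycle_dir}/${fn}" ]]; then
      echo "ERROR: ${cycle_dir}/${fn} does not exist" >&2
      return 1
    fi
    # The run directory is one level below the cycle directory, so a
    # relative target stays valid if the experiment directory is moved.
    ln -sfn "../${fn}" "${run_dir}/${fn}"
  done
}

# Demonstration in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "${tmp}/2019090112/mem1"
touch "${tmp}/2019090112/diag_table" "${tmp}/2019090112/model_configure"
link_cycle_files "${tmp}/2019090112" "${tmp}/2019090112/mem1"
ls -l "${tmp}/2019090112/mem1"
```

Because both files depend only on the cycle, a single copy per cycle directory is shared by all members through these links.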

scripts/exregional_run_post.sh: (C), (L), (M)
* Create the new input argument cdate [which gets set to the global variable CDATE in the call to this function, i.e. exregional_run_post(), in jobs/JREGIONAL_RUN_POST] and use it instead of the global variable CDATE.
* Change the argument cycle_dir to run_dir since that's more useful in this function.  Do this instead of passing in cycle_dir and then forming run_dir.
* Make the local variables "POST_..." lowercase to follow the convention that local variables be in lower case.

tests/baseline_configs/config.community_ensemble_008mems.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in community mode with NUM_ENS_MEMBERS set to "008".  This should result in eight ensemble members with leading zeros in the digits in the names of the member directories, i.e. directories should be named mem001, mem002, ..., mem008.  To test the newly (re)added workflow capability to run multiple cycles, this test also uses first and last cycle dates (DATE_FIRST_CYCL and DATE_LAST_CYCL) that are one day apart (instead of both being the same day as in most other WE2E tests) and two cycle hours per day.

tests/baseline_configs/config.community_ensemble_2mems.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in community mode with NUM_ENS_MEMBERS set to "2".  This should result in two ensemble members without leading zeros in the digits in the names of the member directories, i.e. directories should be named mem1 and mem2.  To test the newly (re)added workflow capability to run multiple cycles, this test also uses first and last cycle dates (DATE_FIRST_CYCL and DATE_LAST_CYCL) that are one day apart (instead of both being the same day as in most other WE2E tests) and two cycle hours per day.

tests/baseline_configs/config.nco_ensemble.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in NCO mode with NUM_ENS_MEMBERS set to "2".  This should result in two ensemble members without leading zeros in the digits in the names of the member directories, i.e. directories should be named mem1 and mem2.  To test the newly (re)added workflow capability to run multiple cycles, this test also uses two cycle hours per day ("12" and "18").  However, unlike the ensemble tests in community mode, it uses the same first and last cycle dates (DATE_FIRST_CYCL and DATE_LAST_CYCL) because currently, the external model data for the next day (which would be 20190902) is not staged on hera.

tests/baselines_list.txt:
* Add the new WE2E tests for running ensemble forecasts described above.

ush/config_defaults.sh: (H), (I)
* Move the section for user-staged external model files parameters up to after the section that defines external model parameters.
* Move the section on stochastic parameters to after the new section that specifies whether or not to run ensemble forecasts (i.e. the variables DO_ENSEMBLE and NUM_ENS_MEMBERS; see below).
* Introduce the new workflow variable DO_ENSEMBLE that specifies whether or not to run ensemble forecasts.  Enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE".
* Introduce the new workflow variable NUM_ENS_MEMBERS that specifies the number of ensemble members.  This variable also specifies the number of digits to use in the names of the ensemble member directories, e.g. whether to use mem1, mem2, ..., mem8 or mem01, mem02, ..., mem08.  For example, if NUM_ENS_MEMBERS is set to "8", then the member directory names will be mem1, mem2, ..., mem8; and if NUM_ENS_MEMBERS is set to "08", then the member directory names will be mem01, mem02, ..., mem08.  This variable is not used if DO_ENSEMBLE is not set to "TRUE".

ush/create_diag_table_files.sh:
* New file that defines a function [create_diag_table_files()] that creates a diagnostics table file for each cycle date and places it in the corresponding cycle directory.

ush/create_model_config_files.sh:
* New file that defines a function [create_model_config_files()] that creates a model configuration file for each cycle date and places it in the corresponding cycle directory.

ush/generate_FV3SAR_wflow.sh: (H), (J), (L), (M)
* Add new ensemble-related parameters to the "settings" variable that is used to customize the jinja2 template for the rocoto XML file.  These new parameters allow the resulting XML to loop over ensemble members, to name rocoto tasks and log files such that they contain the ensemble member name (and are thus unique within an experiment), and to pass to the j-jobs the subdirectory of the current ensemble member under the current cycle directory.
* Change "settings" variable used to set parameters in the jinja template for the rocoto XML to add capability to have more than one cycle in the experiment.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Use the new workflow array variable ALL_CDATES (containing all the cycle dates/times to run) to create all the cycle directories.  Previously, the cycle directories were created during the make_ics or make_lbcs task.  Now, this must be done during the experiment generation step because now, the model configuration file(s) and possibly also the diagnostics table file(s) (if the MAKE_GRID_TN step is being skipped), which are cycle-dependent but ensemble-member-independent, are created and placed in the cycle directories during experiment generation.
* Call the new function create_model_config_files() to create a model configuration file within each cycle directory.
* If not running the MAKE_GRID_TN task, then do (E), (F), and (G).
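
As a rough illustration of the cycle-directory step described above, the following bash sketch loops over ALL_CDATES and creates one directory per cycle. The EXPTDIR location, the sample dates, and the empty placeholder file standing in for the output of create_model_config_files() are assumptions made for this example, not the actual contents of ush/generate_FV3SAR_wflow.sh.

```shell
#!/bin/bash
# Sketch only: EXPTDIR and the sample dates below are hypothetical.
EXPTDIR="$(mktemp -d)"
ALL_CDATES=( "2020070100" "2020070112" "2020070200" "2020070212" )

for cdate in "${ALL_CDATES[@]}"; do
  cycle_dir="${EXPTDIR}/${cdate}"
  mkdir -p "${cycle_dir}"
  # Placeholder for the cycle-dependent, member-independent file that
  # create_model_config_files() would write into this directory.
  : > "${cycle_dir}/model_configure"
done

ls "${EXPTDIR}"
```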

ush/set_FV3nml_sfc_climo_filenames.sh: (J)

ush/set_FV3nml_stoch_params.sh:
* New file that defines a function [set_FV3nml_stoch_params()] that, for each ensemble member, takes the base FV3 namelist file and generates from it a new FV3 namelist file containing a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of the experiment directory.

ush/set_cycle_dates.sh:
* New file that defines a function [set_cycle_dates()] that sets all the cycle dates/times to run in the experiment.
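
One way such a list of cycle dates/times could be built is sketched below in bash. The input names (date_first, date_last, cycl_hrs) are hypothetical and chosen for illustration, and the date arithmetic assumes GNU date; the real set_cycle_dates() may differ in both interface and method.

```shell
#!/bin/bash
# Sketch only: input names and values are hypothetical.
date_first="20200701"
date_last="20200702"
cycl_hrs=( "00" "12" )

ALL_CDATES=()
d="${date_first}"
while [ "${d}" -le "${date_last}" ]; do
  # One entry per cycle hour on the current day, in YYYYMMDDHH form.
  for hh in "${cycl_hrs[@]}"; do
    ALL_CDATES+=( "${d}${hh}" )
  done
  # Advance one calendar day (assumes GNU date).
  d="$(date -u -d "${d} +1 day" +%Y%m%d)"
done

# The number of cycles is simply the number of array elements.
NUM_CYCLES="${#ALL_CDATES[@]}"
printf '%s\n' "${ALL_CDATES[@]}"
echo "NUM_CYCLES=${NUM_CYCLES}"
```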

ush/setup.sh: (H), (I), (V), (M)
* Remove unneeded local variables YYYY_FIRST_CYCL, MM_FIRST_CYCL, DD_FIRST_CYCL, and HH_FIRST_CYCL.
* Call the new function set_cycle_dates() to set the new workflow array variable ALL_CDATES containing all the cycle days/times to be run as part of the experiment.
* Set the new workflow variable NUM_CYCLES (defined as the number of forecasts to run as part of the experiment) to the number of elements in ALL_CDATES.
* Rename FCST_LEN_HRS_MAX to fcst_len_hrs_max since it is a local variable and thus (by convention) should be lower case.
* Make sure that the new workflow variable DO_ENSEMBLE is set to a valid value.
* Set the new workflow variable NDIGITS_ENSMEM_NAMES that specifies the number of digits to use in the names of the ensemble member directories, e.g. whether to use mem1, mem2, ..., mem8 or mem01, mem02, ..., mem08.  Note that this is not a user-specifiable variable; it is obtained by counting the number of characters in NUM_ENS_MEMBERS.  For example, if NUM_ENS_MEMBERS is set to "8", then NDIGITS_ENSMEM_NAMES will get set to "1" and the member directory names will be mem1, mem2, ..., mem8; and if NUM_ENS_MEMBERS is set to "08", then NDIGITS_ENSMEM_NAMES will get set to "2" and the member directory names will be mem01, mem02, ..., mem08.
* Set the new workflow array variable ENSMEM_NAMES containing the names of the ensemble members.  These are used to set the ensemble member subdirectory names.
* Set the new workflow array variable FV3_NML_ENSMEM_FPS containing the full paths to the FV3 namelist files of the ensemble members.
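
The relationship between NUM_ENS_MEMBERS, NDIGITS_ENSMEM_NAMES, and ENSMEM_NAMES described above can be sketched in bash as follows. This is an illustrative reconstruction, not the code in ush/setup.sh; the helper variable num_members is an assumption introduced here.

```shell
#!/bin/bash
# Sketch only: num_members is a hypothetical helper variable.
NUM_ENS_MEMBERS="08"

# Number of digits = number of characters in the user-specified string.
NDIGITS_ENSMEM_NAMES="${#NUM_ENS_MEMBERS}"

# Force base 10 so that a value such as "08" is not parsed as invalid octal.
num_members=$(( 10#${NUM_ENS_MEMBERS} ))

ENSMEM_NAMES=()
for (( i=1; i<=num_members; i++ )); do
  # Zero-pad the member index to NDIGITS_ENSMEM_NAMES digits.
  ENSMEM_NAMES+=( "$(printf "mem%0${NDIGITS_ENSMEM_NAMES}d" "${i}")" )
done

echo "${ENSMEM_NAMES[@]}"
```

With NUM_ENS_MEMBERS set to "8" instead, the same loop yields mem1 through mem8.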

ush/templates/FV3.input.yml:
* Remove setting of consv_te to 1.0 for the FV3_CPT_v0 suite because it generates an FV3 runtime error that states that this variable needs to be set to 0 for all regional runs.

ush/templates/FV3SAR_wflow.xml: (L)
* Modify jinja2 code to allow for multiple cycles to be run.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Bug fix - Change file name "make_grid_task_complete.txt" to "&MAKE_GRID_TN;_task_complete.txt" to make it changeable with the task name.
* Place a jinja2-controlled metatask around all tasks starting with MAKE_ICS_TN that loops over all ensemble members if do_ensemble is set to TRUE (note that do_ensemble gets set to the workflow variable DO_ENSEMBLE in ush/generate_FV3SAR_wflow.sh).  Make the print format of the loop index dependent on ndigits_ensmem_names (which in turn gets set to the workflow variable NDIGITS_ENSMEM_NAMES in ush/generate_FV3SAR_wflow.sh) to get the correct names for the ensemble member subdirectories (in terms of the number of leading zeros to use for the number portion of the subdirectory name).
* For tasks that are within the metatask that loops over the ensemble members, add a string (uscore_ensmem_name) that identifies the ensemble member to task names, corresponding job names, and log file names.  Note that this variable is set to an empty string if not running ensemble forecasts.
* For tasks that are within the metatask that loops over the ensemble members, create the new environment variable SLASH_ENSMEM_SUBDIR that gets set to the jinja2 variable slash_ensmem_subdir, which in turn gets set (in ush/generate_FV3SAR_wflow.sh) to a null string if not running ensembles and to the string "/${name_of_ensemble_member}" when running ensembles, where ${name_of_ensemble_member} is the name of the current ensemble member.
* In the RUN_FCST_TN task, create a new environment variable named ENSMEM_INDX and set it to the current value of the index used in the loop over ensemble members.  This is passed via the j-job jobs/JREGIONAL_RUN_FCST to the ex-script scripts/exregional_run_fcst.sh, where, if DO_ENSEMBLE is set to "TRUE", it is used to create the symlink in the run directory to the FV3 namelist file of the current ensemble member (which is in the top level of the experiment directory).  Note that this variable is set to an empty string if not running ensemble forecasts (in that case, it is not used in the ex-script).
* In the dependencies section of the RUN_POST_TN task, add slash_ensmem_subdir to the paths of the dynf*.nc and phyf*.nc files (since when running ensemble forecasts, these files will be in the ensemble member directories under the cycle directories).
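
To make the role of slash_ensmem_subdir concrete, here is a small bash sketch of how a run directory could be formed from the cycle directory. The function name get_run_dir and the sample paths are hypothetical; only the convention itself (empty string without ensembles, "/" followed by the member name with ensembles) comes from the description above.

```shell
#!/bin/bash
# Sketch only: get_run_dir and the sample paths are hypothetical.
get_run_dir() {
  local cycle_dir="$1" do_ensemble="$2" ensmem_name="$3"
  local slash_ensmem_subdir=""
  if [ "${do_ensemble}" = "TRUE" ]; then
    # With ensembles, the run directory is one level below the cycle directory.
    slash_ensmem_subdir="/${ensmem_name}"
  fi
  # Without ensembles, the suffix is empty and the run directory
  # is the cycle directory itself.
  echo "${cycle_dir}${slash_ensmem_subdir}"
}

get_run_dir "/path/to/expt/2020070100" "TRUE" "mem01"
get_run_dir "/path/to/expt/2020070100" "FALSE" ""
```

Because the suffix collapses to an empty string, the same path expression works in both modes without a separate branch at each point of use.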

ush/valid_param_vals.sh:
* Specify valid values for the new workflow variable DO_ENSEMBLE.

Co-authored-by: Benjamin.Blake EMC <Benjamin.Blake@m71a3.ncep.noaa.gov>
@christinaholtNOAA
Contributor

I am working to merge these changes into PR #253. It seems that model_configure is now written by generate_FV3SAR_wflow.sh, but it has cycle-specific values in it. Can anyone speak to this discrepancy?

@gsketefian @BenjaminBlake-NOAA @JeffBeck-NOAA

@BenjaminBlake-NOAA
Collaborator

@christinaholtNOAA I'm not sure, sorry. The only changes I made to generate_FV3SAR_wflow.sh for this PR were specific to WCOSS and not related to the model_configure file.

@gsketefian
Collaborator Author

@christinaholtNOAA Since all ensemble members under a given cycle share the same model_configure and diag_table files (because these files are cycle-specific but not ensemble-member-specific), to avoid duplication I wanted there to be only one pair of these files for each cycle and had the ensemble member directories for that cycle just contain symlinks to these two files. The most natural place to put these files was at the top level of the cycle directory.

Since scripts/exregional_run_fcst.sh is called for every combination of cycle and ensemble member but we don't want to create a new model_configure (or diag_table) for each such combination, I would have had to put a check in that script to see if the model_configure for that cycle has already been created (by an ensemble member other than the current one) and if so, not create a new one. Instead of doing that, it seemed simpler to just create model_configure for each cycle during workflow generation.

@christinaholtNOAA
Contributor

Creating a single one up front will not work when running in real-time, as you lay it out here. Is this process prohibitively expensive to handle by each ensemble member script independently? Adding dependencies on other jobs is going to get very complicated if this is not cost-prohibitive.

@gsketefian
Collaborator Author

@christinaholtNOAA It would not be cost-prohibitive to create a model_configure file in run_fcst for one of the ensemble members (but not the others in that cycle). It just seems cleaner to set up everything that you know you're going to need (like a model_configure for each member) during workflow generation instead of when the run_fcst task starts, where you'd have to see if another ensemble member already started running run_fcst and thus created model_configure for that cycle.

> Adding dependencies on other jobs is going to get very complicated if this is not cost-prohibitive.

I do not follow. Can you elaborate? Maybe an example?

@gsketefian
Collaborator Author

Correction: I meant a model_configure file for each cycle, not for each member.

@christinaholtNOAA
Contributor

The dependency I am referring to here is that you are requiring another job (maybe ensemble member 1) to create a task- and cycle-dependent input for all other forecast jobs with knowledge about all of those cycles to be run before you ever get started instead of creating it at forecast run time, per-forecast. Retrospective cold starts and ensembles are not the only forecasts you are setting this precedent for. For other CAM workflows, this script could be running multiple times each cycle for a deterministic short or long forecast, a set of ensemble members, different resources, and an undefined set of cycles (which is the case in real time).

The generation of cycle-dependent and task-dependent files (used differently in multiple tasks for the same workflow) CANNOT be brought to the workflow generation layer and MUST be left as a job for the script running the task.

@gsketefian
Collaborator Author

@christinaholtNOAA Thanks for the clarification. Definitely if there's a use case in which each cycle is going to be run multiple times with different settings of the parameters in model_configure, then model_configure needs to be created in exregional_run_fcst.sh and placed in the run directory. I'll work on a fix today and issue a PR.

@christinaholtNOAA
Contributor

Does it make sense to revert this PR merge? This is a hard block on resolving conflicts for PR #253.

@gsketefian
Collaborator Author

Let me try getting a new PR in in the next couple of hours. If I don't get it done pretty soon, we'll revert.

@gsketefian gsketefian deleted the feature/ensemble branch August 21, 2020 18:59
JeffBeck-NOAA pushed a commit that referenced this pull request Sep 8, 2020
* adding files for getting nomads data
new files:
         ush/NOMADS_get_extrn_mdl_files_grib.sh
         ush/NOMADS_get_extrn_mdl_files_nemsio.sh

* updated code for getting data online in one file
new file:
     ush/NOMADS_get_extrn_mdl_files.sh
deleted files:
     ush/NOMADS_get_extrn_mdl_files_grib.sh
     ush/NOMADS_get_extrn_mdl_files_nemsio.sh

* Enable ensemble forecasts (#245)

* Modify workflow to enable ensemble forecasts.

Summary of modifications:
------------------------
* Introduce the new workflow variables DO_ENSEMBLE and NUM_ENS_MEMBERS.  The user can enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE" and NUM_ENS_MEMBERS to the number of ensemble members to use.
* When running ensemble forecasts, create/insert a set of ensemble member directories and create the cycle directories under these member directories.  These ensemble member directories are placed at the directory level that cycle directory levels would be placed when not running ensemble forecasts.
* Regardless of whether or not ensembles are enabled, change location where external model files are staged so that they are not in the cycle directories but instead one (without ensembles) or two (with ensembles) directory levels up.  In the case with ensembles, this needs to be done so that the external model files are not duplicated within each ensemble member directory; they do not need to be because all ensemble members use the same external model files.  This is also done for the case without ensembles in order to minimize the difference in workflow behavior between the with and without ensemble cases.  To make this change of location of external model files, the new workflow variable EXTRN_MDL_FILES_BASEDIR is introduced (it is not a user-specified variable but a secondary one).
* Add two new WE2E tests for running ensemble forecasts, one in community mode (community_ensemble) and another in NCO mode (nco_ensemble).

Modifications common to more than one file (used below in listing of file-by-file modifications):
------------------------------------------------------------------------------------------------
(A) Fix/add/delete comments and/or informational and/or error messages.
(B) Remove commented out code.
(C) Change location where external model files are staged so that they are not in the cycle directories (which are now underneath each ensemble member directory) but instead one level up.  This needs to be done so that the external model files are not duplicated within each ensemble member directory; they do not need to be because all ensemble members use the same external model files.
(D) Add a call to the new function set_FV3nml_stoch_params() that takes a base FV3 namelist file and generates from it a new FV3 namelist file for each ensemble member containing a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of that ensemble member's directory (so that all cycles in that member directory can create symlinks to it).
(E) Rename the variable FV3_NML_BASE_FN to FV3_NML_BASE_SUITE_FN to clarify that it specifies the name of the FV3 namelist file for the base physics suite (which is used to generate the namelist file specific to the user-specified physics suite).  This is done to better distinguish this base namelist file from the base namelist file used to generate namelist files for the various ensemble members.  (The name of the latter is specified in the new workflow variable FV3_NML_BASE_ENS_FN.)
(F) Introduce the new workflow variable FV3_NML_BASE_ENS_FN that specifies the name to use for the base FV3 namelist file from which to generate the namelist file for each ensemble member.  This variable is not used if not running ensemble forecasts (i.e. if DO_ENSEMBLE is not set to "TRUE").
(G) Add the local variable dummy_cyc_dir that specifies the (dummy) directory with respect to which to set the relative paths of the fixed files (i.e. those in the FIXam directory) in the FV3 namelist file.  When running ensembles, this path is two levels up from the cycle directory; without ensembles, it is only one level up (as was originally the case).

File-by-file description of modifications:
-----------------------------------------

jobs/JREGIONAL_GET_EXTRN_MDL_FILES: (C)

jobs/JREGIONAL_RUN_FCST:
* Change CYCLE_DIR to cycle_dir since it is a local variable in this context (it is an argument to the script exregional_run_fcst.sh).

jobs/JREGIONAL_RUN_POST:
* For NCO mode, change directory in which output from the RUN_POST_TN task is stored such that if running ensemble forecasts, subdirectories are created under COMOUT_BASEDIR for each ensemble member.  This is done via the variable SLASH_ENSMEM_DIR, which is set to either "/mem$NN" where $NN is the member number (if running ensemble forecasts) or to a null string (if not running ensemble forecasts).  For community mode, the output from the post task is under CYCLE_DIR, which now gets set in the rocoto XML such that it is under an ensemble member directory (see below description of modifications to ush/templates/FV3SAR_wflow.xml).

modulefiles/tasks/hera/make_ics.local:
* Add wgrib2 (must have been removed by mistake?).

modulefiles/tasks/hera/make_lbcs.local:
* Add wgrib2 (must have been removed by mistake?).

scripts/exregional_make_grid.sh: (A), (D)

scripts/exregional_make_ics.sh: (C)

scripts/exregional_make_lbcs.sh: (C)

scripts/exregional_run_fcst.sh:
* Change CYCLE_DIR to cycle_dir since it is a local variable.
* Add a check such that if running ensemble forecasts, the symlink for the FV3 namelist file that must be present in the cycle directory points to the namelist file at the top level of the ensemble directory under which that cycle directory is located.

tests/baseline_configs/config.community_ensemble.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in community mode.

tests/baseline_configs/config.nco_ensemble.sh:
* New workflow configuration file to perform WE2E test of ensemble forecasts in NCO mode.

tests/baselines_list.txt:
* Add two new WE2E tests for running ensemble forecasts, one in community mode (community_ensemble) and another in NCO mode (nco_ensemble).

ush/config_defaults.sh: (A), (E), (F)
* Introduce the new workflow variable DO_ENSEMBLE that specifies whether or not to run ensemble forecasts.  Enable ensemble forecasts by setting DO_ENSEMBLE to "TRUE".
* Introduce the new workflow variable NUM_ENS_MEMBERS that specifies the number of ensemble members.  This variable is not used if DO_ENSEMBLE is not set to "TRUE".

ush/generate_FV3SAR_wflow.sh: (A), (B), (D), (E), (G)
* Add new ensemble-related parameters to the "settings" variable that is used to customize the jinja2 template for the rocoto XML file.  These new parameters allow the resulting XML to loop over ensemble members, rename rocoto tasks and log files such that they contain the member number (and are thus unique), and modify cycle directories so that they are member-specific.

ush/set_FV3nml_sfc_climo_filenames.sh: (G)

ush/set_FV3nml_stoch_params.sh:
* File to define new function that takes a base FV3 namelist file and generates from it a new FV3 namelist file for each ensemble member containing a unique set of stochastic parameters (relative to other ensemble members) and places it at the top level of that ensemble member's directory (so that all cycles in that member directory can create symlinks to it).

ush/setup.sh: (E), (F)
* Rename FCST_LEN_HRS_MAX to fcst_len_hrs_max since it is a local variable.
* Introduce the new workflow variable EXTRN_MDL_FILES_BASEDIR that specifies the base directory under which the external model files will be staged.  Under this directory, a subdirectory will be created for each external model (one for ICs, another for LBCs if different from the one for ICs), and under these, subdirectories will be created for each cycle in which to stage the files.  Note that EXTRN_MDL_FILES_BASEDIR is a secondary variable in the sense that it is not user-specifiable.
* Rename FV3_NML_BASE_FP to FV3_NML_BASE_SUITE_FP for the same reason as renaming of FV3_NML_BASE_FN to FV3_NML_BASE_SUITE_FN (see (E) above).
* Create new workflow variable FV3_NML_BASE_ENS_FP that specifies the full path to the base FV3 namelist file from which the namelist files for the individual ensemble members are generated.
* Introduce the new workflow array variable ENS_MEMBER_DIRS.  If running ensemble forecasts, set its elements to the ensemble member directories immediately under the experiment directory.
* If running ensemble forecasts, create the ensemble directories specified in the new workflow array variable ENS_MEMBER_DIRS.
* Record new variables to the workflow variable definitions file.

ush/templates/FV3SAR_wflow.xml:
* Bug fix - Change file name "make_grid_task_complete.txt" to "&MAKE_GRID_TN;_task_complete.txt" to make it changeable with the task name.
* Remove CYCLE_BASEDIR as an environment variable from the GET_EXTRN_ICS_TN and GET_EXTRN_LBCS_TN tasks.  This variable is no longer needed because the external model files are now staged outside of the cycle directories (under EXTRN_MDL_FILES_BASEDIR).
* Place a jinja2-controlled metatask around all tasks starting with MAKE_ICS_TN that loops over all ensemble members if do_ensemble is set to TRUE.
* For tasks that are within the metatask that loops over the ensemble members, add a string (uscore_ensmem_name) that identifies the ensemble member to task names, corresponding job names, and log file names.  Note that this variable is set to an empty string if not running ensemble forecasts.
* For tasks that are within the metatask that loops over the ensemble members, add a string (slash_ensmem_dir) that inserts the ensemble member directory to the definition of CYCLE_DIR (since when running ensembles, the cycle directories are under the member directories).  Note that this variable is set to an empty string if not running ensemble forecasts.
* Set the ensemble index (ENSMEM_INDX) as an environment variable in the RUN_FCST_TN task.  This is needed in the ex-script exregional_run_fcst.sh to be able to create the symlink in the cycle directory to the FV3 namelist file in the correct ensemble member directory.  Note that this variable is set to an empty string if not running ensemble forecasts (in that case, it is not used).
* Set the ensemble member subdirectory preceded by a slash (SLASH_ENSMEM_DIR) as an environment variable in the RUN_POST_TN task.  This is needed in NCO mode when setting the directory in which to place the output of UPP.  Note that this variable is set to an empty string if not running ensemble forecasts.

ush/valid_param_vals.sh:
* Specify valid values for the new workflow variable DO_ENSEMBLE.

* Minor changes to code comments.

* Apparently there is now a requirement in the FV3 code that consv_te be set to 0 on any regional grid.  Make this change for the FV3_CPT_v0 suite (which is the only one for which consv_te had been set to a nonzero value).

* Bug fix in a diag_table that should already be in the develop branch.

* Bug fix -- Fix inconsistency in the way the ensemble member directories are named in different scripts.

The workflow generation scripts create ensemble directories named, e.g., mem1, mem2, ..., mem8, but the exregional_run_fcst.sh script assumes they are mem01, mem02, ..., mem08.  Make these consistent.  Now, the naming convention used depends on whether or not leading zeros are included in NUM_ENS_MEMBERS.  For example, if NUM_ENS_MEMBERS is set to "8", then the member directory names will be mem1, mem2, ..., mem8; and if NUM_ENS_MEMBERS is set to "08", then the member directory names will be mem01, mem02, ..., mem08.

* Add new WE2E test to test use of leading zeros in ensemble member names.

* Change directory structure so that ensemble member directories are beneath the cycle directories (instead of the opposite).  Details below.

Summary of modifications:
------------------------
* Place cycle directories above ensemble member directories, i.e. each cycle directory will contain a full set of ensemble member subdirectories that are used as the run directories.  Previously, it was the other way around, i.e. each member directory contained all cycle subdirectories.
* Move the external model directories into each cycle directory (instead of being in their own directory called extrn_mdl_files under the main experiment directory).
* During the experiment generation step, generate the full list of cycle dates/times to run and create a directory for each cycle.
* Add capability to have more than one cycle.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* In order to test the capability of the workflow to run multiple cycles (possibly on different days), modify the WE2E tests community_ensemble_2mems and nco_ensemble so that there are two cycle hours per day.  Also, modify community_ensemble_2mems so that the starting and ending days are different (one day later).

Modifications common to more than one file (used below in listing of file-by-file modifications):
------------------------------------------------------------------------------------------------
(A) Change the location where external model files are stored to be under each cycle directory (instead of a separate directory specified by EXTRN_MDL_FILES_BASEDIR under the main experiment directory).
(B) Insert the new environment variable SLASH_ENSMEM_SUBDIR anywhere CYCLE_DIR appears.  This variable is passed in by the rocoto XML.  If not running ensemble forecasts, it is simply set to an empty string, and if running ensembles, it is set to the string "/${ensmem_subdir}" where ensmem_subdir is the subdirectory of the current ensemble member under the current cycle directory.  This allows the subdirectories containing ICS, LBCS, and RESTART files to be placed directly under the current cycle directory when NOT running ensembles and for them to be placed under the current ensemble member directory (which is one level down from the current cycle directory) when running ensembles.
(C) For clarity, add new local variable run_dir that gets set to the run directory based on the current cycle and, if applicable, the ensemble member.
(D) Call the new function create_diag_table_files (in the new file create_diag_table_files.sh) to create diagnostics table files.
(D) For correctness, rename the local variable dummy_cyc_dir to dummy_run_dir.
(E) Remove any use of EXTRN_MDL_FILES_BASEDIR since it is no longer needed as a workflow variable.
(F) Remove any use of ENS_MEMBER_DIRS since it is no longer needed as a workflow variable.
(V) Remove unused code.
(W) Edit informational and/or error messages.
(X) Remove trailing whitespace.
(Y) Remove commented out code.
(Z) Edit comments.

File-by-file description of modifications:
-----------------------------------------

jobs/JREGIONAL_GET_EXTRN_MDL_FILES: (A)

jobs/JREGIONAL_MAKE_ICS: (B)

jobs/JREGIONAL_MAKE_LBCS: (B)

jobs/JREGIONAL_RUN_FCST: (B), (C), (Z)
* Pass in ENSMEM_INDX and SLASH_ENSMEM_SUBDIR as arguments to the function exregional_run_fcst().

jobs/JREGIONAL_RUN_POST: (B), (Z)
* Create the new local variable run_dir in which to store the path to the run directory (for the current cycle and possibly ensemble member).
* In NCO mode, change location where ensemble directories are created to be under the cycle directory instead of above it (analogous change to NCO mode as is done in (B) for community mode).
* In community mode, place the postprd subdirectory under the run directory instead of under CYCLE_DIR (since now, cycle directories are one level up if running ensembles and thus would be the incorrect place to create postprd).
* Create the new argument cdate to the function exregional_run_post() and pass in the environment variable CDATE for its value (this is instead of using CDATE directly in exregional_run_post()).
* Change the argument cycle_dir of the function exregional_run_post to run_dir since that's what we really want in that function.  This is instead of passing in cycle_dir and then forming run_dir.

scripts/exregional_make_grid.sh: (D)

scripts/exregional_make_ics.sh: (A), (X)

scripts/exregional_make_lbcs.sh: (A), (X)

scripts/exregional_run_fcst.sh: (C), (W)
* Introduce new input arguments ensmem_indx and slash_ensmem_subdir that get set to the rocoto-specified environment variables ENSMEM_INDX and SLASH_ENSMEM_SUBDIR, respectively, in the call to this function in jobs/JREGIONAL_RUN_FCST.
* Change cycle_dir to run_dir in most places to make the directory name more general.  This is because the run directory will be the cycle directory only when not running ensembles.  When running ensembles, the run directory will be one of the ensemble member directories, which will be one level down from the current cycle directory.
* Fix typo where there is an extra "}" printed after $target in error messages.
* Use the new workflow array variable FV3_NML_ENSMEM_FPS (which contains the full paths to the FV3 namelist files for each ensemble member) when creating a link in the run directory to the FV3 namelist file for the current ensemble member.  Note that these namelist files are cycle-independent and thus are created only once during the experiment generation step.
* Move the creation of diagnostics table files to a new function (in ush/create_diag_table_files.sh), and call that function during experiment generation (in ush/generate_FV3SAR_wflow.sh) instead of here in exregional_run_fcst.sh.  We do this because the diagnostics table files depend only on the cycle, not the ensemble member.  Thus, since we know the cycles to run at experiment generation time, we generate the diagnostics file for each cycle then and place each in its corresponding cycle directory.
* If running ensembles, create symlinks in the run directory to the diagnostics table and model configure files in the cycle directory (which will be one level up from the run directory).  We don't do this when NOT running ensembles because in that case, the run directory is the cycle directory (and these two files already exist in that directory; they are created during experiment generation time).
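
The symlink step in the last item can be pictured with the bash sketch below. The temporary directory layout, the empty placeholder files, and the use of relative link targets are assumptions for illustration; the actual ex-script may construct its links differently.

```shell
#!/bin/bash
# Sketch only: directory layout and placeholder files are hypothetical.
cycle_dir="$(mktemp -d)"
: > "${cycle_dir}/diag_table"        # stands in for the file made at experiment generation
: > "${cycle_dir}/model_configure"   # likewise

DO_ENSEMBLE="TRUE"
run_dir="${cycle_dir}/mem1"
mkdir -p "${run_dir}"

if [ "${DO_ENSEMBLE}" = "TRUE" ]; then
  for fn in diag_table model_configure; do
    # The run directory sits one level below the cycle directory, so a
    # relative target of "../<file>" resolves to the per-cycle copy.
    ln -sf "../${fn}" "${run_dir}/${fn}"
  done
fi

ls -l "${run_dir}"
```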

scripts/exregional_run_post.sh: (C), (Y)
* Create the new input argument cdate (which gets set to the global variable CDATE in the call to this function (exregional_run_post)) and use it instead of the global variable CDATE.
* Change the argument cycle_dir to run_dir since that's more useful in this function.  This is instead of passing in cycle_dir and then forming run_dir.
* Make the local variables "POST_..." lowercase to follow the convention that local variables be in lower case.

tests/baseline_configs/config.community_ensemble_2mems.sh:
* Modify settings in this test configuration so that the starting and ending days of the cycles are not the same and so that there are two cycle hours per day.  This is to have more thorough testing of the ensembles feature in community mode.

tests/baseline_configs/config.nco_ensemble.sh:
* Modify settings in this test configuration so that there are two cycle hours per day.  This is to have more thorough testing of the ensembles feature in NCO mode.

ush/create_diag_table_files.sh:
* New file containing a function that creates a diagnostics table file for each cycle date and places it in the corresponding cycle directory.

ush/create_model_config_files.sh:
* New file containing a function that creates a model configuration file for each cycle date and places it in the corresponding cycle directory.

ush/set_cycle_dates.sh:
* New function that sets all the cycle dates/times to run.

ush/generate_FV3SAR_wflow.sh: (D)
* For clarity and consistency with other scripts, change variable name from slash_ensmem_dir to slash_ensmem_subdir.
* Change "settings" variable used to set parameters in the jinja template for the rocoto XML to add capability to have more than one cycle.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Use the new workflow array variable ALL_CDATES (containing all the cycle dates/times to run) to create all the cycle directories.  Previously, the cycle directories were created during the make_ics or make_lbcs task, but it is clearer to do it during experiment generation.  Also, it now must be done during experiment generation because now, the model configuration file(s) and possibly also the diagnostics table file(s) (if the MAKE_GRID_TN step is being skipped), which are cycle-dependent but ensemble-member-independent, are created and placed in the cycle directories during experiment generation.
* Call the new function create_model_config_files() to create a model configuration file within each cycle directory.
* If not running the MAKE_GRID_TN task, call the new function create_diag_table_files() to create a diagnostics table file within each cycle directory.

ush/set_FV3nml_sfc_climo_filenames.sh: (D)

ush/set_FV3nml_stoch_params.sh: (Z)
* For consistency with other scripts, rename the variable fv3_nml_ens_fp to fv3_nml_ensmem_fp.
* Use the new workflow array variable FV3_NML_ENSMEM_FPS (which contains the full paths to the FV3 namelist files for each ensemble member) to set the full path to the current ensemble member's FV3 namelist file.

ush/setup.sh: (E), (F), (V), (Z)
* Call the new function set_cycle_dates() to set the new workflow array variable ALL_CDATES containing all the cycle dates/times to be run.
* Set the new workflow variable NUM_CYCLES to the number of elements in ALL_CDATES.
* Set the new workflow array variable ENSMEM_NAMES containing the names of the ensemble members.
* Set the new workflow array variable FV3_NML_ENSMEM_FPS containing the full paths to the FV3 namelist files of the ensemble members.
* Remove creation of ensemble member directories.  These are now created in the j-jobs of the MAKE_ICS_TN or the MAKE_LBCS_TN task (whichever runs first).
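
As a rough illustration of how ENSMEM_NAMES could be derived from NUM_ENS_MEMBERS, the sketch below uses the number of characters in the setting as the zero-padding width, matching the mem1 ... mem8 versus mem01 ... mem08 behavior given in the PR description. The function name `make_ensmem_names` is an assumption, not the code in ush/setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual setup.sh logic.
make_ensmem_names() {
  local num_ens_members="$1"           # e.g. "8" or "08"
  local ndigits="${#num_ens_members}"  # string length sets the padding width
  local n=$(( 10#${num_ens_members} )) # force base 10 so "08" is not read as octal
  local i

  ENSMEM_NAMES=()
  for (( i=1; i<=n; i++ )); do
    ENSMEM_NAMES+=( "$(printf "mem%0${ndigits}d" "$i")" )
  done
}

make_ensmem_names "08"
printf '%s\n' "${ENSMEM_NAMES[@]}"
```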

ush/templates/FV3SAR_wflow.xml: (Y)
* Modify jinja code to allow for multiple cycles to be run.  This capability was previously present but was inadvertently disabled during transition to generating the rocoto XML using a jinja2 template.
* Add CYCLE_DIR as an environment variable to the GET_EXTRN_ICS_TN and GET_EXTRN_LBCS_TN tasks.
* In the MAKE_ICS_TN, MAKE_LBCS_TN, RUN_FCST_TN, and RUN_POST_TN tasks, change the way CYCLE_DIR is set so that it does not include the ensemble member subdirectory.
* In the MAKE_ICS_TN, MAKE_LBCS_TN, RUN_FCST_TN, and RUN_POST_TN tasks, create the new environment variable SLASH_ENSMEM_SUBDIR that gets set to a null string if not running ensembles and to the string "/${name_of_ensemble_member}" when running ensembles.
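
The point of SLASH_ENSMEM_SUBDIR is that a task script can build its run directory the same way whether or not ensembles are enabled. A minimal sketch of that idea, with a hypothetical helper `get_run_dir` and made-up paths:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: concatenating CYCLE_DIR and SLASH_ENSMEM_SUBDIR.
get_run_dir() {
  local cycle_dir="$1"            # cycle directory, without any member subdirectory
  local slash_ensmem_subdir="$2"  # "" without ensembles, "/<member name>" with ensembles
  printf '%s\n' "${cycle_dir}${slash_ensmem_subdir}"
}

get_run_dir "/path/to/expt/2019070100" ""        # non-ensemble run
get_run_dir "/path/to/expt/2019070100" "/mem01"  # ensemble member mem01
```

Because the leading slash is part of the variable, the null-string case produces no trailing or doubled slashes.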

* Bug fixes.  These bugs were introduced during the previous merge of the develop branch into this fork (feature/ensemble).

* To test the multiple-days and multiple-cycle-hours capabilities with ensemble forecasts in community mode, change WE2E test "community_ensemble_008" to include 2 days and 2 cycle hours per day (instead of 1 and 1, respectively).

* Add WCOSS changes for the feature/ensemble branch

* Minor changes to code comments.

Co-authored-by: Benjamin.Blake EMC <Benjamin.Blake@m71a3.ncep.noaa.gov>

* updated code according to the comments
modified files:
       scripts/exregional_get_extrn_mdl_files.sh
       ush/NOMADS_get_extrn_mdl_files.sh
       ush/config_defaults.sh
       ush/generate_FV3SAR_wflow.sh
       ush/valid_param_vals.sh

* Change filename:
 ush/generate_FV3SAR_wflow.sh -> ush/generate_FV3LAM_wflow.sh

* adding one test to WE2E for downloading files
new file for the test:
        tests/baseline_configs/config.user_download_extrn_files.sh
modified file for the test:
        tests/baselines_list.txt
modified files for recent changes (SAR to LAM, JPgrid to ESGgrid):
        scripts/exregional_get_extrn_mdl_files.sh
        ush/config_defaults.sh
        ush/valid_param_vals.sh

Co-authored-by: Linlin.Pan <Linlin.Pan@noaa.gov>
Co-authored-by: gsketefian <31046882+gsketefian@users.noreply.github.com>
Co-authored-by: Benjamin.Blake EMC <Benjamin.Blake@m71a3.ncep.noaa.gov>
christinaholtNOAA pushed a commit to christinaholtNOAA/regional_workflow that referenced this pull request Jan 5, 2022