Skip to content

SRW App Users' Guide (close-to-final): corrections and notes (not meant to be merged, but to be picked up by the documentation team)#126

Closed
climbfuji wants to merge 1 commit into
ufs-community:release/public-v1from
climbfuji:users_guide_final_corrections_notes
Closed

SRW App Users' Guide (close-to-final): corrections and notes (not meant to be merged, but to be picked up by the documentation team)#126
climbfuji wants to merge 1 commit into
ufs-community:release/public-v1from
climbfuji:users_guide_final_corrections_notes

Conversation

@climbfuji
Copy link
Copy Markdown
Collaborator

@climbfuji climbfuji commented Feb 27, 2021

This PR contains corrections (that should be reviewed but can be accepted as-is, and many notes (starting with NOTE DOM 20220227) in the users' guide rst files.

My suggest is for the documentation team to check out this PR (up to date with release/public-v1), review my corrections and accept/modify/reject, then look at all NOTE DOM 20220227 notes in the different *.rst files and address/fix or decide to ignore, but in any case remove my comment. The resulting version of the documentation can then be used as the final documentation update PR for the app.

First and foremost, overall the documentation looks great and I could see the tremendous amount of work that went into it. However, there are a few issues that I think need to be addressed.

There are several broken references/links, one table doesn't show in the html, some paths are wrong, sometimes platforms are forgotten or have wrong information on where data is located etc.

Chapters w/o corrections doesn't mean I didn't read them, I just didn't find anything wrong (cudos to the authors).

The weakest chapter with most mistakes is chapter 10, Graphics.rst. Many of the instructions are wrong, because the modules that should be loaded do not exist (not even the top-level directory /app on several systems). I left specific notes for gaea, too. It seems to me that nobody actually tried if these instructions worked. Whoever revisits this chapter please make sure to follow all the steps on all platforms, if necessary by working with someone who has access to a specific platform. I can help with gaea, for instance. It might be better to ditch the job submission scripts in this chapter 10 and in directory ush/Python/ altogether and just tell people to run on the login node or to launch an interactive job (refer users to the HPC documentation pages) and run it there. Unless someone has time to test that the job submission scripts work on all platforms as intended.

@climbfuji climbfuji force-pushed the users_guide_final_corrections_notes branch 3 times, most recently from 91bf763 to 2328cb1 Compare February 28, 2021 03:14
@climbfuji climbfuji changed the title WORK IN PROGREsS SRW App Users' Guide (close-to-final): corrections and notes SRW App Users' Guide (close-to-final): corrections and notes (not meant to be merged, but to be picked up by the documentation team) Feb 28, 2021
@climbfuji climbfuji force-pushed the users_guide_final_corrections_notes branch from 2328cb1 to 5ed71b8 Compare February 28, 2021 03:26
@climbfuji climbfuji marked this pull request as ready for review February 28, 2021 03:31
@jwolff-ncar
Copy link
Copy Markdown
Collaborator

@climbfuji Thanks for your detailed comb through of the documentation. I am working on addressing the comments. We will need to meet as a team to talk through some of them (I will try to get this scheduled tomorrow afternoon) and then you will see a new PR.

NOTE DOM 20220227 - This /apps directory doesn't exist on jet (haven't checked hera)

.. code-block:: console

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /apps directory is only for Orion (Jet and Hera use /contrib). The Gaea block can be changed to:

module use /lustre/f2/pdata/esrl/gsd/contrib/modulefiles
module load rocoto/1.3.3
module load miniconda3/4.8.3-regional-workflow

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check this Gaea information one more time. This is what is set for the workflow, but the python requirements for the plotting scripts are a bit different. For example, not sure why we would need module load rocoto/1.3.3 here.

NOTE DOM 20220227 - why is this not in a shared, user-independent directory? We have /work/noaa/nems/emc.nemspara and /work/noaa/nems/glopara, the latter already hosts the fixed files etc. A conditionn for tier-1 platforms was that no data/scripts/... resides in user directories. This disqualified stampede from being a tier-1 platform.

.. code-block:: console

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move NaturalEarth on Orion to:

/work/noaa/global/glopara/fix/

I can send an email to Kate.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved the NaturalEarth files on Hera to a user-independent directory. Line 47 can be changed to:

/scratch2/BMC/det/UFS_SRW_app/v1p0/fix_files/NaturalEarth

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated to documentation to both of these paths so we need to make sure they are moved appropriately (talk to Kate about Orion).


NOTE DOM 20220227 - this script will *not* work on gaea for sure. First of all because the module load commands are wrong (see above, second because gaea uses #SBATCH --clusters, third because you cannot do module purge and module load hpss). The line ". /apps/lmod/lmod/init/sh" doesn't work on gaea, it doesn't work on jet either.

.. code-block:: console
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the major differences between Gaea and Jet/Hera/Orion, and the current time crunch, I would suggest we list the sbatch script as not currently functional on Gaea in the known limitations document.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording to the following: (Thoughts? Comments?)
If the Python scripts are being used to create plots of multiple forecast lead times and forecast
variables, then they should be submitted to the batch system using either something like the sq_job.sh
or sq_job_diff.sh script (for a platform such as Hera that uses the slurm
job scheduler) or the qsub_job.sh or qsub_job_diff.sh script (for a platform such as
Cheyenne that uses PBS or PBS Pro as the job scheduler). Examples of these scripts are located under
ufs-srweather-app/regional_workflow/ush/Python and can be used as a starting point to create a batch script
for your platform/job scheduler of use. The scripts must be submitted using the command appropriate
for the job scheduler used on your platform. For example, on Hera,
sq_job.sh can be submitted as follows:

.. code-block:: console

sbatch sq_job.sh

On Cheyenne, qsub_job.sh can be submitted as follows:

.. code-block:: console

qsub qsub_job.sh


NOTE DOM 20220227 on jet at least, rocotorun is not in the path, not even after running "source env/build_jet_intel.env" and "source ../../env/wflow_jet.env"; someone needs to check that on all platforms and insert a code block here to load the rocoto module.

.. code-block:: console
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build and wflow environment files are not supposed to load the Rocoto module. That is done on a platform-by-platform basis by the launch_FV3LAM_wflow.sh script:

if [ "$MACHINE" = "CHEYENNE" ]; then
module use -a /glade/p/ral/jntp/UFS_SRW_app/modules/
module load rocoto
elif [ "$MACHINE" = "ORION" ]; then
module purge
module load contrib rocoto
elif [ "$MACHINE" = "WCOSS_DELL_P3" ]; then
module purge
module load lsf/10.1
module use /gpfs/dell3/usrx/local/dev/emc_rocoto/modulefiles/
module load ruby/2.5.1 rocoto/1.2.4
elif [ "$MACHINE" = "WCOSS_CRAY" ]; then
module purge
module load xt-lsfhpc/9.1.3
module use -a /usrx/local/emc_rocoto/modulefiles
module load rocoto/1.2.4
elif [ "$MACHINE" = "GAEA" ]; then
module use /lustre/f2/pdata/esrl/gsd/contrib/modulefiles
module load rocoto/1.3.3
else
module purge
module load rocoto
fi

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, based on this we have two options. I can add in these module loads for each platform or we can remove the */3 * * * * cd /glade/p/ral/jntp/$USER/expt_dirs/test_CONUS_25km_GFSv15p2 && /glade/p/ral/jntp/tools/rocoto/rocoto-1.3.1/bin/rocotorun -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10 (which happens to be what I use all the time!) and just leave */3 * * * * cd /glade/p/ral/jntp/$USER/expt_dirs/test_CONUS_25km_GFSv15p2 && ./launch_FV3LAM_wflow.sh

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, have you considered explicitly listing Hera and Jet in that list rather than the just leaving it in the else block? Not a big deal for this release, but may be something to consider for the future.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwolff-ncar, you're correct. Hera and Jet should be explicitly listed. We can add that to our list of post-release changes.

Can we just add a caveat to the rocotorun/rocotoboot commands listed and say (the Rocoto module must be in your path)? Again, these commands (and the crontab example) are only necessary if the user does not use the launch script or the crontab option in config.sh.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added in the module loads for the different platforms for now. If there is disagreement we can discuss tomorrow.


NOTE DOM 20220227 rocotorun will not be in the path unless the user sets PATH=... in the crontab

.. code-block:: console
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of the launch script is so the user doesn't need to worry about whether rocotorun is in their path when running from the command line.  In addition, if the user sets USE_CRON_TO_RELAUNCH in the config.sh, they don't need to worry about setting anything in the cron, since the cron points to the launch script (which loads the Rocoto module).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion here, I will just list */3 * * * * cd /glade/p/ral/jntp/$USER/expt_dirs/test_CONUS_25km_GFSv15p2 && ./launch_FV3LAM_wflow.sh as the only option for cron here. That should avoid this problem.


NOTE DOM 20220227 would it be good to mention the number of horizontal grid points NX x NY in this table/section? Or any other information (min/max lat/lon etc)? or a plot of the three domains? Or provide a link to section 6.1?

Case-specific Configuration
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the user can create their own grid, I don't think there's a reason to point them to the pre-defined grids here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is just meant to provide a general overview of the utilities.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a pointer to chpt 6. That hopefully suffices.


NOTE DOM 20220227 need to double check, beforehamnd expt_dirs sits in ufs-srweather-app/../expt_dirs, i.e. homerrfs would be ufs-srweather-app

``EXPT_SUBDIR``: (Default: “”)
Copy link
Copy Markdown
Collaborator

@JeffBeck-NOAA JeffBeck-NOAA Feb 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${HOMErrfs}=ufs-srweather-app/regional_workflow, therefore we need to change ${HOMErrfs}/../expt_dirs to ${HOMErrfs}/../../expt_dirs.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@mkavulich
Copy link
Copy Markdown
Collaborator

Comments addressed in #127

@mkavulich mkavulich closed this Mar 2, 2021
christopherwharrop-noaa pushed a commit to christopherwharrop-noaa/ufs-srweather-app that referenced this pull request Apr 5, 2022
natalie-perlin pushed a commit to natalie-perlin/ufs-srweather-app that referenced this pull request Jun 2, 2024
…te regional tests, ... (ufs-community#126)

* Move ccpp_prebuild from build scripts to CMakeLists.txt
* Add find_package(Python) to CMakeLists.txt
* Update to compile scripts and top-level CMakeLists.txt for not specifying suites for CCPP
* Update of regional tests to reflect changes in suite definition files
* Use cmake 3.16.1 on hera
* Require cmake 3.15 or later in top-level CMakeLists.txt
* Bugfix in tests/fv3_conf/ccpp_regional_run.IN
* Python 3 compatibility for tests/abort_dep_tasks.py

Co-authored-by: Dusan Jovic <dusan.jovic@noaa.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants