SRW App Users' Guide (close-to-final): corrections and notes (not meant to be merged, but to be picked up by the documentation team)#126
Conversation
91bf763 to
2328cb1
Compare
2328cb1 to
5ed71b8
Compare
|
@climbfuji Thanks for your detailed comb through of the documentation. I am working on addressing the comments. We will need to meet as a team to talk through some of them (I will try to get this scheduled tomorrow afternoon) and then you will see a new PR. |
| NOTE DOM 20220227 - This /apps directory doesn't exist on jet (haven't checked hera) | ||
|
|
||
| .. code-block:: console | ||
|
|
There was a problem hiding this comment.
The /apps directory is only for Orion (Jet and Hera use /contrib). The Gaea block can be changed to:
module use /lustre/f2/pdata/esrl/gsd/contrib/modulefiles
module load rocoto/1.3.3
module load miniconda3/4.8.3-regional-workflow
There was a problem hiding this comment.
Please check this Gaea information one more time. This is what is set for the workflow, but the python requirements for the plotting scripts are a bit different. For example, not sure why we would need module load rocoto/1.3.3 here.
| NOTE DOM 20220227 - why is this not in a shared, user-independent directory? We have /work/noaa/nems/emc.nemspara and /work/noaa/nems/glopara, the latter already hosts the fixed files etc. A conditionn for tier-1 platforms was that no data/scripts/... resides in user directories. This disqualified stampede from being a tier-1 platform. | ||
|
|
||
| .. code-block:: console | ||
|
|
There was a problem hiding this comment.
We should move NaturalEarth on Orion to:
/work/noaa/global/glopara/fix/
I can send an email to Kate.
There was a problem hiding this comment.
I just moved the NaturalEarth files on Hera to a user-independent directory. Line 47 can be changed to:
/scratch2/BMC/det/UFS_SRW_app/v1p0/fix_files/NaturalEarth
There was a problem hiding this comment.
I have updated to documentation to both of these paths so we need to make sure they are moved appropriately (talk to Kate about Orion).
|
|
||
| NOTE DOM 20220227 - this script will *not* work on gaea for sure. First of all because the module load commands are wrong (see above, second because gaea uses #SBATCH --clusters, third because you cannot do module purge and module load hpss). The line ". /apps/lmod/lmod/init/sh" doesn't work on gaea, it doesn't work on jet either. | ||
|
|
||
| .. code-block:: console |
There was a problem hiding this comment.
Given the major differences between Gaea and Jet/Hera/Orion, and the current time crunch, I would suggest we list the sbatch script as not currently functional on Gaea in the known limitations document.
There was a problem hiding this comment.
I changed the wording to the following: (Thoughts? Comments?)
If the Python scripts are being used to create plots of multiple forecast lead times and forecast
variables, then they should be submitted to the batch system using either something like the sq_job.sh
or sq_job_diff.sh script (for a platform such as Hera that uses the slurm
job scheduler) or the qsub_job.sh or qsub_job_diff.sh script (for a platform such as
Cheyenne that uses PBS or PBS Pro as the job scheduler). Examples of these scripts are located under
ufs-srweather-app/regional_workflow/ush/Python and can be used as a starting point to create a batch script
for your platform/job scheduler of use. The scripts must be submitted using the command appropriate
for the job scheduler used on your platform. For example, on Hera,
sq_job.sh can be submitted as follows:
.. code-block:: console
sbatch sq_job.sh
On Cheyenne, qsub_job.sh can be submitted as follows:
.. code-block:: console
qsub qsub_job.sh
|
|
||
| NOTE DOM 20220227 on jet at least, rocotorun is not in the path, not even after running "source env/build_jet_intel.env" and "source ../../env/wflow_jet.env"; someone needs to check that on all platforms and insert a code block here to load the rocoto module. | ||
|
|
||
| .. code-block:: console |
There was a problem hiding this comment.
The build and wflow environment files are not supposed to load the Rocoto module. That is done on a platform-by-platform basis by the launch_FV3LAM_wflow.sh script:
if [ "$MACHINE" = "CHEYENNE" ]; then
module use -a /glade/p/ral/jntp/UFS_SRW_app/modules/
module load rocoto
elif [ "$MACHINE" = "ORION" ]; then
module purge
module load contrib rocoto
elif [ "$MACHINE" = "WCOSS_DELL_P3" ]; then
module purge
module load lsf/10.1
module use /gpfs/dell3/usrx/local/dev/emc_rocoto/modulefiles/
module load ruby/2.5.1 rocoto/1.2.4
elif [ "$MACHINE" = "WCOSS_CRAY" ]; then
module purge
module load xt-lsfhpc/9.1.3
module use -a /usrx/local/emc_rocoto/modulefiles
module load rocoto/1.2.4
elif [ "$MACHINE" = "GAEA" ]; then
module use /lustre/f2/pdata/esrl/gsd/contrib/modulefiles
module load rocoto/1.3.3
else
module purge
module load rocoto
fi
There was a problem hiding this comment.
So, based on this we have two options. I can add in these module loads for each platform or we can remove the */3 * * * * cd /glade/p/ral/jntp/$USER/expt_dirs/test_CONUS_25km_GFSv15p2 && /glade/p/ral/jntp/tools/rocoto/rocoto-1.3.1/bin/rocotorun -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10 (which happens to be what I use all the time!) and just leave */3 * * * * cd /glade/p/ral/jntp/$USER/expt_dirs/test_CONUS_25km_GFSv15p2 && ./launch_FV3LAM_wflow.sh
There was a problem hiding this comment.
BTW, have you considered explicitly listing Hera and Jet in that list rather than the just leaving it in the else block? Not a big deal for this release, but may be something to consider for the future.
There was a problem hiding this comment.
@jwolff-ncar, you're correct. Hera and Jet should be explicitly listed. We can add that to our list of post-release changes.
Can we just add a caveat to the rocotorun/rocotoboot commands listed and say (the Rocoto module must be in your path)? Again, these commands (and the crontab example) are only necessary if the user does not use the launch script or the crontab option in config.sh.
There was a problem hiding this comment.
I added in the module loads for the different platforms for now. If there is disagreement we can discuss tomorrow.
|
|
||
| NOTE DOM 20220227 rocotorun will not be in the path unless the user sets PATH=... in the crontab | ||
|
|
||
| .. code-block:: console |
There was a problem hiding this comment.
The goal of the launch script is so the user doesn't need to worry about whether rocotorun is in their path when running from the command line. In addition, if the user sets USE_CRON_TO_RELAUNCH in the config.sh, they don't need to worry about setting anything in the cron, since the cron points to the launch script (which loads the Rocoto module).
There was a problem hiding this comment.
To avoid confusion here, I will just list */3 * * * * cd /glade/p/ral/jntp/$USER/expt_dirs/test_CONUS_25km_GFSv15p2 && ./launch_FV3LAM_wflow.sh as the only option for cron here. That should avoid this problem.
|
|
||
| NOTE DOM 20220227 would it be good to mention the number of horizontal grid points NX x NY in this table/section? Or any other information (min/max lat/lon etc)? or a plot of the three domains? Or provide a link to section 6.1? | ||
|
|
||
| Case-specific Configuration |
There was a problem hiding this comment.
Since the user can create their own grid, I don't think there's a reason to point them to the pre-defined grids here.
There was a problem hiding this comment.
This section is just meant to provide a general overview of the utilities.
There was a problem hiding this comment.
I added a pointer to chpt 6. That hopefully suffices.
|
|
||
| NOTE DOM 20220227 need to double check, beforehamnd expt_dirs sits in ufs-srweather-app/../expt_dirs, i.e. homerrfs would be ufs-srweather-app | ||
|
|
||
| ``EXPT_SUBDIR``: (Default: “”) |
There was a problem hiding this comment.
${HOMErrfs}=ufs-srweather-app/regional_workflow, therefore we need to change ${HOMErrfs}/../expt_dirs to ${HOMErrfs}/../../expt_dirs.
|
Comments addressed in #127 |
…te regional tests, ... (ufs-community#126) * Move ccpp_prebuild from build scripts to CMakeLists.txt * Add find_package(Python) to CMakeLists.txt * Update to compile scripts and top-level CMakeLists.txt for not specifying suites for CCPP * Update of regional tests to reflect changes in suite definition files * Use cmake 3.16.1 on hera * Require cmake 3.15 or later in top-level CMakeLists.txt * Bugfix in tests/fv3_conf/ccpp_regional_run.IN * Python 3 compatibility for tests/abort_dep_tasks.py Co-authored-by: Dusan Jovic <dusan.jovic@noaa.gov>
This PR contains corrections (that should be reviewed but can be accepted as-is, and many notes (starting with
NOTE DOM 20220227) in the users' guide rst files.My suggest is for the documentation team to check out this PR (up to date with release/public-v1), review my corrections and accept/modify/reject, then look at all
NOTE DOM 20220227notes in the different*.rstfiles and address/fix or decide to ignore, but in any case remove my comment. The resulting version of the documentation can then be used as the final documentation update PR for the app.First and foremost, overall the documentation looks great and I could see the tremendous amount of work that went into it. However, there are a few issues that I think need to be addressed.
There are several broken references/links, one table doesn't show in the html, some paths are wrong, sometimes platforms are forgotten or have wrong information on where data is located etc.
Chapters w/o corrections doesn't mean I didn't read them, I just didn't find anything wrong (cudos to the authors).
The weakest chapter with most mistakes is chapter 10,
Graphics.rst. Many of the instructions are wrong, because the modules that should be loaded do not exist (not even the top-level directory/appon several systems). I left specific notes for gaea, too. It seems to me that nobody actually tried if these instructions worked. Whoever revisits this chapter please make sure to follow all the steps on all platforms, if necessary by working with someone who has access to a specific platform. I can help with gaea, for instance. It might be better to ditch the job submission scripts in this chapter 10 and in directoryush/Python/altogether and just tell people to run on the login node or to launch an interactive job (refer users to the HPC documentation pages) and run it there. Unless someone has time to test that the job submission scripts work on all platforms as intended.