Migrate to mercury for globus/hpss transfers from MSU #3655
All tests passed on Hercules. However, it was noted that running more than 2-3 cases at the same time bogged down Globus and caused some unstable behavior in the scripts. I have minimized the issues on the workflow side, but additional work will likely be necessary in Sven to handle incomplete transfers more gracefully. Nathan Gregg suggested that we use Globus workflows instead of Sven, which would likely alleviate these kinds of issues, but that would involve a complete rewrite and significant research. That said, I think this PR is ready to merge.
Pull Request Overview
This PR migrates the HPSS transfer functionality from Niagara to Mercury by updating command-line arguments, configuration settings, and documentation. Key changes include:
- Updating all references from Niagara to Mercury in comments, error messages, and configuration.
- Modifying globus transfer command setups and removing SSH username auto-detection in favor of configuration values.
- Adjusting host detection and dependency names in workflow scripts to align with the Mercury migration.
Reviewed Changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ush/python/pygfs/task/globus_hpss.py | Updated commands, error messages, and comments to reference Mercury. |
| ush/python/pygfs/task/archive.py | Added debug logging for YAML output. |
| scripts/exglobal_globus_earc.py | Comment updates to use Mercury instead of Niagara. |
| scripts/exglobal_globus_arch.py | Comment updates to use Mercury instead of Niagara. |
| parm/globus/*.j2, parm/config/gfs/* | Updated configuration and script templates for Mercury. |
| docs/source/*.rst | Documentation updates for Mercury-based transfers. |
| dev/workflow/* | Adjustments in host detection, workflow dependencies, and logging to support Mercury. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    host_plus_inputs_dict = AttrDict(host.info, **inputs_dict_remapped)
    host_plus_inputs_dict.HOMEgfs = _top
    host_plus_inputs_dict.MACHINE = host.machine.upper()
    host_plus_inputs_dict.MACHINE = str(host).upper()
This doesn't work. It looks for an upper method in the Host class, so str() is required first.
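A minimal sketch of the failure described in this comment. The `Host` class below is a hypothetical stand-in, not the workflow's actual class; it only assumes a `machine` attribute and a convenience `__str__`. Defining `__str__` does not give an object string methods, so `.upper()` has to be called on `str(host)` or on the name attribute.

```python
class Host:
    """Hypothetical stand-in for the workflow's Host class."""

    def __init__(self, machine):
        self.machine = machine

    def __str__(self):
        # Convenience only: str(host) returns the machine name.
        return self.machine


host = Host("hercules")

# Calling a string method directly on the object fails, because
# attribute lookup happens on Host, not on the result of __str__.
try:
    host.upper()
    direct_call_error = None
except AttributeError as exc:
    direct_call_error = exc

# Both of these operate on an actual str and succeed.
via_str = str(host).upper()
via_attr = host.machine.upper()
```

Under these assumptions, `str(host).upper()` and `host.machine.upper()` give the same result, while `host.upper()` raises `AttributeError`.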
    specs.partition = partition
    specs.native = native
    specs.machine = str(host)
    specs.machine = host
Are we sure this is not giving `specs.machine` the host object as opposed to the name of the machine? Would it be more explicit if this were `specs.machine = host.machine`?
That's fine by me. __str__ is there for convenience and probably shouldn't be abused. Either str(host) or host.machine. I'm fine with the latter here since this is a piecewise copy of host information.
To answer your first question: no, I'm not sure. My test was `print(specs.machine)`, which would have called the `__str__` method if the host object had been assigned, so it was a faulty test.
As an additional test:
    specs.machine = host
    print(specs.machine.info)

    {'DMPDIR': '/work/noaa/rstprod/dump', 'BASE_GIT': '/work2/noaa/global/role-global/git', 'BASE_DATA': '/work2/noaa/global/role-global/data', 'BASE_IC': '/work2/noaa/global/role-global/data/ICSDIR', 'AERO_INPUTS_DIR': '/work2/noaa/global/role-global/data/GEFS_ExtData/20250310', 'PACKAGEROOT': '/work2/noaa/global/role-global/nwpara', 'HOMEDIR': '/work2/noaa/global/${USER}', 'STMP': '/work2/noaa/stmp/${USER}/${machine^^}', 'PTMP': '/work2/noaa/stmp/${USER}/${machine^^}', 'NOSCRUB': '${HOMEDIR}', 'COMINsyn': '/work2/noaa/global/role-global/com/gfs/prod/syndat', 'COMINecmwf': '/work2/noaa/global/role-global/data/external_gempak/ecmwf', 'COMINnam': '/work2/noaa/global/role-global/data/external_gempak/nam', 'COMINukmet': '/work2/noaa/global/role-global/data/external_gempak/ukmet', 'SCHEDULER': 'slurm', 'QUEUE': 'batch', 'PARTITION_BATCH': 'hercules', 'PARTITION_SERVICE': 'service', 'HPSS_PROJECT': 'emc-global', 'ARCHCOM_TO': 'globus_hpss', 'ATARDIR': '/NCEPDEV/${HPSS_PROJECT}/1year/${USER}/${machine}/scratch/${PSLOT}', 'CHGRP_RSTPROD': 'YES', 'CHGRP_CMD': 'chgrp rstprod', 'CLIENT_GLOBUS_UUID': '869912fe-f6de-46c0-af10-b22efd84a022', 'SUPPORTED_RESOLUTIONS': ['C1152', 'C768', 'C384', 'C192', 'C96', 'C48'], 'DO_ARCHCOM': 'NO', 'DO_AWIPS': 'NO', 'MAKE_NSSTBUFR': 'NO', 'MAKE_ACFTBUFR': 'NO'}

So it was assigning the host object.
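The reason the earlier `print` check in this thread was inconclusive can be sketched as follows. `Host` is again a hypothetical stand-in that only mimics the `__str__` convenience. Because `print` calls `__str__`, the object and its name print identically; a type check tells them apart.

```python
class Host:
    """Hypothetical stand-in; only mimics the __str__ convenience."""

    def __init__(self, machine):
        self.machine = machine

    def __str__(self):
        return self.machine


host = Host("hercules")

as_object = host        # what `specs.machine = host` stores
as_name = host.machine  # what `specs.machine = host.machine` stores

# print() formats its argument with str(), so both would display
# the same text and the check cannot distinguish them.
same_when_printed = str(as_object) == str(as_name)

# A type check does distinguish the two assignments.
object_is_str = isinstance(as_object, str)
name_is_str = isinstance(as_name, str)
```

In this sketch `same_when_printed` is true even though only `as_name` is a `str`, which is why inspecting an attribute such as `.info` (or the type) is a more reliable test than printing.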
aerorahul left a comment
thanks for making the changes and porting this capability to Mercury.
lgtm
I have notified George Fekete of the issues I was seeing with Sven on Mercury and he will be investigating it. The ticket number is RDHPCS#2025043054000066.
* develop:
  Move parm/config/sfs/config.globus to dev/parm/config/sfs (NOAA-EMC#3697)
  Add Fit2Obs to modulefiles/module_base.noaacloud.lua (NOAA-EMC#3695)
  Update AWS defaults for running obs prep jobs on cloud (NOAA-EMC#3681)
  Add GCAFS forecast-only mode to the workflow
  Adds marine DA ensstat files to archiving (NOAA-EMC#3631)
  Migrate to mercury for globus/hpss transfers from MSU (NOAA-EMC#3655)
  STY: Remove empty __init.py__ files. (NOAA-EMC#3691)
  Relocate config templates to `dev/` space in prep for EE2 (NOAA-EMC#3684)
Description
This migrates the doorman service from Niagara to Mercury as the former is being shut off soon.
Resolves #3490
Resolves #3539
How has this been tested?
Cycled testing on Hercules.