
Migrate to mercury for globus/hpss transfers from MSU #3655

Merged
DavidHuber-NOAA merged 43 commits into NOAA-EMC:develop from DavidHuber-NOAA:feature/globus_mercury
May 14, 2025

Conversation

@DavidHuber-NOAA
Contributor

Description

This migrates the doorman service from Niagara to Mercury as the former is being shut off soon.

Resolves #3490
Resolves #3539

Type of change

  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

Cycled testing on Hercules.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the system documentation if necessary

DavidHuber-NOAA and others added 28 commits April 4, 2025 15:39
@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as ready for review May 8, 2025 20:12
aerorahul previously approved these changes May 12, 2025
Contributor

@aerorahul aerorahul left a comment


looks good.

@DavidHuber-NOAA
Contributor Author

All tests passed on Hercules. However, running more than 2-3 cases at the same time bogged down Globus and caused some unstable behavior in the scripts. I have minimized the issues on the workflow side, but additional work will likely be necessary in Sven to handle incomplete transfers more gracefully. Nathan Gregg suggested that we use Globus workflows instead of Sven, which would likely alleviate these kinds of issues. That would involve a complete rewrite and significant research, however.

That said, I think this PR is ready to merge.
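
As a rough illustration of what handling an incomplete transfer more gracefully could look like, here is a minimal sketch of a bounded polling loop. It does not use Sven's or Globus's actual interfaces: `wait_for_transfer` and the `is_complete` callable are hypothetical names introduced only for this example, and the caller is assumed to supply its own completion check.

```python
import time
from typing import Callable


def wait_for_transfer(
    is_complete: Callable[[], bool],
    max_checks: int = 5,
    delay_seconds: float = 0.0,
) -> bool:
    """Poll a caller-supplied (hypothetical) completion check a bounded number of times.

    Returns True if the check succeeds within max_checks attempts and False
    otherwise, so the caller can decide how to react to an incomplete transfer
    instead of waiting forever or failing on the first unfinished check.
    """
    for _ in range(max_checks):
        if is_complete():
            return True
        time.sleep(delay_seconds)
    return False
```

A caller could treat a `False` return as "transfer still incomplete" and resubmit or exit cleanly, rather than assuming the files have arrived.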

Contributor

Copilot AI left a comment


Pull Request Overview

This PR migrates the HPSS transfer functionality from Niagara to Mercury by updating command-line arguments, configuration settings, and documentation. Key changes include:

  • Updating all references from Niagara to Mercury in comments, error messages, and configuration.
  • Modifying globus transfer command setups and removing SSH username auto-detection in favor of configuration values.
  • Adjusting host detection and dependency names in workflow scripts to align with the Mercury migration.

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| ush/python/pygfs/task/globus_hpss.py | Updated commands, error messages, and comments to reference Mercury. |
| ush/python/pygfs/task/archive.py | Added debug logging for YAML output. |
| scripts/exglobal_globus_earc.py | Comment updates to use Mercury instead of Niagara. |
| scripts/exglobal_globus_arch.py | Comment updates to use Mercury instead of Niagara. |
| parm/globus/.j2, parm/config/gfs/ | Updated configuration and script templates for Mercury. |
| docs/source/*.rst | Documentation updates for Mercury-based transfers. |
| dev/workflow/* | Adjustments in host detection, workflow dependencies, and logging to support Mercury. |

Comment thread dev/workflow/hosts.py
Comment thread dev/workflow/rocoto/workflow_xml.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread dev/workflow/hosts.py Outdated
Comment thread dev/workflow/build_compute.py Outdated
  host_plus_inputs_dict = AttrDict(host.info, **inputs_dict_remapped)
  host_plus_inputs_dict.HOMEgfs = _top
- host_plus_inputs_dict.MACHINE = host.machine.upper()
+ host_plus_inputs_dict.MACHINE = str(host).upper()
Contributor Author


This doesn't work. It looks for an upper method in the Host class, so str() is required first.
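
The behavior described in this comment can be sketched with a hypothetical stand-in `Host` class (not the workflow's actual class) that defines `__str__` but has no `upper` method. The `machine` attribute and the value `"hercules"` are assumptions made for illustration.

```python
class Host:
    """Hypothetical stand-in for the workflow's Host class."""

    def __init__(self, machine: str):
        self.machine = machine

    def __str__(self) -> str:
        return self.machine


host = Host("hercules")

# Calling .upper() on the object itself fails: Host defines no such method,
# and Python does not fall back to __str__ for attribute lookups.
try:
    host.upper()
    direct_call_worked = True
except AttributeError:
    direct_call_worked = False

# Converting to str first (or reading the string attribute) yields a real
# string, which does have .upper().
via_str = str(host).upper()
via_attr = host.machine.upper()
```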

@DavidHuber-NOAA DavidHuber-NOAA requested a review from aerorahul May 14, 2025 14:04
Comment thread dev/workflow/build_compute.py Outdated
  specs.partition = partition
  specs.native = native
- specs.machine = str(host)
+ specs.machine = host
Contributor


Are we sure this is not giving specs.machine the host object as opposed to the name of the machine? Would it be more explicit if this were
specs.machine = host.machine?

Contributor Author


That's fine by me. __str__ is there for convenience and probably shouldn't be abused. Either str(host) or host.machine would work. I'm fine with the latter here since this is a piecewise copy of host information.

Contributor Author


To answer your first question: no, I'm not sure. My test was to print(specs.machine), which would have called the __str__ method if the host object had been assigned, so it was a faulty test.

Contributor Author


As an additional test:

specs.machine = host
print(specs.machine.info)
{'DMPDIR': '/work/noaa/rstprod/dump', 'BASE_GIT': '/work2/noaa/global/role-global/git', 'BASE_DATA': '/work2/noaa/global/role-global/data', 'BASE_IC': '/work2/noaa/global/role-global/data/ICSDIR', 'AERO_INPUTS_DIR': '/work2/noaa/global/role-global/data/GEFS_ExtData/20250310', 'PACKAGEROOT': '/work2/noaa/global/role-global/nwpara', 'HOMEDIR': '/work2/noaa/global/${USER}', 'STMP': '/work2/noaa/stmp/${USER}/${machine^^}', 'PTMP': '/work2/noaa/stmp/${USER}/${machine^^}', 'NOSCRUB': '${HOMEDIR}', 'COMINsyn': '/work2/noaa/global/role-global/com/gfs/prod/syndat', 'COMINecmwf': '/work2/noaa/global/role-global/data/external_gempak/ecmwf', 'COMINnam': '/work2/noaa/global/role-global/data/external_gempak/nam', 'COMINukmet': '/work2/noaa/global/role-global/data/external_gempak/ukmet', 'SCHEDULER': 'slurm', 'QUEUE': 'batch', 'PARTITION_BATCH': 'hercules', 'PARTITION_SERVICE': 'service', 'HPSS_PROJECT': 'emc-global', 'ARCHCOM_TO': 'globus_hpss', 'ATARDIR': '/NCEPDEV/${HPSS_PROJECT}/1year/${USER}/${machine}/scratch/${PSLOT}', 'CHGRP_RSTPROD': 'YES', 'CHGRP_CMD': 'chgrp rstprod', 'CLIENT_GLOBUS_UUID': '869912fe-f6de-46c0-af10-b22efd84a022', 'SUPPORTED_RESOLUTIONS': ['C1152', 'C768', 'C384', 'C192', 'C96', 'C48'], 'DO_ARCHCOM': 'NO', 'DO_AWIPS': 'NO', 'MAKE_NSSTBUFR': 'NO', 'MAKE_ACFTBUFR': 'NO'}

So it was assigning the host object.
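
A minimal sketch of why the earlier `print` test could not tell the two cases apart, again using a hypothetical stand-in `Host` class rather than the workflow's own. The `info` contents here are a shortened placeholder, not the real host dictionary.

```python
class Host:
    """Hypothetical stand-in for the workflow's Host class."""

    def __init__(self, machine: str, info: dict):
        self.machine = machine
        self.info = info

    def __str__(self) -> str:
        return self.machine


host = Host("hercules", {"SCHEDULER": "slurm"})

assigned_object = host        # what `specs.machine = host` stores
assigned_name = host.machine  # what `specs.machine = host.machine` stores

# print() converts its argument with str(), which calls __str__ on the
# object, so both values would print the same text.
same_printed_text = str(assigned_object) == str(assigned_name)

# An explicit type check distinguishes the object from the plain name.
object_is_str = isinstance(assigned_object, str)
name_is_str = isinstance(assigned_name, str)
```

Checking for an attribute that only the object has, as the `print(specs.machine.info)` test above does, is another way to make the distinction.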

Contributor

@aerorahul aerorahul left a comment


thanks for making the changes and porting this capability to Mercury.
lgtm

@DavidHuber-NOAA
Contributor Author

I have notified George Fekete of the issues I was seeing with Sven on Mercury and he will be investigating it. The ticket number is RDHPCS#2025043054000066.

@DavidHuber-NOAA DavidHuber-NOAA merged commit 0e117a9 into NOAA-EMC:develop May 14, 2025
6 checks passed
tsga added a commit to tsga/global-workflow that referenced this pull request May 15, 2025
* develop:
  Move parm/config/sfs/config.globus to dev/parm/config/sfs (NOAA-EMC#3697)
  Add Fit2Obs to modulefiles/module_base.noaacloud.lua (NOAA-EMC#3695)
  Update AWS defaults for running obs prep jobs on cloud (NOAA-EMC#3681)
  Add GCAFS forecast-only mode to the workflow
  Adds marine DA ensstat files to archiving (NOAA-EMC#3631)
  Migrate to mercury for globus/hpss transfers from MSU (NOAA-EMC#3655)
  STY: Remove empty __init.py__ files. (NOAA-EMC#3691)
  Relocate config templates to `dev/` space in prep for EE2 (NOAA-EMC#3684)
@DavidHuber-NOAA DavidHuber-NOAA deleted the feature/globus_mercury branch May 21, 2025 19:31


Development

Successfully merging this pull request may close these issues.

  • Get Niagara/Mercury user names earlier in the setup process when running globus_hpss archiving
  • Migrate Globus server to Mercury

3 participants