Add capability of copying files from lustre to s3 bucket on AWS #2873

Closed
weihuang-jedi wants to merge 86 commits into NOAA-EMC:develop from NOAA-EPIC:aws-copy2bucket

Conversation

@weihuang-jedi weihuang-jedi commented Aug 28, 2024

Contributor

Description

Add the capability for global-workflow to copy files from lustre to an s3 bucket on AWS and on other CSPs.

Resolves #2872

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? YES (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

[Wei.Huang@epicweiaws-130 wxflow]$ git diff fsutils.py
diff --git a/src/wxflow/fsutils.py b/src/wxflow/fsutils.py
index af9e5b8..5d6d4da 100644
--- a/src/wxflow/fsutils.py
+++ b/src/wxflow/fsutils.py
@@ -81,6 +81,9 @@ def cp(source: str, target: str) -> None:
     if os.path.isdir(target):
         target = os.path.join(target, os.path.basename(source))
 
+    if os.path.isfile(target):
+        return
+
     try:
         shutil.copy2(source, target)

(Will file PR, if the general idea is OK with reviewers).
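
For readers who want to try the idea outside wxflow, here is a minimal standalone sketch of the skip-if-present behavior the diff describes. The function name `cp_skip_existing` and its boolean return value are hypothetical additions for illustration; this is not the wxflow implementation, and it omits whatever error handling the real `cp` performs.

```python
import os
import shutil


def cp_skip_existing(source: str, target: str) -> bool:
    """Copy source to target unless the target file already exists.

    Hypothetical helper for illustration only.
    Returns True if a copy was made, False if the copy was skipped.
    """
    # If target is a directory, copy into it under the source's base name.
    if os.path.isdir(target):
        target = os.path.join(target, os.path.basename(source))

    # The proposed change: do nothing when the target file is already there.
    if os.path.isfile(target):
        return False

    shutil.copy2(source, target)
    return True
```

One consequence worth weighing: with this check an existing target is never overwritten, so a stale file at the target is left in place even when the source has changed.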

How has this been tested?

  • Clone and build on AWS
  • Run coupled C48 on AWS

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

weihuang-jedi and others added 30 commits June 18, 2024 23:05
Comment thread scripts/exglobal_atmos_products.sh Fixed
Comment thread scripts/exglobal_atmos_products.sh Fixed
Comment thread scripts/exglobal_atmos_products.sh Fixed
Comment thread scripts/exglobal_atmos_products.sh Fixed
@weihuang-jedi weihuang-jedi marked this pull request as ready for review August 29, 2024 16:08
@weihuang-jedi

Contributor Author

I understand that this PR is far from perfect,
but I want to show one way that we can copy products from /lustre to an s3 bucket.
We want to see what other people think of this, and iterate toward a better way.
Thanks,
Wei

@WalterKolczynski-NOAA

Contributor

Unless there is a need to populate the bucket right away, I think this is the wrong approach. It will be annoying to add and maintain, and also injects a bunch of unneeded code into production scripts.

Instead of adding a bunch of code to all of the jobs that write to COM as you've started here with products, we should do it at the end of the cycle, either by piggybacking on the existing archive job or creating a new job similar to the archive job that only runs on AWS. Then all the copying can be done in one go and in one place segregated from everything else.
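
A rough sketch of what such a single end-of-cycle copy step could look like is below. Everything here is an assumption for illustration: the function names `plan_bucket_copy` and `copy_cycle_to_bucket`, the `com_root` and `key_prefix` parameters, and the injected `upload` callable are hypothetical and do not come from global-workflow or wxflow. The upload itself is passed in as a callable so the sketch stays independent of any particular S3 client.

```python
import os
from typing import Callable, Iterator, Tuple


def plan_bucket_copy(com_root: str, key_prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_path, bucket_key) pairs for every file under com_root.

    Hypothetical helper: keys mirror the directory layout below com_root,
    joined with "/" and placed under key_prefix.
    """
    prefix = key_prefix.strip("/")
    for dirpath, dirnames, filenames in os.walk(com_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            local = os.path.join(dirpath, name)
            rel_parts = os.path.relpath(local, com_root).split(os.sep)
            parts = ([prefix] if prefix else []) + rel_parts
            yield local, "/".join(parts)


def copy_cycle_to_bucket(com_root: str, key_prefix: str,
                         upload: Callable[[str, str], None]) -> int:
    """Upload every file under com_root in one pass; return the file count."""
    count = 0
    for local, key in plan_bucket_copy(com_root, key_prefix):
        upload(local, key)  # upload is supplied by the caller
        count += 1
    return count
```

In a real job, `upload` might wrap an S3 client call (for example boto3's `upload_file`) or a shell-out to the AWS CLI; which one fits is a design decision this sketch leaves open.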

@aerorahul aerorahul left a comment

Contributor

I agree with @WalterKolczynski-NOAA's assessment and review.

@WalterKolczynski-NOAA WalterKolczynski-NOAA marked this pull request as draft September 17, 2024 14:51
@weihuang-jedi

Contributor Author

@WalterKolczynski-NOAA and @aerorahul,
I was thinking of something similar to what Walter said; my only worry was that doing the copy at the archive stage would increase the total wall clock time of Global-Workflow.
The approach I took here is an easy one to get working, but as Walter said, it makes a lot of changes to the current production code.
Let me rethink this request and see if I can work out a way similar to the archive job.
Certainly, any suggestions or comments are more than welcome.

Comment thread workflow/rocoto_viewer.py

Collaborator

We should never update this script in this fashion -- ever.
It is unsupported legacy code that happens to have some vestigial logic on EXPDIR.
These updates are speculative and irrelevant.

Contributor Author

That is a nice catch. I should not make any changes here.
I will remove those, even if this PR is not going anywhere.

Collaborator

Right, just wanted to keep you up to speed.

@DavidHuber-NOAA

Member

Opened issue NOAA-EMC/wxflow#42 to add bucket transfer capability to wxflow.

@aerorahul

Contributor

Closing after consulting w/ @weihuang-jedi
This capability will be coordinated in a future sprint.

@aerorahul aerorahul closed this Nov 12, 2024
@weihuang-jedi weihuang-jedi deleted the aws-copy2bucket branch September 27, 2025 13:16
bbakernoaa pushed a commit to bbakernoaa/global-workflow that referenced this pull request Mar 19, 2026
…ay // PR for Log Warnings NOAA-EMC#2924 // Chore/fix reposync and status checks NOAA-EMC#2873 (NOAA-EMC#2954)

* UFSWM - UFSATM: Convert frestart from statically do dynamically allocated array
* UFSWM - Create scorecard for runtime/memory metrics by machine
* UFSWM - Fixing repo sync check and conditional block logic.
  * UFSATM - Convert frestart from statically do dynamically allocated array

Development

Successfully merging this pull request may close these issues.

Add copy products to s3 bucket capability on AWS

6 participants