
fix: support default credentials for workflow storage #4

Closed

yoosful wants to merge 1 commit into main from fix/workflow-storage-default-credentials

yoosful commented May 11, 2026

Description

Upstream issue: NVIDIA#749

This change completes the workflow storage DefaultDataCredential path so workflow log/data storage can rely on ambient cloud credentials, including AWS EKS IRSA or Pod Identity, instead of requiring long-lived static access keys.

The Python workflow config models already accept the DataCredential union on current main; this PR aligns the remaining typed call sites and fixes the Go runtime mount path:

  • keep workflow storage/data endpoint maps typed as DataCredential instead of static-only credentials
  • preserve access-key-free workflow log/data credentials in config validation tests
  • pass AWS credential environment to the mount-s3 subprocess instead of mutating the osmo-ctrl process environment
  • preserve ambient AWS auth env for DefaultDataCredential / no explicit credential paths
  • still override ambient env for explicit static credentials and clear stale AWS_SESSION_TOKEN in that static-credential case

Why

Before this fix, a workflow storage config like:

workflow_data:
  credential:
    endpoint: s3://my-bucket/workflow-data
    region: us-west-2

could deserialize as a default credential, but the runtime mount path would unset or overwrite the AWS credential environment variables before invoking mount-s3. That breaks default credential chain flows, in which the pod is expected to use the ambient identity from its Kubernetes service account.
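As a rough illustration of the distinction at play, the sketch below classifies a credential block by whether it carries explicit keys. The real config models are Python and are not reproduced here. `CredentialConfig`, its `AccessKeyID` and `AccessKey` fields, `classify`, and the rule that a half-specified key pair is an error are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// CredentialConfig mirrors the YAML keys shown above plus two hypothetical
// static-key fields; the field names are illustrative only.
type CredentialConfig struct {
	Endpoint    string
	Region      string
	AccessKeyID string // hypothetical
	AccessKey   string // hypothetical
}

// CredentialKind says which authentication path a config selects.
type CredentialKind int

const (
	KindDefault CredentialKind = iota // rely on the ambient credential chain
	KindStatic                        // use the explicit key pair
)

var errPartialKeys = errors.New("access key ID and secret must be set together")

// classify returns KindDefault when no keys are present, KindStatic when
// both are present, and an error when only one of the two is set.
func classify(c CredentialConfig) (CredentialKind, error) {
	switch {
	case c.AccessKeyID == "" && c.AccessKey == "":
		return KindDefault, nil
	case c.AccessKeyID != "" && c.AccessKey != "":
		return KindStatic, nil
	default:
		return KindDefault, errPartialKeys
	}
}

func main() {
	// The access-key-free config from the example above.
	cfg := CredentialConfig{Endpoint: "s3://my-bucket/workflow-data", Region: "us-west-2"}
	kind, err := classify(cfg)
	fmt.Println(kind == KindDefault, err)
}
```

Under this reading, a config classified as default should leave the AWS environment exactly as the pod received it; only the static case has any reason to touch credential variables.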

Validation

  • bazel test //src/utils/connectors/tests:test_cli_config
  • bazel test //src/utils/job/tests:test_task
  • bazel test //src/service/core/config/tests:test_configmap_loader_unit
  • bazel test //src/runtime/pkg/data:data_test
  • bazel test --runs_per_test=3 --cache_test_results=no //src/runtime/pkg/data:data_test
  • Built a Linux amd64 runtime test binary and ran the AWS credential environment tests from a temporary pod on EKS three times; all runs passed. Temporary validation resources were removed afterward.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

yoosful commented May 11, 2026

Closing as duplicate. NVIDIA#751 was ported and merged upstream as NVIDIA#865, so NVIDIA#749 is already materially handled on upstream main. This PR was opened after checking only open PRs, which missed the closed/ported PR history.

yoosful closed this May 11, 2026
yoosful deleted the fix/workflow-storage-default-credentials branch May 11, 2026 02:13