Skip to content

Add central RunOutputRoot class for deriving CI run output paths on S3#3000

Closed
ScottTodd wants to merge 18 commits into
ROCm:mainfrom
ScottTodd:run-outputs
Closed

Add central RunOutputRoot class for deriving CI run output paths on S3#3000
ScottTodd wants to merge 18 commits into
ROCm:mainfrom
ScottTodd:run-outputs

Conversation

@ScottTodd
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd commented Jan 20, 2026

This PR introduces a new RunOutputRoot class that centralizes path construction, clearly documents which files are uploaded where, and adds many new unit tests.

Motivation

We had multiple files constructing paths on S3 for CI outputs such as artifacts, logs, and reports. Each of these files needed to know what schema to follow and there was risk of the files not agreeing on conventions. Several files even used the retrieve_bucket_info() function which just returned a tuple of [EXTERNAL_REPO, BUCKET].

This is prerequisite work for:

Technical Details

Key changes:

  1. Added new build_tools/_therock_utils/run_outputs.py file with a new RunOutputRoot class

  2. Migrated existing code to use the new class, like so:

    - analysis_url = f"{bucket_url}/logs/{artifact_group}/build_time_analysis.html"
    + analysis_url = run_root.build_time_analysis_url(artifact_group)
    - s3_prefix = f"{external_repo}{run_id}-{platform}"
    + s3_prefix = run_root.prefix
  3. Moved retrieve_bucket_info() from github_actions_utils.py to the new file as an implementation detail

  4. Added new documentation for the directory structure of workflow run outputs in docs/development/run_outputs_layout.md

  5. Added new unit tests

See also:

Test Plan

  • Added new unit tests and updated existing unit tests
  • Watch CI for unexpected file paths, broken logs, errors, etc.

Note

I also want better ways to test these sorts of changes, so I added some follow-up plans in my plan/notes doc at Follow-up Work: Local Testing & Backend Abstraction. The main ideas are to

  1. Add a --dry-run option to post_build_upload.py
  2. Refactor uploading to use a "backend" abstraction that can choose between S3 and local, so a developer can "upload" a set of local files into a local folder with the same directory structure as the CI would upload to S3

Test Result

Submission Checklist

ScottTodd and others added 15 commits January 19, 2026 11:08
Introduces RunOutputRoot dataclass as the single source of truth for
computing paths to all outputs from a GitHub Actions workflow run.
This consolidates duplicated path logic scattered across multiple files.

The module documents the complete layout structure for:
- Build artifacts (.tar.xz, .tar.zst)
- Logs and log indexes
- Manifests (therock_manifest.json)
- Future: Python packages, native packages, reports

Changes:
- Add build_tools/_therock_utils/run_outputs.py with RunOutputRoot class
- Factory methods: from_workflow_run() and for_local()
- Path computation for all output types (artifacts, logs, manifests, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replaces manual S3 path construction with the new RunOutputRoot class,
consolidating all path computation into a single source of truth.

Changes:
- Update post_build_upload.py to import and use RunOutputRoot
- Change upload functions to take RunOutputRoot instead of bucket_uri strings
- Replace f-string path construction with RunOutputRoot method calls
- Add artifact_index_s3_uri() method to RunOutputRoot for upload paths
- Change from wildcard import to explicit imports from github_actions_utils

This makes the path layout easier to understand and modify. Future layout
changes (e.g., moving artifacts to a subdirectory) only require updating
RunOutputRoot methods.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Refactors S3Backend and LocalDirectoryBackend to use RunOutputRoot for
path computation, consolidating all path logic into a single source of truth.

Changes:
- S3Backend now takes RunOutputRoot instead of bucket/run_id/platform/external_repo
- LocalDirectoryBackend now takes RunOutputRoot instead of run_id/platform
- LocalDirectoryBackend path pattern fixed to match S3 (removed 'run-' prefix)
- create_backend_from_env creates RunOutputRoot first, then passes to backends
- bucket and s3_prefix are now properties delegating to run_root

This ensures local staging directories use the same layout as S3, making it
easier to test and debug artifact handling locally.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds comprehensive tests for the new RunOutputRoot class covering:
- Basic properties (prefix, s3_uri, https_url, local_path)
- Artifact path methods (s3_key, s3_uri, index_url)
- Log path methods (prefix, s3_uri, index_url, build_time_analysis_url)
- Manifest path methods (prefix, s3_key, s3_uri, url)
- Future output types (python packages, native packages, reports)
- Factory methods (for_local)
- Immutability verification

Updates artifact_backend_test.py to use the new RunOutputRoot-based
constructor signatures for S3Backend and LocalDirectoryBackend.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documents the S3 directory structure for workflow run outputs including
artifacts, logs, manifests, and future output types (python packages,
native packages, reports).

Covers:
- S3 path structure and naming conventions
- Multi-platform and multi-group organization
- Fork/external repository handling
- RunOutputRoot API reference and usage examples

Co-Authored-By: Claude <noreply@anthropic.com>
Per Python style guide, imports should be at the top of the file.
Add sys.path.insert to enable importing from sibling github_actions
package. This also enables the from_workflow_run tests that were
previously skipped.

Co-Authored-By: Claude <noreply@anthropic.com>
Bug fix:
- from_workflow_run() now passes run_id as workflow_run_id to
  retrieve_bucket_info(), which uses it to fetch workflow data
  from GitHub API when workflow_run dict is not provided.

Test improvements:
- Removed mocks of retrieve_bucket_info - tests now call the real
  function for proper integration testing
- Tests provide workflow_run dicts to control bucket selection
  without making API calls
- Added test for legacy bucket (pre-cutover date workflows)
- Tests now verify actual bucket selection logic: fork detection,
  external repo handling, date-based bucket selection

Co-Authored-By: Claude <noreply@anthropic.com>
Remove unimplemented methods for python packages, native packages,
and reports. These can be added when actually needed.

Keep documentation section explaining how to add new output types,
so the design pattern is clear for future extensions.

Co-Authored-By: Claude <noreply@anthropic.com>
Add new "S3 Buckets" subsection with table of all buckets used for
CI artifacts, including browse links to the public S3 bucket indexes.
Includes both current and legacy buckets.

Co-Authored-By: Claude <noreply@anthropic.com>
…putRoot

- upload_test_report_script.py: Use RunOutputRoot.from_workflow_run()
  instead of retrieve_bucket_info() + manual URI construction

- fetch_artifacts.py: Remove BucketMetadata class (duplicated
  RunOutputRoot functionality), use RunOutputRoot directly

This consolidates all bucket/path computation through RunOutputRoot,
leaving retrieve_bucket_info() as an internal implementation detail
used only by run_outputs.py.

Co-Authored-By: Claude <noreply@anthropic.com>
Move retrieve_bucket_info() from github_actions_utils.py into
run_outputs.py as a private _retrieve_bucket_info() function.
This makes RunOutputRoot the single source of truth for all
S3 path and bucket computation.

Changes:
- Move _retrieve_bucket_info() into run_outputs.py
- Delete retrieve_bucket_info() from github_actions_utils.py
- Migrate tests to run_outputs_test.py (test through public API)
- Add RELEASE_TYPE env var tests for nightly/release buckets
- Add integration tests with @skipUnless(GITHUB_TOKEN)
- Keep github_actions_utils.py for generic GitHub API code only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update Related Files table to reflect that github_actions_utils.py
now contains generic GitHub API utilities, not retrieve_bucket_info()
which was moved to run_outputs.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tests were using the old BucketMetadata and LocalDirectoryBackend interfaces
that were updated in the RunOutputRoot migration.

Changes:
- fetch_artifacts_test.py: Replace BucketMetadata with RunOutputRoot
- artifact_manager_tool_test.py: Update LocalDirectoryBackend calls to use
  RunOutputRoot instead of separate run_id/platform parameters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread build_tools/tests/run_outputs_test.py Outdated
ScottTodd and others added 3 commits January 20, 2026 08:51
Integration tests now use the _skip_unless_authenticated_github_api_is_available
decorator instead of checking only for GITHUB_TOKEN. This allows the tests to
run when authenticated via the gh CLI, matching the pattern in
github_actions_utils_test.py.

Changes:
- Import is_authenticated_github_api_available from github_actions_utils
- Add _skip_unless_authenticated_github_api_is_available decorator
- Update all 5 integration tests to use the new decorator
- Update class docstring to reflect both auth methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
ScottTodd added a commit that referenced this pull request Jan 29, 2026
…ls (#3093)

## Motivation

Progress on #2608.

These scripts have utility on their own, but I'm intending for them to
be building blocks on top of which we can:
* Download or inspect artifacts for commits across a range 
* That will be useful for certain bisect workflows, diffing artifact
size across commits, etc.
* Find recent artifacts on a branch (useful for bootstrapping source
builds)
* I'm envisioning a related `find_baseline_artifacts_for_commit.py` that
would include more logic like finding the base ref for a branch, or
finding artifacts sharing fingerprints with the current source/build
tree

## Technical Details

* The `ArtifactRunInfo` dataclass in
`build_tools/find_artifacts_for_commit.py` duplicates a substantial
amount of logic. I'm working on a deeper refactoring there (see a draft
at #3000).
* TBD how these scripts will work with rocm-libraries, rocm-systems, and
other repositories. I want to start with a focus on just TheRock. I had
some code for inferring the repository, workflow file, default branch,
etc. from the environment but I need to think more about how those
repositories should interface with artifacts considering they currently
build only a subset of subprojects.
* See also this related PR: #2997
* PR self-review:
https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/local_008_artifacts-for-commit.md

## Test Plan

* Added new unit tests
* Tested manually (see below)

## Test Result

> [!NOTE]
> The logging is verbose right now. I want to move the `Retrieving
bucket info` blocks behind a `-v` flag, which may involve switching more
of our python files to using the [`logging`
module](https://docs.python.org/3/library/logging.html).

```
python build_tools\find_artifacts_for_commit.py --commit=62bc1eaa02e6ad1b49a718eed111cf4c9f03593a --artifact-group=gfx110X-all
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 20384488184
  head_github_repository      : ScottTodd/TheRock
  is_pr_from_fork             : True
Retrieved bucket info:
  external_repo: ROCm-TheRock/
  bucket       : therock-ci-artifacts-external
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          62bc1ea
  Git commit URL:      62bc1ea
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       ci.yml
  Workflow run ID:     20384488184
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/20384488184
  Workflow run status: completed (failure)
  S3 Bucket:           therock-ci-artifacts-external
  S3 Path:             ROCm-TheRock/20384488184-windows/
  S3 Index:            https://therock-ci-artifacts-external.s3.amazonaws.com/ROCm-TheRock/20384488184-windows/index-gfx110X-all.html
```

```
python build_tools/find_latest_artifacts.py --artifact-group gfx110X-all --max-commits 10 --verbose
Searching 10 commits on ROCm/TheRock/main...
  [1/10] Checking 6cdf31c...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21368296710
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [2/10] Checking 901d88d...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367803869
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [3/10] Checking ead209c...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367787575
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [4/10] Checking 42b464c...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367297783
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [5/10] Checking ed34008...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21364484487
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [6/10] Checking 46a74db...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21364198017
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    Found artifacts: run 21364198017
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          46a74db
  Git commit URL:      46a74db
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       ci.yml
  Workflow run ID:     21364198017
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/21364198017
  Workflow run status: queued
  S3 Bucket:           therock-ci-artifacts
  S3 Path:             21364198017-windows/
  S3 Index:            https://therock-ci-artifacts.s3.amazonaws.com/21364198017-windows/index-gfx110X-all.html
```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
ScottTodd added a commit that referenced this pull request Feb 3, 2026
…er (#3019)

## Motivation

While working on #3000, I found that
this code is directly handling S3 paths like so:

```python
    def __post_init__(self):
        self.s3_key_path = f"{self.external_repo}{self.workflow_run_id}-{self.platform}"
``` 

I'm trying to move that code to a central location and to make it
possible to swap out the "s3 cloud backend for a "local backend" for
easier testing. Moving string manipulation code is easy enough, but this
file _also_ directly communicated with S3:

```python
          log(f"++ Downloading {artifact_key} to {output_path}")
          with open(output_path, "wb") as f:
              s3_client.download_fileobj(bucket, artifact_key, f)
```

That would be harder to refactor... if we didn't already have this
abstraction in `ArtifactBackend`. So, I'm first migrating this code to
use that class.


## Technical Details

I also considered making `fetch_artifacts.py` just call
`artifact_manager.py` directly, but they have different roles:
* `artifact_manager.py` is topology-aware and is currently just used for
multi-arch builds
* `fetch_artifacts.py` has regex-based fetching and is currently used by
developers to fetch explicit lists of artifacts and by CI workflows to
download subsets of artifacts for tests

## Test Plan

* Existing unit tests (light coverage)
* Manual sanity check:
    ```cmd
    python ./build_tools/fetch_artifacts.py ^
      --run-id=18480879899 ^
      --artifact-group=gfx110X-dgpu ^
      --output-dir=%HOME%\.therock\18480879899\artifacts
    ```
* CI test jobs will run this `fetch_artifacts.py` script indirectly via
[`build_tools/install_rocm_from_artifacts.py`](https://github.com/ROCm/TheRock/blob/main/build_tools/install_rocm_from_artifacts.py)

## Test Result

* CI jobs run the script with no issues, e.g.
https://github.com/ROCm/TheRock/actions/runs/21192582915/job/60969434096?pr=3019#step:6:280

    ```
    Calling fetch_artifacts_main with args:
--run-id 21192582915 --artifact-group gfx94X-dcgpu --output-dir build
--flatten core-hipinfo_run core-runtime_run core-runtime_lib sysdeps_lib
base_run base_lib amd-llvm_run amd-llvm_lib core-hip_lib core-hip_dev
core-ocl_lib core-ocl_dev rocprofiler-sdk_lib host-suite-sparse_lib
blas_lib blas_test
    
    ...
    
++ Flattening 'blas_test_gfx94X-dcgpu.tar.xz' to
'blas_test_gfx94X-dcgpu'
++ Flattening 'blas_lib_gfx94X-dcgpu.tar.xz' to 'blas_lib_gfx94X-dcgpu'
    Retrieved artifacts for run ID 21192582915
    ```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
ScottTodd added a commit that referenced this pull request Feb 4, 2026
…3136)

## Motivation

Progress on #1559.

For CI workflows, the
[`build_portable_linux_python_packages.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_portable_linux_python_packages.yml)
and
[`build_windows_python_packages.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_windows_python_packages.yml)
workflows which run on CPU machines do not currently upload the packages
that they build, so there is no connection between them and
[`test_rocm_wheels.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/test_rocm_wheels.yml)
which runs on GPU machines.

This adds a script that will upload built Python packages and an
index.html page usable with `pip install --find-links` to bridge that
gap.

## Technical Details

Uploading will reuse the existing CI artifact buckets (typically
`therock-ci-artifacts` or `therock-ci-artifacts-external`), so we'll
have a single subdirectory per workflow run ID in those buckets that
contains:

File type | Status
-- | --
artifacts | Already uploaded
artifact build logs | Already uploaded
artifact build reports | Already uploaded
manifest files | Already uploaded
ROCm Python packages | *Adding in this PR*
PyTorch Python packages | Future
Native Linux packages | Future

The specific layout here is currently

```
S3 Layout:
  {bucket}/{external_repo}{run_id}-{platform}/python/{artifact_group}/
    *.whl, *.tar.gz   # Wheel and sdist files
    index.html        # File listing for pip --find-links
```

I have plans to centralize and document such output layout information
in #3000.

> [!NOTE]
> The
[`release_portable_linux_packages.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/release_portable_linux_packages.yml)
and
[`release_windows_packages.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/release_windows_packages.yml)
workflows have been building python packages and uploading them directly
to the `therock-{release_type}-python` buckets.
>
> As part of future workflow alignment we could reuse these workflows
and then _copy_ packages from the "artifacts" buckets to those "release"
buckets.

## Test Plan

Tested locally in various configurations. Will integrate into workflows
next.

Could design some unit tests for this too?

## Test Result

```bash
# Prep: download artifacts and build python packages
python ./build_tools/fetch_artifacts.py \
  --run-id=21440027240 \
  --artifact-group=gfx110X-dgpu \
  --output-dir=%HOME%/.therock/21440027240/artifacts
python ./build_tools/build_python_packages.py \
  --artifact-dir=%HOME%/.therock/21440027240/artifacts \
  --dest-dir=%HOME%/.therock/21440027240/packages
```

```bash
# Test with a local directory
python ./build_tools/github_actions/upload_python_packages.py \
  --packages-dir="%HOME%/.therock/21440027240/packages" \
  --artifact-group=gfx110X-all \
  --run-id=21440027240 \
  --output-dir="%HOME%/.therock/21440027240/upload-test"

find ${HOME}/.therock/21440027240/upload-test
# .../upload-test
# .../upload-test/21440027240-windows
# .../upload-test/21440027240-windows/python
# .../upload-test/21440027240-windows/python/gfx110X-all
# .../upload-test/21440027240-windows/python/gfx110X-all/index.html
# .../upload-test/21440027240-windows/python/gfx110X-all/rocm-7.12.0.dev0.tar.gz
# .../upload-test/21440027240-windows/python/gfx110X-all/rocm_sdk_core-7.12.0.dev0-py3-none-win_amd64.whl
# .../upload-test/21440027240-windows/python/gfx110X-all/rocm_sdk_devel-7.12.0.dev0-py3-none-win_amd64.whl
# .../upload-test/21440027240-windows/python/gfx110X-all/rocm_sdk_libraries_gfx110x_all-7.12.0.dev0-py3-none-win_amd64.whl
```

```bash
# Real upload
python ./build_tools/github_actions/upload_python_packages.py \
  --packages-dir="%HOME%/.therock/21440027240/packages" \
  --artifact-group=gfx110X-all \
  --run-id=21440027240 \
  --bucket therock-artifacts-testing
```

That uploaded
https://therock-artifacts-testing.s3.amazonaws.com/21440027240-windows/python/gfx110X-all/index.html

```bash
pip install rocm[libraries,devel] --pre \
  --find-links=https://therock-artifacts-testing.s3.amazonaws.com/21440027240-windows/python/gfx110X-all/index.html
# NOTE: 5 MB/s download speed (nightly releases from cloudfront get 60 MB/s)
rocm-sdk test
```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
@ScottTodd
Copy link
Copy Markdown
Member Author

I'm redoing this PR, closing the draft.

@ScottTodd ScottTodd closed this Feb 21, 2026
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Feb 21, 2026
ScottTodd added a commit that referenced this pull request Mar 4, 2026
…ws (#3596)

> [!NOTE]
> This is a rewrite of #3000. The
code was written primarily via Claude Code and has gone through several
design iterations over the last ~month, with a few smaller pieces of
work already split out and landed.
> 
> See also:
> * Task file with research, prototypes, notes, etc.:
[`tasks/active/run-outputs-layout.md`](https://github.com/ScottTodd/claude-rocm-workspace/blob/main/tasks/active/run-outputs-layout.md)
> * Self-review 1:
[`reviews/local_014_run-outputs-locations-2.md`](https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/local_014_run-outputs-locations-2.md)
> * Self-review 2:
[`reviews/local_015_run-outputs-locations-2.md`](https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/local_015_run-outputs-locations-2.md)

## Motivation

We have multiple scripts that directly construct paths on S3 and use the
`aws` CLI tool like so:

```python
def run_aws_cp(source_path: Path, s3_destination: str, content_type: str = None):
    if source_path.is_dir():
        cmd = ["aws", "s3", "cp", str(source_path), s3_destination, "--recursive"]
    else:
        cmd = ["aws", "s3", "cp", str(source_path), s3_destination]
...

def upload_manifest_to_s3(artifact_group: str, build_dir: Path, bucket_uri: str):
    ...
    dest = f"{bucket_uri}/manifests/{artifact_group}/therock_manifest.json"
    log(f"[INFO] Uploading manifest {manifest_path} -> {dest}")
    run_aws_cp(manifest_path, dest, content_type="application/json")
...

def run(args):
    ...
    external_repo_path, bucket = retrieve_bucket_info()
    run_id = args.run_id
    bucket_uri = f"s3://{bucket}/{external_repo_path}{run_id}-{PLATFORM}"
    bucket_url = (
        f"https://{bucket}.s3.amazonaws.com/{external_repo_path}{run_id}-{PLATFORM}"
    )
```

this moves the path computation and uploading code into shared classes:

```python
def upload_manifest(
    artifact_group: str, build_dir: Path, run_root: WorkflowOutputRoot, backend: StorageBackend,
):
    ...
    backend.upload_file(manifest_path, output_root.manifest(artifact_group))

def run(args):
    ...
    # This handles "retrieve bucket info" as an internal implementation detail
    output_root = WorkflowOutputRoot.from_workflow_run(run_id=args.run_id, platform=PLATFORM)
    # This chooses between LocalStorageBackend and S3StorageBackend depending on if staging_dir is set
    backend = create_upload_backend(staging_dir=args.output_dir, dry_run=args.dry_run)
```

The net result is:

* One file (`workflow_outputs.py`) that documents the directory
structure and expected file types that will be uploaded, so upload and
download scripts can share paths
* One file (`storage_backend.py`) that handles uploading to S3 via
`boto3.client` or "uploading" to a local directory, dropping some uses
of the `aws` CLI
* Simpler code and more shared utilities for scripts that want to upload
or download files

## Technical Details

I tried to keep this PR as small as possible, but several of the design
changes required sweeping updates that would be difficult to split
further.

### In scope for this PR

* Defining new interfaces for path and I/O abstractions
* Migrating existing upload scripts for CI logs, artifacts, manifests,
python packages to those abstractions
* Migrating existing download scripts for the same
* Adding tests and docs for the above

### Out of scope for this PR

* Migrating `build_tools/github_actions/upload_test_report_script.py`
* Migrating code from release github actions workflow files into
scripts. This focuses on just CI workflows, not e.g.
https://github.com/ROCm/TheRock/blob/0de4d6b3a84139cf5dd88eca941f3b7651c63a21/.github/workflows/release_portable_linux_packages.yml#L286-L293
* I did start designing `storage_backend.py` so it might be used here
(see `copy_file()` function and some TODOs)
* Moving index HTML generation out of post_build_upload.py
(#3331)
* Project-wide refactoring/cleanup discovered as part of this work
(there was quite a bit that I had to rein in)

### Feature/architecture highlights

(See also the new `docs/development/run_outputs_layout.md` file)

* `WorkflowOutputRoot`: source of truth for CI output paths
* `from_workflow_run()` pulls bucket information from the current CI
context
* `retrieve_bucket_info()` is now an implementation detail of that
class, not something usable from anywhere in the project via
`github_actions_utils.py`
* `from_workflow_run(lookup_workflow_run=True)` allows fetching from
another run's artifacts
* `for_local()` creates a simulated output path on the local filesystem
* Methods like `log_dir()`, `python_packages()` return `OutputLocation`
* `StorageLocation`: Backend-agnostic location (relative path + bucket
metadata)
  * `.s3_uri`, `.https_url`, `.local_path(staging_dir)`
* `StorageBackend`: base class for uploading
* `upload_file()`, `upload_directory()` functions that support filtering
and specifying content types
  * `S3StorageBackend` uses boto3
* `LocalStorageBackend` uses a local filesystem (for unit tests and
local developer use)

All together, these features enable more natural unit testing and local
workflows built on the upload scripts that we already have.

## Test Plan

* New and existing unit tests (+1600 LOC of test code, +800 LOC of
non-test code)
* Watch CI workflows to ensure that logs/artifacts/packages/etc. are
uploaded as expected
* Watch post-submit multi-arch CI and release workflows (or trigger test
runs)

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
jayhawk-commits pushed a commit that referenced this pull request Mar 11, 2026
## Motivation

Bump rocm-systems from 93bc019 to 093b66caa3 (includes fix for hip-tests
issue and revert for mathlib hiprtc issues and revert for rccl-test,
added revert for miopen failures due to PR 653):

Commits:
093b66caa3 (HEAD, origin/develop, origin/HEAD) Revert "SWDEV-546177 -
hipModuleGetLoadingMode API impl (#653)" (#3858)
d8a0adbc9f [AMD-SMI] Hide libamd_smi.so internal symbols (#3777)
d4da458f94 [rocprofiler-sdk] [Documentation ] Updating changelog (#3827)
19fadeb082 (origin/users/abchoudh/fix_dispatch_count) [RCCL][Tuner
Plugin] Enable tuning of RCCL tuning constants (#3757)
b4f5f8a6a8 rocr: Fix IPC dmabuf hang with large allocations (#3211)
64efea0435 RCCL: allow users to override max and per job memory & fix
defaults. (#3797)
9b3dd101bb Removing ready_for_review (#3849)
7e43880a64 [rocprofiler-systems] Update ROCm version to 7.2.0 in CI
workflows for Debian, RedHat, and Ubuntu (#3431)
1fdb6b9827 [rocshmem] add gda/topology unit tests (#3715)
be1ea24a96 Move hipMipmappedArrayGetMemoryRequirements test to common
tests
e4513f04c8 Update amdgpu-windows-interop with latest changes, pal
58aa0bab2ced0cc9ebe8d2d0932db6774feb4e49 2026-03-04(#3773)
b1f964d796 [rocprofiler-compute] Ensure long kernel name fully shows in
compute analyze (#3665)
4dcf1e3ce0 SWDEV-567112 - Replace test names (#3787)
33f5f302e5 ROCM-2428 - fixes hipStreamBatchMemOp invalid operation
checks (#3099)
139f4bfff8 [SWDEV-556456] Align HIP_UUID with rocminfo (#3614)
8e8928544c Reduce buffers alignment to 4 bytes (#3821)
51be29a647 AIRUNTIME-125: Consolidate Windows optimization and debug
flags (#3825)
1407392240 [AMD-SMI] CI: Fix root workflow to use ASIC-specific test
filters (#3807)
63f78a98d7 (origin/users/mcao/fix_rocpdsummary) [ROCM-SMI] Fix DRM
include dirs leaking absolute build paths to consumers (#3808)
caf2f7e1eb [ROCM-186] amd-smi: Add support for a VRAM and GTT tuning
interface (#3636)
a0712d4c2a [TheRock CI] Update projects_to_test lists (#3749)
02090c42c9 rocrtst: install gfx .hsaco files to share/rocrtst (#3744)
4a0a1cbfce Merge other simd table (#3696)
0d07657d78 Add missing kwargs from
rocprofiler_add_integration_validate_test in .cmake-format.yaml (#2336)
3a3df301dc Optimize device counting service GPU interactions (#1583)
95d9da0098 Add SPM Enable flag in build infrastructure (#3677)
12bb9435b2 [rocprofiler-sdk] On-demand GPU profile queue
creation/destruction (#3586)
941057c2c0  Navi4 tuning table iter 1 (#3052)
dbf2b7369f [AMD-SMI] Display N/A for cu_occupancy when file is
unavailable (#3589)
b0efc7c639 [RCCL] [UT] Add ROCTX test (#3625)
ba7a20ea18 Reducing the p2pnChannels for half-subscription A2A on
multi-node MI350 (#3381)
75238c98a2 [clr] Fix memory leak in getOrCreateHostcallBuffer (#3699)
af2ee0e8ad [hip-tests] ASAN Check for image support before we create
context (#3834)
ad4496678e Update windows ci subtree in include amdgpu-windows-interop
(#3814)
c8ad252208 [rocprofiler-register] Fix compilation with system fmt/glog
(#1243)
781881544d Update README to include dbgapi and debug agent components
(#3731)
88e4a7837e ROCProfiler and ROCTracer: Modifying deprecation note (#3831)
b5918a5f35 [ROCM-3124-3125-3126] CUID file generation hangs on MI350
systems/CUID test failures/Segmentation fault in CUID example code
(#3548)
97a5dd993c Update copyright to use SPDX IDs (#3805)
511730ab45 [rocshmem]: add flood-amo tester (#3653)
2d650a0065 [clr] Fix heap use after free error in device allocations
(#3789)
b6b179ad81 Disable hipHostRegister_Negative test for ASAN (#3832)
39ec318c8d [RCCL] Add GDA alltoallv via rocshmem integration (#3613)
fb0f4d53b1 [RCCL] [CUMEM] Fix cuMem multi-process runs (#3811)
c3de7d4bf6 SWDEV-526201 - Fix and enable disabled HIP tests from warp
group (#3089)
8d9a8ca161 roofline: code cleanup and refactor vector types (#3813)
8957e49028 Don't wait on command completion if worker thread is
destroyed (#3790)
9e7586a5fa [rocshmem] Add barrier APIs and expose `ROCSHMEM_TEAM_WORLD`
on device (#3651)
91b09235b0 Revert "fix local gpu release static build failure (#3667)"
(#3799)
0fda754b1b libhsakmt: Add secondary KFD context creation support
ee43db95b0 Revert "Update TheRock reference to 20260303 commit (#3709)"
(#3826)
86e28b9fae Added fix to update GL2C counters instance count for GFX11.5
(#3100)
93f69f7de4 Adjust includes to match use (#3742)
e9fbc3f1a2 (develop) Update TheRock reference to 20260303 commit (#3709)
be0675a1a6 (HEAD) Revert "Support fp8 types in hiprtc (#2605)" (#3792)
3e3a94a4ef [rocprofiler-systems] Add trace_cache support for
std::optional<T> serialization (#3490)
0b42a7f472 clr: Eliminate unnecessary kernel name string copies (#3774)
b6b0d77b29 rocr: Add hsa_amd_memory_async_batch_copy API for batched
memory copies (#3259)
486e6d12d2 Resolve staircase RS regression with 48 max channels (#3684)
eb59c85ac4 [gfx942][gfx950] Leverage new cache bypass builtins for
simple protocol where available (#2847)
4d74d27f0e (origin/users/raramakr/rocm-smi-target) Revert "Auto Labeler:
Add ci:regression-detection label to rccl PRs (#3543)" (#3769)
8f0795517c [AMD-SMI] CI: Use ASIC-specific test blacklists in workflows
(#3775)
7cef5b64c1 Fix MFMA total FLOPS calculation (#3371)
aea37512ba Remove duplicated tests (#3235)
b6c656fdd4 Remove duplicated tests in memory module (#3087)
ca3137d8f9 [rocprofiler-sdk] Install integration tests without building
for therock & Misc. fixes (#3047)
0ab5c41f65 [rdc] Enable on-demand queue mode in rocprofiler-sdk to
prevent inference degradation (#3629)
a1eb2a1f7c rocr/wsl: a library should not output to std::out by default
(#3718)
b7da296cc8 Reenable flood_put/get testers on mlx5 since they should work
after pr2732 (#3748)
000e24de2f [rocprofiler-sdk] Add automatic late-start support to
rocprofiler_force_configure (#2168)
64ea87f592 [hip-tests] Fix memory leaks in hipMemPoolTrimTo tests
(#3643)
543a7d765f rocr: Include code object allocs in lightweight coredump
a58da378d4 [rocdecode] - update rocdecode ctest (#3768)
f88e4ee44d [rocprofiler-systems] Make CDash submit non-fatal and add
GitHub Actions logging (#3525)
cb14debc3a [rocprofiler-systems] Update nlohmann-json submodule (#3391)
449253009a SWDEV-567112 - Introduce new mechanism for tagging and
disabling tests - Part 2 (#3707)
8ca991393d disabling rccl from full build (linux), covered in RCCL CI
(#3770)
c4fdb20b74 [ATT] Re-enable tests. Add option to specify perf to target
CU only (#2819)
615aab95ed ROCM-3816 Out of Memory fix (#3588)
8ffad41b24 Fix rocm_smi64 exporting invalid absolute paths to consumers
(#3717)
042d76a626 rocr: Remove dependency on KFD in Runtime::VMemoryHandleMap
(#2515)
555db59b2a [AMD-SMI] CPU: Added support for family 1A Models 50h-57h
(#3206)
3affa2c7a3 [SWDEV-555935] Fix shared mutex and self-heal (#3729)
ba0bf0f3db Replace hipMemGetInfo with ihipMemGetInfo and use it for
internal calls. (#2845)
c5cef9b18e Fix HIP_RETURN on all HIP API calls. (#2838)
241ce7ba83 Revert "memory: fix "contiguous_bytes" calculation in generic
conversion (#3285)" (#3755)
8a690f482e [kpack/clr] Windows PE/COFF support for kpack artifact
splitting and runtime loading (#3728)
863bdf8aa8 MFMA pre-processor guards for ipc.hip (#3724)
90bb9b1921 Release queue outside of vgpusAccess lock (#3705)
de4523910c clr: Add build support of ROCR and PAL backends together
(#3722)
dfb7abc2d8 [rocprofiler-sdk] RCCL API changes for
RCCL_API_TRACE_VERSION_PATCH = 3 (#3477)
d69d4f23db [AICOMRCCL-633] - Fixed warnings in tests (#3402)
067d86dcaa rocr/wsl: Disable AQL Queue usage with flag ROCR_USE_PM4
(#3663)
594eb60d42 [TheRock CI] rocm-systems build full ROCm stack (#3182)
27d17e8ea0 [ROCProfiler-SDK] Fix SWDEV-556922: Handle comments before
checking for pmc: (#1723)
c80d90439d memory: fix "contiguous_bytes" calculation in generic
conversion (#3285)
669987c83f [hip-tests] ASAN - add missing release handles (#3735)
a24bbd75a4 fix local gpu release static build failure (#3667)
259b2ff913 Speed up DeviceId (#2803)
65d9264bf4 Simplify MPI trace merge logic and remove legacy guards
(#3562)
1076c083cb use system to look for zcat path instead (#3720)
22f1d19db3 [AICOMRCCL-355] Enable threshold-based p2p-batching (#3000)
a2e4c794d2 Partially flatten template tests cases (#2597)
e242abe219 Pass space separated gfx target list to RCCL build command
(#3701)
4f78aea66d SWDEV-570074 - Refactor Memset memory object handling.
(#2228)
b3ad12d834 Support Nvidia build on theRock for HIP-tests (#3335)
a1cf15ea9a Support fp8 types in hiprtc (#2605)
8ef84b0a50 [rocprofiler-systems] Add HPC examples to automated testing
(#3437)
db3a70dfa0 Free memory which was allocated in tests (#3710)
27e6809c7e [rocprofiler-systems]: Fix rhel CI failure on for MPI and UCX
tests (#3700)
0d9aaf59d8 rccl/topo_expl: fix build issue. (#3719)
be04d75765 Fix zcat path used for checking kernel configs (#3423)
cab60a7b27 rocr/thunk/win: Add CU mask support (#3518)
5b3d826c05 [CUMEM] Initial support for cuMem APIs (#2763)
0606ff491f [HIP] [PLAT-194496] Improve Stress_hipMalloc_HighSizeAlloc
reliability (#3550)
05750a77cc fix hip-test name in config (#3716)
33f777f3e9 hsakmt: Remove --high functionality from run_kfdtest.sh
(#2486)
e4c46e3480 Hide the retain under direct dispatch check (#3698)
bfe0ca0279 Add rocprof trace decoder to CI tests (#3690)
a769b6f54e [rocSHMEM] Edgar/abstract allocator ipc part1 (#3411)
659fb52243 [AMD-SMI] Fix bugs, improve error handling, and clean up
NIC/switch code (#3654)
0eb26ea571 hsakmt: Fix Import/Export of dmabuf_fd for WSL/Windows
(#3348)
a122936abb [SWDEV-567812] Add UBB power and power_limit fields to
npm_info (#3262)
c3bec090c5 [rocprofiler-sdk][rocprofv3][rocpd] Updates for KFD data
(#340)
7c44d47740 SWDEV-547659 - Remove HIP_VERSION_GITHASH in logs (#448)
74b6487a6a SWDEV-547008 - Documentation fix for function return values
(#463)
af21cd44f1 SWDEV-545553 - Improve clarity and robustness of CALLBACK
unit tests (#546)
180d639044 SWDEV-544900 - Change hip-test test case name (#547)
feeca99950 Doc improvements (#3688)
c1822b6336 ROCprofiler-SDK: deprecation of legacy tools (#3609)
5d7aff8462 Fix rocprof-compute-viewer link (#3459)
0b0b4846f0 AIRUNTIME-129 - Fix Ocl test failures of 2D image with
pitches. (#3584)
ac569b87e0 Fix memory tests config (#3687)
603fe7a5cf [hip-tests] Enable hipMipmappedArrayGetMemoryRequirements
test via cmake
4fad4452d9 [hip] Docs: Updates to some memory management pages
8cc59559fe AICOMRCCL-656 fix memory leak in ncclCommInitRankFunc (#3628)
94a4595a5d Fix missing amd_comgr linkage in pc-sampling integration test
(#3453)
2a68565dce rocrtst: CMAke file: strip xnack/feature suffixes from gfxNum
in build_kernel (#3652)
c3542bfb2b [rocprofv3] Deprecating input text files for counter
collection (#1562)
ff122e7ed7 SWDEV-573073 - Cleanup hipHostAlloc/Malloc/Register tests
(#3017)
5b1deaf29d SWDEV-567112 - Introduce new mechanism for tagging and
disabling tests - Part 1 - Core (#2351)
6e0cc309e1 rocrtst: MaxSingleAllocationTest: skip CPU NUMA nodes >0
(#3208)
d65f601195 [AICOMRCCL-667] rccl: Change GDR selection logic. (#3607)
f1c44ab200 Patch Back to Old Repo: fixes from manual runs (#3621)
fe53bcd715 [AMD-SMI] Allow amdsmi init to succeed when no NIC hardware
is present (#3403)
b25600efdb [ROCM SMI] Fix fw pldm version not displayed in default
amd-smi (#3594)
169d2ef763 root to module wiring, remove legacy source collection
(#3482)
7469781988 [LRT][clr] SWDEV-512963-Fix CTS test failures for 1D buffer
copy (#3520)
c8f55d9b86 Adding rocprof trace decoder (#3576)
425e983502 Trace decoder codeowners (#3600)
a176efd648 [hip-tests] Add return statements to HIP_SKIP_TEST (#3647)
32687cf183 rocrtst: CPUAccessToGPUMemoryTest: Cap host allocation to 512
MB under ASAN (#3407)
97c0206753 Update codeowners for thunk DXG (#3334)
be44b28bb6 [rocdecode][rocjpeg] - ctest CMakeLists cleanup (#3632)
80ff0b8942 Various memory leak fixes in hip-tests (#3605)
0988f67a85 fix typo in help text (#3314)
9f823c53f1 Fix CUID file lookup by loading files before searching
entries (#3436)
064c89261b SWDEV-546177 - hipModuleGetLoadingMode API impl (#653)
006213e112 ROCM-2696: Ignare size and base if null ptr (#3336)
6060b99d83 Improve atomic min max test perf (#2580)
3fbcc13602 Change printf capture impl (#1127)
93bc01937c (tag: hip-version_7.12.60610,
origin/users/mradosav-amd/rocprofsys-selective-region) [ROCM-CORE]
Update rdhc script to support rocm install prefix
(ROCm/rocm-systems#3596)

[AICOMRCCL-355]:
https://amd-hub.atlassian.net/browse/AICOMRCCL-355?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant