Add `find_artifacts_for_commit.py` and `find_latest_artifacts.py` utils by ScottTodd · Pull Request #3093 · ROCm/TheRock

ScottTodd · 2026-01-26T19:33:50Z

Motivation

Progress on #2608.

These scripts have utility on their own, but I'm intending for them to be building blocks on top of which we can:

- Download or inspect artifacts for commits across a range
  - That will be useful for certain bisect workflows, diffing artifact size across commits, etc.
- Find recent artifacts on a branch (useful for bootstrapping source builds)
  - I'm envisioning a related find_baseline_artifacts_for_commit.py that would include more logic, like finding the base ref for a branch or finding artifacts that share fingerprints with the current source/build tree

Technical Details

- The ArtifactRunInfo dataclass in build_tools/find_artifacts_for_commit.py duplicates a substantial amount of logic. I'm working on a deeper refactoring there (see the draft PR "Add central RunOutputRoot class for deriving CI run output paths on S3", #3000).
- TBD how these scripts will work with rocm-libraries, rocm-systems, and other repositories. I want to start with a focus on just TheRock. I had some code for inferring the repository, workflow file, default branch, etc. from the environment, but I need to think more about how those repositories should interface with artifacts, considering they currently build only a subset of subprojects.
- See also this related PR: "Retrieve latest nightly release by GPU family in install_rocm_from_artifacts.py", #2997
- PR self-review: https://github.com/ScottTodd/claude-rocm-workspace/blob/main/reviews/local_008_artifacts-for-commit.md

Test Plan

Added new unit tests
Tested manually (see below)

Test Result

Note

The logging is verbose right now. I want to move the Retrieving bucket info blocks behind a -v flag, which may involve switching more of our python files to using the logging module.
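For illustration, here is a minimal sketch of what gating those blocks behind a flag with the `logging` module could look like. The logger name and the helpers `build_parser` and `configure_logging` are hypothetical and are not taken from the PR; only the `-v`/`--verbose` idea comes from the note above.

```python
import argparse
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("find_artifacts_sketch")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that exposes a -v/--verbose switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Also print bucket lookup details",
    )
    return parser


def configure_logging(verbose: bool) -> int:
    """Set DEBUG when verbose, INFO otherwise, and return the chosen level."""
    level = logging.DEBUG if verbose else logging.INFO
    # force=True replaces handlers left over from earlier basicConfig calls.
    logging.basicConfig(level=level, format="%(message)s", force=True)
    logger.setLevel(level)
    return level


args = build_parser().parse_args(["-v"])
configure_logging(args.verbose)
logger.debug("Retrieving bucket info...")  # emitted only with -v
logger.info("Artifact info:")  # emitted in both modes
```

With this shape, the detailed lookup lines would be logged at DEBUG level and the final summary at INFO level, so the default output stays short.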

python build_tools\find_artifacts_for_commit.py --commit=62bc1eaa02e6ad1b49a718eed111cf4c9f03593a --artifact-group=gfx110X-all
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 20384488184
  head_github_repository      : ScottTodd/TheRock
  is_pr_from_fork             : True
Retrieved bucket info:
  external_repo: ROCm-TheRock/
  bucket       : therock-ci-artifacts-external
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          62bc1eaa02e6ad1b49a718eed111cf4c9f03593a
  Git commit URL:      https://github.com/ROCm/TheRock/commit/62bc1eaa02e6ad1b49a718eed111cf4c9f03593a
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       ci.yml
  Workflow run ID:     20384488184
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/20384488184
  Workflow run status: completed (failure)
  S3 Bucket:           therock-ci-artifacts-external
  S3 Path:             ROCm-TheRock/20384488184-windows/
  S3 Index:            https://therock-ci-artifacts-external.s3.amazonaws.com/ROCm-TheRock/20384488184-windows/index-gfx110X-all.html
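The S3 path and index URL in this output appear to follow a simple pattern. The sketch below reproduces that pattern as inferred from the printed values only; `artifact_index_url` is a hypothetical helper, not the script's actual implementation, and the real URL scheme may have cases this does not cover.

```python
def artifact_index_url(
    bucket: str,
    external_repo: str,
    run_id: int,
    platform: str,
    artifact_group: str,
) -> str:
    """Build an index URL following the pattern visible in the log output.

    external_repo is either empty or a prefix ending in "/", such as
    "ROCm-TheRock/" for runs that come from forks.
    """
    s3_path = f"{external_repo}{run_id}-{platform}/"
    return f"https://{bucket}.s3.amazonaws.com/{s3_path}index-{artifact_group}.html"


# Values copied from the output above.
print(
    artifact_index_url(
        bucket="therock-ci-artifacts-external",
        external_repo="ROCm-TheRock/",
        run_id=20384488184,
        platform="windows",
        artifact_group="gfx110X-all",
    )
)
```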

python build_tools/find_latest_artifacts.py --artifact-group gfx110X-all --max-commits 10 --verbose
Searching 10 commits on ROCm/TheRock/main...
  [1/10] Checking 6cdf31c5...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21368296710
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [2/10] Checking 901d88d2...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367803869
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [3/10] Checking ead209c4...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367787575
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [4/10] Checking 42b464c8...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367297783
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [5/10] Checking ed34008b...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21364484487
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [6/10] Checking 46a74db4...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21364198017
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    Found artifacts: run 21364198017
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          46a74db4c60182387884695c4463a9e69f811c59
  Git commit URL:      https://github.com/ROCm/TheRock/commit/46a74db4c60182387884695c4463a9e69f811c59
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       ci.yml
  Workflow run ID:     21364198017
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/21364198017
  Workflow run status: queued
  S3 Bucket:           therock-ci-artifacts
  S3 Path:             21364198017-windows/
  S3 Index:            https://therock-ci-artifacts.s3.amazonaws.com/21364198017-windows/index-gfx110X-all.html
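The search shown in this log walks commits from newest to oldest and stops at the first one that has artifacts. A rough sketch of that loop follows, assuming the per-commit lookup returns `None` when nothing is found. `find_latest` and the dict shape are made up for illustration; the short SHAs and the run ID are copied from the log.

```python
from typing import Callable, Iterable, Optional


def find_latest(
    commits: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Return the first non-None lookup result, walking commits in order."""
    for commit in commits:
        info = lookup(commit)
        if info is not None:
            return info
    return None


# Newest-first short SHAs from the log above.
recent_commits = ["6cdf31c5", "901d88d2", "ead209c4", "42b464c8", "ed34008b", "46a74db4"]
# Stand-in for the real per-commit lookup; only one commit has artifacts.
known_artifacts = {"46a74db4": {"commit": "46a74db4", "workflow_run_id": 21364198017}}

latest = find_latest(recent_commits, known_artifacts.get)
print(latest)
```

Passing the lookup in as a callable keeps the loop testable without any GitHub or S3 access.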

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Tests use real GitHub API calls with pinned historical commits, mocking only the S3 artifact existence check (check_if_artifacts_exist) to avoid brittleness from S3 retention policies. find_latest_artifacts tests also mock get_recent_branch_commits_via_api to control which commits are searched. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Switch from a main + fork commit pair to two consecutive commits on TheRock main, which better represents the real environment where find_latest_artifacts walks continuous branch history. Also restore mock_check parameters required by @mock.patch decorator injection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Keeping these scripts simpler to start. Detection may want to use the `GITHUB_REPOSITORY` env var, but we also only upload partial artifacts for other repositories right now, so TBD how this will actually be used.

Tests the new function with real API calls:
- Verifies commits are returned as a list
- Validates SHA format (40-char hex)
- Tests max_count parameter limits results

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
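The SHA format check mentioned here can be expressed in a few lines. This is a sketch of one way to do it, assuming lowercase hex as used in the commit hashes elsewhere in this PR; `is_full_sha` is a hypothetical helper name.

```python
import re

# Exactly 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_sha(value: str) -> bool:
    """Return True if value looks like a full 40-character commit SHA."""
    return _FULL_SHA.fullmatch(value) is not None


print(is_full_sha("62bc1eaa02e6ad1b49a718eed111cf4c9f03593a"))
print(is_full_sha("6cdf31c5"))
```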

ScottTodd · 2026-01-26T20:33:44Z

In #2997, "--latest" currently means "latest nightly release". A name like find_latest_ci_artifacts.py would match the defaults used by this script/module more precisely, but I think we can keep this script applying to both CI and CD workflow runs.

This code can be used with nightly releases like so:

Set the RELEASE_TYPE env var to e.g. nightly for the retrieve_bucket_info() function (this should be improved somehow)

Run with e.g. --workflow release_windows_packages.yml

Like so:

D:\projects\TheRock (artifacts-for-commit) (.venv)
λ set RELEASE_TYPE=nightly

D:\projects\TheRock (artifacts-for-commit) (.venv)
λ python build_tools/find_latest_artifacts.py --artifact-group gfx110X-all --workflow release_windows_packages.yml --max-commits 10 --verbose
Searching 10 commits on ROCm/TheRock/main...
  [1/10] Checking c7bc0b40...
    No workflow run found
  [2/10] Checking 6cdf31c5...
    No workflow run found
  [3/10] Checking 901d88d2...
    No workflow run found
  [4/10] Checking ead209c4...
    No workflow run found
  [5/10] Checking 42b464c8...
    No workflow run found
  [6/10] Checking ed34008b...
    No workflow run found
  [7/10] Checking 46a74db4...
    No workflow run found
  [8/10] Checking 4cbac571...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21346230451
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
  (implicit) RELEASE_TYPE: nightly
Retrieved bucket info:
  external_repo:
  bucket       : therock-nightly-artifacts
    Found artifacts: run 21346230451
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          4cbac571e6d794ef47a2178f5999e9da9368cece
  Git commit URL:      https://github.com/ROCm/TheRock/commit/4cbac571e6d794ef47a2178f5999e9da9368cece
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       release_windows_packages.yml
  Workflow run ID:     21346230451
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/21346230451
  Workflow run status: completed (success)
  S3 Bucket:           therock-nightly-artifacts
  S3 Path:             21346230451-windows/
  S3 Index:            https://therock-nightly-artifacts.s3.amazonaws.com/21346230451-windows/index-gfx110X-all.html

Is there a reason that RELEASE_TYPE is an env var instead of an argument to the scripts?

In hindsight using the RELEASE_TYPE environment variable so deeply in these scripts was a mistake IMO.

The release workflows already set that environment variable so reading it to determine where to upload logs/artifacts felt sensible in #2046. These scripts (and related usage patterns) are running from outside github actions and they're trying to answer the question "for some historical workflow run, where did it upload files?". All information about that should be discoverable via API calls, but if the github logs or bucket files expire then there isn't much data to work with.

Here's an example to demonstrate:

https://github.com/ROCm/TheRock/actions/runs/21274498502

Commit: 154d5ab

Workflow: release_portable_linux_packages.yml

release_type: nightly

bucket (computed): therock-nightly-artifacts

example artifacts index: https://therock-nightly-artifacts.s3.amazonaws.com/21274498502-linux/index-gfx94X-dcgpu.html

If we don't know that this particular run of release_portable_linux_packages.yml used the nightly release type (it could have been dev), we could either:

Guess, and try each potential bucket

Read the github actions logs, as long as they haven't expired
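To make the "guess" option concrete, here is a small sketch that tries each candidate bucket in turn. It assumes the index URL pattern seen in the outputs in this thread, and it takes the existence check as a parameter so no real S3 request is made; `guess_bucket` and `index_exists` are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def guess_bucket(
    candidate_buckets: Iterable[str],
    run_id: int,
    platform: str,
    artifact_group: str,
    index_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the first bucket whose artifact index exists, else None."""
    for bucket in candidate_buckets:
        url = (
            f"https://{bucket}.s3.amazonaws.com/"
            f"{run_id}-{platform}/index-{artifact_group}.html"
        )
        if index_exists(url):
            return bucket
    return None


# Fake existence check standing in for an HTTP or S3 request.
existing = {
    "https://therock-nightly-artifacts.s3.amazonaws.com/21274498502-linux/index-gfx94X-dcgpu.html"
}
found = guess_bucket(
    ["therock-ci-artifacts", "therock-nightly-artifacts"],
    run_id=21274498502,
    platform="linux",
    artifact_group="gfx94X-dcgpu",
    index_exists=existing.__contains__,
)
print(found)
```

The cost of this approach is one request per candidate bucket, and it stops working once the bucket contents expire, which is the limitation described above.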

I think having a common database that connects all the various buckets would make this easier. @marbre might have some ideas too.

With what @HereThereBeDragons is currently working on, we will indeed end up with a common database that stores information about our releases, but that might be limited to nightly releases; not sure about dev builds (Laura?). If a run is not in that database, that would mean its artifacts were produced via pre-/post-submit, which would be the fallback. Not sure we need an additional database to connect this as well?

SamuelReeder

Looks good to me! Just have some minor comments.

SamuelReeder · 2026-01-26T21:31:27Z

+    parser.add_argument(
+        "--repo",
+        type=str,
+        default="ROCm/TheRock",
+        help="Repository in 'owner/repo' format (default: detect from git remote)",
+    )


I wonder if we should swap the --run-github-repo arg in install_rocm_from_artifacts.py to this same name for consistency. I find it implicit that it's the github repo associated with the run ID.

Sure. We can even provide a list of argument names as aliases for the same option so existing users can continue to use the longer name.

I think this would work:

 parser.add_argument(
+    "--repo",
     "--run-github-repo",
     type=str,
     help="GitHub repository for --run-id in 'owner/repo' format (e.g. 'ROCm/TheRock'). Defaults to GITHUB_REPOSITORY env var or 'ROCm/TheRock'",
 )

(then update uses to use the first name)
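A quick standalone check of how `argparse` treats the two names: when several option strings are given, the destination attribute is derived from the first long one, so both flags populate `args.repo`. The default value and help text below are simplified for the sketch (the `GITHUB_REPOSITORY` fallback is left out), and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where --run-github-repo is an alias of --repo."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--repo",
        "--run-github-repo",  # older, longer name kept as an alias
        type=str,
        default="ROCm/TheRock",  # simplified default for this sketch
        help="GitHub repository in 'owner/repo' format",
    )
    return parser


parser = build_parser()
print(parser.parse_args(["--repo", "ROCm/TheRock"]).repo)
print(parser.parse_args(["--run-github-repo", "ScottTodd/TheRock"]).repo)
```

Existing callers that pass `--run-github-repo` keep working, while code reading the parsed value would switch from `args.run_github_repo` to `args.repo`.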

SamuelReeder · 2026-01-26T21:36:10Z

+# TODO: wrap `ArtifactBackend` (or `S3Backend`) class here? Or use `BucketMetadata`?
+#       (we have a few classes tracking similar metadata and reimplementing URL schemes)
+@dataclass
+class ArtifactRunInfo:


Agree that this should be a utility class in a shared file so all the classes can be de-duplicated and use this base.

Changes:
- Detect rate limit errors in REST API 403 responses by checking body
- Provide actionable guidance: "Authenticate with `gh auth login`..."
- find_artifacts_for_commit: Raise GitHubAPIError instead of returning None
- find_latest_artifacts: Propagate GitHubAPIError from API calls
- CLI main() functions catch GitHubAPIError and exit with code 2
- Add unit tests for rate limit error handling

This distinguishes between "no artifacts found" (returns None) and "couldn't check due to error" (raises exception), allowing callers to handle these cases appropriately.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
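The distinction between "nothing found" and "could not check" might look roughly like the sketch below. The message wording, the `explain_403` and `run_cli` helpers, and the exit code 1 for "no artifacts found" are assumptions made for illustration; only the `GitHubAPIError` name, the body check for "rate limit", and exit code 2 come from the commit message.

```python
import sys
from typing import Callable, Optional


class GitHubAPIError(Exception):
    """Raised when the GitHub API could not be queried."""


def explain_403(url: str, error_body: str) -> str:
    """Pick an error message for a 403 response based on its body."""
    if "rate limit" in error_body.lower():
        # Illustrative wording; the PR's exact message is not shown in full.
        return f"Rate limit hit for {url}. Authenticate with `gh auth login`."
    return (
        f"Access denied (403 Forbidden) for {url}. "
        "Check if your token has the necessary permissions."
    )


def run_cli(lookup: Callable[[str], Optional[dict]], commit: str) -> int:
    """Return an exit code: 0 found, 1 not found (assumed), 2 API error."""
    try:
        info = lookup(commit)
    except GitHubAPIError as e:
        print(f"Error: {e}", file=sys.stderr)
        return 2
    if info is None:
        print("No artifacts found", file=sys.stderr)
        return 1
    print(info)
    return 0
```

Keeping `None` for "not found" and an exception for "could not check" lets a caller such as a bisect script skip a commit in the first case and stop in the second.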

ScottTodd · 2026-01-28T00:05:37Z

I plan on waiting to merge this until after #2997 goes in. I'm expecting some merge conflicts in docs/development/installing_artifacts.md. It would also be nice to get a review from someone else on the devops team, but we're pretty overloaded (and Geo is out sick).

ScottTodd · 2026-01-28T18:05:19Z

Rebased. Anyone else want to review before I merge?

HereThereBeDragons

I guess I was too late... :(

HereThereBeDragons · 2026-01-29T11:17:58Z

Could it also return results for all archs?
I could imagine wanting a list of all available archs for a specific commit, since you don't know which archs ran successfully, or a change may have happened, like gfx110X-dcgpu -> gfx110X-all.

Sort of, possibly. I think where it will matter the most will be for CI runs where we build multiple artifact groups and we want to pick a baseline commit that contains artifacts for all of those artifact groups (if possible), so each job on a single workflow run uses the same baseline.
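One way to picture that baseline selection, as a sketch only: walk the candidate commits and pick the first one for which every requested artifact group has artifacts. `find_common_baseline` and the lookup signature are hypothetical, and the commit names in the example data are made up.

```python
from typing import Callable, Iterable, Optional


def find_common_baseline(
    commits: Iterable[str],
    artifact_groups: Iterable[str],
    lookup: Callable[[str, str], Optional[object]],
) -> Optional[str]:
    """Return the first commit that has artifacts for every group, else None."""
    groups = list(artifact_groups)
    for commit in commits:
        if all(lookup(commit, group) is not None for group in groups):
            return commit
    return None


# Made-up availability table: (commit, artifact_group) -> run ID.
available = {
    ("newest", "gfx110X-all"): 1,
    ("older", "gfx110X-all"): 2,
    ("older", "gfx94X-dcgpu"): 3,
}
baseline = find_common_baseline(
    ["newest", "older"],
    ["gfx110X-all", "gfx94X-dcgpu"],
    lambda commit, group: available.get((commit, group)),
)
print(baseline)
```

Every job on a workflow run that calls this with the same inputs would land on the same baseline commit, which is the property described above.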

HereThereBeDragons · 2026-01-29T15:18:37Z

+def _build_artifact_run_info(
+    commit: str,
+    github_repository_name: str,
+    artifact_group: str,


I am half triggered here by the seemingly random choices of which parameter sits at which position in which function. Some uniformity would be nice :)

Some ordering came from the public API, which lists arguments in this order (as all arguments with defaults must come after arguments that do not have defaults):

def find_artifacts_for_commit(
    # Required, matches function name
    commit: str,
    # Always required
    artifact_group: str,
    # Has a default
    github_repository_name: str = "ROCm/TheRock",
    # Has a default, depends on prior arg
    workflow_file_name: str = "ci.yml",
    # Has a default, override is rare
    platform: str = platform_module.system().lower(),
) -> ArtifactRunInfo | None:

Ordering in the dataclass does not have the same restrictions, so I set it up as:

- commit
- github_repository (for that commit)
- external_repo (for that repository)
- platform
- artifact_group (closely related)
- workflow_file_name
- workflow_run_id
- workflow_run_status
- workflow_run_conclusion
- workflow_run_html_url

Need the refactoring in #3000 and #3019 to make this significantly better.
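For readers following along, here is a stripped-down sketch of a dataclass with fields in the order listed above. The class name, the field types, and the `status_text` helper are assumptions; the helper only mirrors the "completed (failure)" and "queued" strings seen in the test output, and this is not the PR's `ArtifactRunInfo`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunInfoSketch:
    """Illustrative stand-in; field order follows the list in the comment above."""

    commit: str
    github_repository: str
    external_repo: str
    platform: str
    artifact_group: str
    workflow_file_name: str
    workflow_run_id: str
    workflow_run_status: str
    workflow_run_conclusion: Optional[str]
    workflow_run_html_url: str

    def status_text(self) -> str:
        """Format the status, adding the conclusion when one is set."""
        if self.workflow_run_conclusion:
            return f"{self.workflow_run_status} ({self.workflow_run_conclusion})"
        return self.workflow_run_status


# Constructing by keyword makes call sites independent of field order.
info = RunInfoSketch(
    commit="62bc1eaa02e6ad1b49a718eed111cf4c9f03593a",
    github_repository="ROCm/TheRock",
    external_repo="ROCm-TheRock/",
    platform="windows",
    artifact_group="gfx110X-all",
    workflow_file_name="ci.yml",
    workflow_run_id="20384488184",
    workflow_run_status="completed",
    workflow_run_conclusion="failure",
    workflow_run_html_url="https://github.com/ROCm/TheRock/actions/runs/20384488184",
)
print(info.status_text())
```

Because no field has a default here, the rule that defaulted parameters must come last does not constrain the ordering, which is why the fields can be grouped by how closely they relate.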

HereThereBeDragons · 2026-01-29T15:31:04Z

+
            if e.code == 403:
+                # Check if this is a rate limit error
+                if "rate limit" in error_body.lower():


why can we not just raise the error_body?

The code before assumed that a 403 error meant that you were using a token and did not have appropriate permissions:

if e.code == 403:
    raise GitHubAPIError(
        f"Access denied (403 Forbidden) for {url}. "
        f"Check if your token has the necessary permissions (e.g., `repo`, `workflow`)."
    ) from e

I think the extra context here is probably useful, and the original error message is still there (from e).

The error handling paths in here were tricky to test though, since I didn't want to deliberately hit a rate limit.

HereThereBeDragons · 2026-01-29T15:33:54Z

+import unittest
+from unittest import mock
+
+sys.path.insert(0, os.fspath(Path(__file__).parent.parent))


Maybe add a short comment on what we want to import here, or which location this path points to?

ScottTodd · 2026-01-30T00:33:58Z

Thanks for the review. I'll follow up on some of these later. (Hopefully nothing too severe that would warrant a revert? Let me know.)

HereThereBeDragons · 2026-01-30T09:41:50Z

Thanks, no, no revert needed. Mainly cosmetics to be happy long term.

ScottTodd and others added 12 commits January 22, 2026 16:27

Add find_artifacts_for_commit.py and find_latest_artifacts.py

0fdfcfc

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Restructure arguments, defaults, and help text

721b52d

Cleanup find_artifacts_for_commit_test

d224b48

Remove code for inferring the github repo name, branch name, etc.

4d15d5d

Keeping these scripts simpler to start. Detection may want to use the `GITHUB_REPOSITORY` env var, but we also only upload partial artifacts for other repositories right now, so TBD how this will actually be used.

Move get_recent_branch_commits_via_api to utils file.

677d35d

Misc script cleanup

7907a0d

Clean/fix-up tests

18b29d7

Fix typos

de2fd9f

Update "installing artifacts" docs to mention new scripts

cb239c7

ScottTodd requested review from SamuelReeder and geomin12 January 26, 2026 19:33

github-project-automation Bot added this to TheRock Triage Jan 26, 2026

github-project-automation Bot moved this to TODO in TheRock Triage Jan 26, 2026

ScottTodd commented Jan 26, 2026

ScottTodd mentioned this pull request Jan 26, 2026

Retrieve latest nightly release by GPU family in install_rocm_from_artifacts.py #2997

Merged

1 task

SamuelReeder approved these changes Jan 26, 2026

ScottTodd and others added 4 commits January 26, 2026 15:33

Add tip about gh auth login.

52fc322

Trim test cases that were very similar.

7c574f7

Document and warn about limit of 100 per_page results.

71893e2

Merge remote-tracking branch 'upstream/main' into artifacts-for-commit

948ce6f

ScottTodd merged commit e9da873 into ROCm:main Jan 29, 2026
81 of 84 checks passed

ScottTodd deleted the artifacts-for-commit branch January 29, 2026 16:44

github-project-automation Bot moved this from TODO to Done in TheRock Triage Jan 29, 2026

HereThereBeDragons reviewed Jan 29, 2026

4 participants