Skip to content

Add find_artifacts_for_commit.py and find_latest_artifacts.py utils#3093

Merged
ScottTodd merged 17 commits into
ROCm:mainfrom
ScottTodd:artifacts-for-commit
Jan 29, 2026
Merged

Add find_artifacts_for_commit.py and find_latest_artifacts.py utils#3093
ScottTodd merged 17 commits into
ROCm:mainfrom
ScottTodd:artifacts-for-commit

Conversation

@ScottTodd
Copy link
Copy Markdown
Member

Motivation

Progress on #2608.

These scripts have utility on their own, but I'm intending for them to be building blocks on top of which we can:

  • Download or inspect artifacts for commits across a range
    • That will be useful for certain bisect workflows, diffing artifact size across commits, etc.
  • Find recent artifacts on a branch (useful for bootstrapping source builds)
    • I'm envisioning a related find_baseline_artifacts_for_commit.py that would include more logic like finding the base ref for a branch, or finding artifacts sharing fingerprints with the current source/build tree

Technical Details

Test Plan

  • Added new unit tests
  • Tested manually (see below)

Test Result

Note

The logging is verbose right now. I want to move the Retrieving bucket info blocks behind a -v flag, which may involve switching more of our python files to using the logging module.

python build_tools\find_artifacts_for_commit.py --commit=62bc1eaa02e6ad1b49a718eed111cf4c9f03593a --artifact-group=gfx110X-all
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 20384488184
  head_github_repository      : ScottTodd/TheRock
  is_pr_from_fork             : True
Retrieved bucket info:
  external_repo: ROCm-TheRock/
  bucket       : therock-ci-artifacts-external
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          62bc1eaa02e6ad1b49a718eed111cf4c9f03593a
  Git commit URL:      https://github.com/ROCm/TheRock/commit/62bc1eaa02e6ad1b49a718eed111cf4c9f03593a
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       ci.yml
  Workflow run ID:     20384488184
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/20384488184
  Workflow run status: completed (failure)
  S3 Bucket:           therock-ci-artifacts-external
  S3 Path:             ROCm-TheRock/20384488184-windows/
  S3 Index:            https://therock-ci-artifacts-external.s3.amazonaws.com/ROCm-TheRock/20384488184-windows/index-gfx110X-all.html
python build_tools/find_latest_artifacts.py --artifact-group gfx110X-all --max-commits 10 --verbose
Searching 10 commits on ROCm/TheRock/main...
  [1/10] Checking 6cdf31c5...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21368296710
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [2/10] Checking 901d88d2...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367803869
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [3/10] Checking ead209c4...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367787575
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [4/10] Checking 42b464c8...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21367297783
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [5/10] Checking ed34008b...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21364484487
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    No workflow run found
  [6/10] Checking 46a74db4...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21364198017
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
Retrieved bucket info:
  external_repo:
  bucket       : therock-ci-artifacts
    Found artifacts: run 21364198017
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          46a74db4c60182387884695c4463a9e69f811c59
  Git commit URL:      https://github.com/ROCm/TheRock/commit/46a74db4c60182387884695c4463a9e69f811c59
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       ci.yml
  Workflow run ID:     21364198017
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/21364198017
  Workflow run status: queued
  S3 Bucket:           therock-ci-artifacts
  S3 Path:             21364198017-windows/
  S3 Index:            https://therock-ci-artifacts.s3.amazonaws.com/21364198017-windows/index-gfx110X-all.html

Submission Checklist

ScottTodd and others added 12 commits January 22, 2026 16:27
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tests use real GitHub API calls with pinned historical commits, mocking
only the S3 artifact existence check (check_if_artifacts_exist) to avoid
brittleness from S3 retention policies. find_latest_artifacts tests also
mock get_recent_branch_commits_via_api to control which commits are
searched.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Switch from a main + fork commit pair to two consecutive commits on
TheRock main, which better represents the real environment where
find_latest_artifacts walks continuous branch history. Also restore
mock_check parameters required by @mock.patch decorator injection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Keeping these scripts simpler to start. Detection may want to use the `GITHUB_REPOSITORY` env var, but we also only upload partial artifacts for other repositories right now, so TBD how this will actually be used.
Tests the new function with real API calls:
- Verifies commits are returned as a list
- Validates SHA format (40-char hex)
- Tests max_count parameter limits results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #2997, "--latest" currently means "latest nightly release". A name like find_latest_ci_artifacts.py would match the defaults used by this script/module more precisely, but I think we can keep this script applying to both CI and CD workflow runs.

This code can be used with nightly releases like so:

  • Set the RELEASE_TYPE env var to e.g. nightly for the retrieve_bucket_info() function (this should be improved somehow)
  • Run with e.g. --workflow release_windows_packages.yml

Like so:

D:\projects\TheRock (artifacts-for-commit)
(.venv) λ set RELEASE_TYPE=nightly

D:\projects\TheRock (artifacts-for-commit)
(.venv) λ python build_tools/find_latest_artifacts.py --artifact-group gfx110X-all --workflow release_windows_packages.yml --max-commits 10 --verbose
Searching 10 commits on ROCm/TheRock/main...
  [1/10] Checking c7bc0b40...
    No workflow run found
  [2/10] Checking 6cdf31c5...
    No workflow run found
  [3/10] Checking 901d88d2...
    No workflow run found
  [4/10] Checking ead209c4...
    No workflow run found
  [5/10] Checking 42b464c8...
    No workflow run found
  [6/10] Checking ed34008b...
    No workflow run found
  [7/10] Checking 46a74db4...
    No workflow run found
  [8/10] Checking 4cbac571...
Retrieving bucket info...
  (explicit) github_repository: ROCm/TheRock
  workflow_run_id             : 21346230451
  head_github_repository      : ROCm/TheRock
  is_pr_from_fork             : False
  (implicit) RELEASE_TYPE: nightly
Retrieved bucket info:
  external_repo:
  bucket       : therock-nightly-artifacts
    Found artifacts: run 21346230451
Artifact info:
  Git repository:      ROCm/TheRock
  Git commit:          4cbac571e6d794ef47a2178f5999e9da9368cece
  Git commit URL:      https://github.com/ROCm/TheRock/commit/4cbac571e6d794ef47a2178f5999e9da9368cece
  Platform:            windows
  Artifact group:      gfx110X-all
  Workflow name:       release_windows_packages.yml
  Workflow run ID:     21346230451
  Workflow run URL:    https://github.com/ROCm/TheRock/actions/runs/21346230451
  Workflow run status: completed (success)
  S3 Bucket:           therock-nightly-artifacts
  S3 Path:             21346230451-windows/
  S3 Index:            https://therock-nightly-artifacts.s3.amazonaws.com/21346230451-windows/index-gfx110X-all.html

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that RELEASE_TYPE is an env var instead of an argument to the scripts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight using the RELEASE_TYPE environment variable so deeply in these scripts was a mistake IMO.

The release workflows already set that environment variable so reading it to determine where to upload logs/artifacts felt sensible in #2046. These scripts (and related usage patterns) are running from outside github actions and they're trying to answer the question "for some historical workflow run, where did it upload files?". All information about that should be discoverable via API calls, but if the github logs or bucket files expire then there isn't much data to work with.

Here's an example to demonstrate:

https://github.com/ROCm/TheRock/actions/runs/21274498502

If we don't know that this particular run of release_portable_linux_packages.yml used the nightly release type (it could have been dev), we could either:

  1. Guess, and try each potential bucket
  2. Read the github actions logs, as long as they haven't expired

I think having a common database that connects all the various buckets would make this easier. @marbre might have some ideas too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With what @HereThereBeDragons is currently is been working on we will indeed end up to have common database which stores information about our releases but that might be limited to nightly releases, not sure about dev builds (Laura?). If it is not in that database this would mean those are artifacts produced via pre-/post-submit, which would be the fallback. Not sure we need an additional database to connect this as well?

Copy link
Copy Markdown
Contributor

@SamuelReeder SamuelReeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just have some minor comments.

Comment thread build_tools/find_latest_artifacts.py
Comment on lines +116 to +121
parser.add_argument(
"--repo",
type=str,
default="ROCm/TheRock",
help="Repository in 'owner/repo' format (default: detect from git remote)",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should swap the --run-github-repo arg in install_rocm_from_artifacts.py to this same name for consistency. I find it implicit that it's the github repo associated with the run ID.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We can even provide a list of argument names as aliases for the same option so existing users can continue to use the longer name.

I think this would work:

    parser.add_argument(
+       "--repo",
        "--run-github-repo",
        type=str,
        help="GitHub repository for --run-id in 'owner/repo' format (e.g. 'ROCm/TheRock'). Defaults to GITHUB_REPOSITORY env var or 'ROCm/TheRock'",
    )

(then update uses to use the first name)

Comment on lines +40 to +43
# TODO: wrap `ArtifactBackend` (or `S3Backend`) class here? Or use `BucketMetadata`?
# (we have a few classes tracking similar metadata and reimplementing URL schemes)
@dataclass
class ArtifactRunInfo:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this should be a utility class in a shared file so all the classes can be de-duplicated and use this base.

Comment thread build_tools/find_latest_artifacts.py
Comment thread build_tools/github_actions/github_actions_utils.py Outdated
ScottTodd and others added 4 commits January 26, 2026 15:33
Changes:
- Detect rate limit errors in REST API 403 responses by checking body
- Provide actionable guidance: "Authenticate with `gh auth login`..."
- find_artifacts_for_commit: Raise GitHubAPIError instead of returning None
- find_latest_artifacts: Propagate GitHubAPIError from API calls
- CLI main() functions catch GitHubAPIError and exit with code 2
- Add unit tests for rate limit error handling

This distinguishes between "no artifacts found" (returns None) and
"couldn't check due to error" (raises exception), allowing callers
to handle these cases appropriately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ScottTodd
Copy link
Copy Markdown
Member Author

I plan on waiting to merge this until after #2997 goes in. I'm expecting some merge conflicts in docs/development/installing_artifacts.md. It would also be nice to get a review from someone else on the devops team but we're pretty overloaded (and Geo is out sick)

@ScottTodd
Copy link
Copy Markdown
Member Author

Rebased. Anyone else want to review before I merge?

@ScottTodd ScottTodd merged commit e9da873 into ROCm:main Jan 29, 2026
81 of 84 checks passed
@ScottTodd ScottTodd deleted the artifacts-for-commit branch January 29, 2026 16:44
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Jan 29, 2026
Copy link
Copy Markdown
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess i was too late.. :(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it also return for all archs?
i could imagine a point where you want to get a list of all available archs for a specific commit as you dont know which archs where run successfully or maybe changed happened like for gfx110X-dcgpu -> gfx110X-all

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of, possibly. I think where it will matter the most will be for CI runs where we build multiple artifact groups and we want to pick a baseline commit that contains artifacts for all of those artifact groups (if possible), so each job on a single workflow run uses the same baseline.

def _build_artifact_run_info(
commit: str,
github_repository_name: str,
artifact_group: str,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am just half triggered here in those random choices which parameter is at which position in which function. some uniformity would be nice :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ordering came from the public API, which lists arguments in this order (as all arguments with defaults must come after arguments that do not have defaults):

def find_artifacts_for_commit(
    # Required, matches function name
    commit: str,

    # Always required
    artifact_group: str,

    # Has a default
    github_repository_name: str = "ROCm/TheRock",
    # Has a default, depends on prior arg
    workflow_file_name: str = "ci.yml",

    # Has a default, override is rare
    platform: str = platform_module.system().lower(),
) -> ArtifactRunInfo | None:

Ordering in the dataclass does not have the same restrictions, so I set it up as:

commit
github_repository (for that commit)
external_repo (for that repository)

platform
artifact_group (closely related)

workflow_file_name
workflow_run_id
workflow_run_status
workflow_run_conclusion
workflow_run_html_url 

Need the refactoring in #3000 and #3019 to make this significantly better.


if e.code == 403:
# Check if this is a rate limit error
if "rate limit" in error_body.lower():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can we not just raise the error_body?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code before assumed that a 403 error meant that you were using a token and did not have appropriate permissions:

            if e.code == 403:
                raise GitHubAPIError(
                    f"Access denied (403 Forbidden) for {url}. "
                    f"Check if your token has the necessary permissions (e.g., `repo`, `workflow`)."
                ) from e

I think the extra context here is probably useful, and the original error message is still there (from e).

The error handling paths in here were tricky to test though, since I didn't want to deliberately hit a rate limit.

import unittest
from unittest import mock

sys.path.insert(0, os.fspath(Path(__file__).parent.parent))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a short comment what we want to import here? or to which location this path is going?

@ScottTodd
Copy link
Copy Markdown
Member Author

Thanks for the review. I'll follow-up on some of these later. (hopefully nothing too severe that would warrant a revert? let me know)

@HereThereBeDragons
Copy link
Copy Markdown
Contributor

thanks, no no revert needed. mainly cosmetics to be happy long term

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants