Add find_artifacts_for_commit.py and find_latest_artifacts.py utils#3093
Conversation
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Tests use real GitHub API calls with pinned historical commits, mocking only the S3 artifact existence check (check_if_artifacts_exist) to avoid brittleness from S3 retention policies. find_latest_artifacts tests also mock get_recent_branch_commits_via_api to control which commits are searched. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Switch from a main + fork commit pair to two consecutive commits on TheRock main, which better represents the real environment where find_latest_artifacts walks continuous branch history. Also restore mock_check parameters required by @mock.patch decorator injection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Keeping these scripts simpler to start. Detection may want to use the `GITHUB_REPOSITORY` env var, but we also only upload partial artifacts for other repositories right now, so TBD how this will actually be used.
Tests the new function with real API calls: - Verifies commits are returned as a list - Validates SHA format (40-char hex) - Tests max_count parameter limits results 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
In #2997, "--latest" currently means "latest nightly release". A name like find_latest_ci_artifacts.py would match the defaults used by this script/module more precisely, but I think we can keep this script applying to both CI and CD workflow runs.
This code can be used with nightly releases like so:
- Set the
RELEASE_TYPEenv var to e.g.nightlyfor theretrieve_bucket_info()function (this should be improved somehow) - Run with e.g.
--workflow release_windows_packages.yml
Like so:
D:\projects\TheRock (artifacts-for-commit)
(.venv) λ set RELEASE_TYPE=nightly
D:\projects\TheRock (artifacts-for-commit)
(.venv) λ python build_tools/find_latest_artifacts.py --artifact-group gfx110X-all --workflow release_windows_packages.yml --max-commits 10 --verbose
Searching 10 commits on ROCm/TheRock/main...
[1/10] Checking c7bc0b40...
No workflow run found
[2/10] Checking 6cdf31c5...
No workflow run found
[3/10] Checking 901d88d2...
No workflow run found
[4/10] Checking ead209c4...
No workflow run found
[5/10] Checking 42b464c8...
No workflow run found
[6/10] Checking ed34008b...
No workflow run found
[7/10] Checking 46a74db4...
No workflow run found
[8/10] Checking 4cbac571...
Retrieving bucket info...
(explicit) github_repository: ROCm/TheRock
workflow_run_id : 21346230451
head_github_repository : ROCm/TheRock
is_pr_from_fork : False
(implicit) RELEASE_TYPE: nightly
Retrieved bucket info:
external_repo:
bucket : therock-nightly-artifacts
Found artifacts: run 21346230451
Artifact info:
Git repository: ROCm/TheRock
Git commit: 4cbac571e6d794ef47a2178f5999e9da9368cece
Git commit URL: https://github.com/ROCm/TheRock/commit/4cbac571e6d794ef47a2178f5999e9da9368cece
Platform: windows
Artifact group: gfx110X-all
Workflow name: release_windows_packages.yml
Workflow run ID: 21346230451
Workflow run URL: https://github.com/ROCm/TheRock/actions/runs/21346230451
Workflow run status: completed (success)
S3 Bucket: therock-nightly-artifacts
S3 Path: 21346230451-windows/
S3 Index: https://therock-nightly-artifacts.s3.amazonaws.com/21346230451-windows/index-gfx110X-all.html
There was a problem hiding this comment.
Is there a reason that RELEASE_TYPE is an env var instead of an argument to the scripts?
There was a problem hiding this comment.
In hindsight using the RELEASE_TYPE environment variable so deeply in these scripts was a mistake IMO.
The release workflows already set that environment variable so reading it to determine where to upload logs/artifacts felt sensible in #2046. These scripts (and related usage patterns) are running from outside github actions and they're trying to answer the question "for some historical workflow run, where did it upload files?". All information about that should be discoverable via API calls, but if the github logs or bucket files expire then there isn't much data to work with.
Here's an example to demonstrate:
https://github.com/ROCm/TheRock/actions/runs/21274498502
- Commit: 154d5ab
- Workflow:
release_portable_linux_packages.yml - release_type:
nightly - bucket (computed):
therock-nightly-artifacts - example artifacts index: https://therock-nightly-artifacts.s3.amazonaws.com/21274498502-linux/index-gfx94X-dcgpu.html
If we don't know that this particular run of release_portable_linux_packages.yml used the nightly release type (it could have been dev), we could either:
- Guess, and try each potential bucket
- Read the github actions logs, as long as they haven't expired
I think having a common database that connects all the various buckets would make this easier. @marbre might have some ideas too.
There was a problem hiding this comment.
With what @HereThereBeDragons is currently is been working on we will indeed end up to have common database which stores information about our releases but that might be limited to nightly releases, not sure about dev builds (Laura?). If it is not in that database this would mean those are artifacts produced via pre-/post-submit, which would be the fallback. Not sure we need an additional database to connect this as well?
SamuelReeder
left a comment
There was a problem hiding this comment.
Looks good to me! Just have some minor comments.
| parser.add_argument( | ||
| "--repo", | ||
| type=str, | ||
| default="ROCm/TheRock", | ||
| help="Repository in 'owner/repo' format (default: detect from git remote)", | ||
| ) |
There was a problem hiding this comment.
I wonder if we should swap the --run-github-repo arg in install_rocm_from_artifacts.py to this same name for consistency. I find it implicit that it's the github repo associated with the run ID.
There was a problem hiding this comment.
Sure. We can even provide a list of argument names as aliases for the same option so existing users can continue to use the longer name.
I think this would work:
parser.add_argument(
+ "--repo",
"--run-github-repo",
type=str,
help="GitHub repository for --run-id in 'owner/repo' format (e.g. 'ROCm/TheRock'). Defaults to GITHUB_REPOSITORY env var or 'ROCm/TheRock'",
)(then update uses to use the first name)
| # TODO: wrap `ArtifactBackend` (or `S3Backend`) class here? Or use `BucketMetadata`? | ||
| # (we have a few classes tracking similar metadata and reimplementing URL schemes) | ||
| @dataclass | ||
| class ArtifactRunInfo: |
There was a problem hiding this comment.
Agree that this should be a utility class in a shared file so all the classes can be de-duplicated and use this base.
Changes: - Detect rate limit errors in REST API 403 responses by checking body - Provide actionable guidance: "Authenticate with `gh auth login`..." - find_artifacts_for_commit: Raise GitHubAPIError instead of returning None - find_latest_artifacts: Propagate GitHubAPIError from API calls - CLI main() functions catch GitHubAPIError and exit with code 2 - Add unit tests for rate limit error handling This distinguishes between "no artifacts found" (returns None) and "couldn't check due to error" (raises exception), allowing callers to handle these cases appropriately. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
I plan on waiting to merge this until after #2997 goes in. I'm expecting some merge conflicts in |
|
Rebased. Anyone else want to review before I merge? |
HereThereBeDragons
left a comment
There was a problem hiding this comment.
i guess i was too late.. :(
There was a problem hiding this comment.
could it also return for all archs?
i could imagine a point where you want to get a list of all available archs for a specific commit as you dont know which archs where run successfully or maybe changed happened like for gfx110X-dcgpu -> gfx110X-all
There was a problem hiding this comment.
Sort of, possibly. I think where it will matter the most will be for CI runs where we build multiple artifact groups and we want to pick a baseline commit that contains artifacts for all of those artifact groups (if possible), so each job on a single workflow run uses the same baseline.
| def _build_artifact_run_info( | ||
| commit: str, | ||
| github_repository_name: str, | ||
| artifact_group: str, |
There was a problem hiding this comment.
i am just half triggered here in those random choices which parameter is at which position in which function. some uniformity would be nice :)
There was a problem hiding this comment.
Some ordering came from the public API, which lists arguments in this order (as all arguments with defaults must come after arguments that do not have defaults):
def find_artifacts_for_commit(
# Required, matches function name
commit: str,
# Always required
artifact_group: str,
# Has a default
github_repository_name: str = "ROCm/TheRock",
# Has a default, depends on prior arg
workflow_file_name: str = "ci.yml",
# Has a default, override is rare
platform: str = platform_module.system().lower(),
) -> ArtifactRunInfo | None:Ordering in the dataclass does not have the same restrictions, so I set it up as:
commit
github_repository (for that commit)
external_repo (for that repository)
platform
artifact_group (closely related)
workflow_file_name
workflow_run_id
workflow_run_status
workflow_run_conclusion
workflow_run_html_url
Need the refactoring in #3000 and #3019 to make this significantly better.
|
|
||
| if e.code == 403: | ||
| # Check if this is a rate limit error | ||
| if "rate limit" in error_body.lower(): |
There was a problem hiding this comment.
why can we not just raise the error_body?
There was a problem hiding this comment.
The code before assumed that a 403 error meant that you were using a token and did not have appropriate permissions:
if e.code == 403:
raise GitHubAPIError(
f"Access denied (403 Forbidden) for {url}. "
f"Check if your token has the necessary permissions (e.g., `repo`, `workflow`)."
) from eI think the extra context here is probably useful, and the original error message is still there (from e).
The error handling paths in here were tricky to test though, since I didn't want to deliberately hit a rate limit.
| import unittest | ||
| from unittest import mock | ||
|
|
||
| sys.path.insert(0, os.fspath(Path(__file__).parent.parent)) |
There was a problem hiding this comment.
maybe add a short comment what we want to import here? or to which location this path is going?
|
Thanks for the review. I'll follow-up on some of these later. (hopefully nothing too severe that would warrant a revert? let me know) |
|
thanks, no no revert needed. mainly cosmetics to be happy long term |
Motivation
Progress on #2608.
These scripts have utility on their own, but I'm intending for them to be building blocks on top of which we can:
find_baseline_artifacts_for_commit.pythat would include more logic like finding the base ref for a branch, or finding artifacts sharing fingerprints with the current source/build treeTechnical Details
ArtifactRunInfodataclass inbuild_tools/find_artifacts_for_commit.pyduplicates a substantial amount of logic. I'm working on a deeper refactoring there (see a draft at Add centralRunOutputRootclass for deriving CI run output paths on S3 #3000).install_rocm_from_artifacts.py#2997Test Plan
Test Result
Note
The logging is verbose right now. I want to move the
Retrieving bucket infoblocks behind a-vflag, which may involve switching more of our python files to using theloggingmodule.Submission Checklist