[ci] Adjust expect_failure and expect_pytorch_failure logic #4500
Draft
ScottTodd wants to merge 5 commits into
This was referenced Apr 13, 2026
ScottTodd added a commit that referenced this pull request Apr 14, 2026
## Motivation

These inputs are unused and do not belong in build jobs. Removing them will help with:

* #3336
* #3334

## Technical Details

These other inputs are also candidates for cleanup:

input | notes
-- | --
`expect_failure` | See #4500
`artifact_group` | Previously used here, may need to line up with `build_variant_suffix`: https://github.com/ROCm/TheRock/blob/15558f4240876c7b4eb667f20182db4e3673e4e6/.github/workflows/build_portable_linux_artifacts.yml#L172-L177
`build_variant_label` | Previously used here, may be useful later: https://github.com/ROCm/TheRock/blob/15558f4240876c7b4eb667f20182db4e3673e4e6/.github/workflows/build_portable_linux_artifacts.yml#L66 (but see also #4415)
`build_variant_suffix` | Partially handled: https://github.com/ROCm/TheRock/blob/15558f4240876c7b4eb667f20182db4e3673e4e6/build_tools/github_actions/configure_multi_arch_ci.py#L862-L869 https://github.com/ROCm/TheRock/blob/15558f4240876c7b4eb667f20182db4e3673e4e6/.github/workflows/multi_arch_ci_linux.yml#L120-L126

## Test Plan

* CI run should include expected build/test jobs

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Member
Author
Uh, it succeeded: https://github.com/ROCm/TheRock/actions/runs/24469430596/job/71558513727?pr=4500 . Maybe #1927 is fixed now?

TheRock/build_tools/github_actions/amdgpu_family_matrix.py
Lines 197 to 204 in 1db2c09

This PR also has merge conflicts now. I'll probably need to split it into smaller PRs.
ScottTodd added a commit that referenced this pull request Apr 29, 2026
## Motivation

This dev release in rockrel: https://github.com/ROCm/rockrel/actions/runs/25100364100/attempts/1 hit errors

`Multi-Arch Release Error when evaluating 'strategy' for job 'test_components'. 30fc752 (Line: 176, Col: 21): Error from function 'fromJSON': empty input, 30fc752 (Line: 176, Col: 21): Unexpected value ''`

## Technical Details

I think we missed this in CI workflows since multi_arch_ci_linux.yml has an `expect_failure` condition: https://github.com/ROCm/TheRock/blob/88425ee26eb1259292089c723432e4594e3bbb20/.github/workflows/multi_arch_ci_linux.yml#L99-L103

I did not add that condition to multi_arch_release_linux.yml: https://github.com/ROCm/TheRock/blob/7161bc7968a7bae56be9ea4658b6261831d14d8e/.github/workflows/multi_arch_release_linux.yml#L117-L120

I'd like to remove that `expect_failure` entirely since it isn't actually working right now in multi-arch CI, see #4500

## Test Plan

We'll need to run a release workflow until it reaches the test step, which takes hours. Might as well merge and test via an actual dev release.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Claude <noreply@anthropic.com>
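The `fromJSON` error quoted above comes from a `strategy` expression receiving an empty string. One general way to guard against that class of failure (not necessarily the fix applied in this commit) is to have the configure step always emit valid JSON for the matrix output. The sketch below assumes hypothetical helper names, `matrix_output` and `should_run_tests`; neither is taken from TheRock.

```python
import json


def matrix_output(test_entries):
    """Serialize a job matrix for a workflow output (hypothetical helper).

    Always returns valid JSON, so a downstream `fromJSON(...)` never
    receives an empty string, even when there is nothing to test.
    """
    return json.dumps(list(test_entries or []))


def should_run_tests(matrix_json):
    """Return True only if the serialized matrix has at least one entry.

    Intended as a job-level gate, since an empty matrix may still need
    a condition that skips the job entirely.
    """
    return bool(matrix_json) and len(json.loads(matrix_json)) > 0
```

Emitting `[]` instead of an empty string keeps the expression parseable, and the separate boolean gives the workflow something to check in a job-level `if` before the matrix is evaluated.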
Motivation
Handling a few related issues that will complicate multi-arch release pipeline work (#3334):
1. `expect_failure` plumbing

   `expect_failure` was optionally set on a build variant (e.g. release/asan/tsan). As of 3c528c0, it is no longer set anywhere.

   We were passing `expect_failure` all the way through to `multi_arch_build_portable_linux.yml`, where stages are built, but it was ignored there. That workflow doesn't have a single job that it could mark as `continue-on-error`, unlike the job that it replaces:

   TheRock/.github/workflows/build_portable_linux_artifacts.yml
   Lines 61 to 67 in 740c338

   Solution: stop handling `expect_failure` altogether. We can add it back later as needed (somehow).

2. `expect_pytorch_failure` plumbing

   `expect_pytorch_failure` was being read from the build variant when it actually exists on a GPU family:

   TheRock/build_tools/github_actions/amdgpu_family_matrix.py
   Lines 178 to 184 in 96c9642

   TheRock/build_tools/github_actions/configure_multi_arch_ci.py
   Line 861 in 96c9642

   In multi-arch CI we currently build for all GPU families in a single workflow, with a matrix in `build_pytorch_wheels_per_family`. We could do two layers of filtering here: one for "should this workflow build pytorch at all?" and another for "does the pytorch build for this GPU family work?"

   Solution: correct the configuration plumbing and add a TODO to handle this in the matrix somehow.
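The two layers of filtering described in item 2 could be sketched roughly as follows. This is a minimal illustration, not the code in `configure_multi_arch_ci.py`: the function name `pytorch_build_plan`, the shape of the `families` mapping, and the returned keys are all assumptions. Only the `expect_pytorch_failure` flag comes from the PR.

```python
def pytorch_build_plan(families):
    """Compute a per-family pytorch build plan (hypothetical helper).

    `families` maps a GPU family name to an info dict that may carry
    `expect_pytorch_failure` (assumed shape, for illustration only).
    """
    # Layer 2: does the pytorch build for this GPU family work?
    working = [
        {"amdgpu_family": name}
        for name, info in families.items()
        if not info.get("expect_pytorch_failure", False)
    ]
    # Layer 1: should this workflow build pytorch at all?
    return {"build_pytorch": bool(working), "matrix": working}


# Placeholder family names, not real targets.
plan = pytorch_build_plan(
    {
        "family-a": {},
        "family-b": {"expect_pytorch_failure": True},
    }
)
```

Reading the flag from the family entry rather than the build variant matches where the PR says the setting actually lives. The workflow-level boolean falls out of the per-family result, so the two layers cannot disagree.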
Technical Details
I included a few other loosely related / incidental changes here (could move to a separate PR on request):
* `expand_build_configs` `TestDualLabelRunnerSelection` tests into `TestExpandBuildConfigs` where they belong (and can reuse helper functions)

Test Plan
Submission Checklist