Skip to content

Fork configuration scripts and workflows for multi-arch CI#4123

Merged
ScottTodd merged 59 commits into
ROCm:mainfrom
ScottTodd:multi-arch-configure
Mar 31, 2026
Merged

Fork configuration scripts and workflows for multi-arch CI#4123
ScottTodd merged 59 commits into
ROCm:mainfrom
ScottTodd:multi-arch-configure

Conversation

@ScottTodd
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd commented Mar 23, 2026

This adds new setup_multi_arch.yml and configure_multi_arch_ci_summary.py configuration code for multi-arch CI.

Motivation

The existing build_tools/github_actions/configure_ci.py script and .github/workflows/setup.yml workflow were both tightly coupled to the non-multi-arch CI pipelines and multi-arch CI has some unique needs:

I also judged that starting fresh would be easier, given architectural issues with the existing code:

  • Mixed responsibilities in a 300 LOC matrix_generator() function
  • Sequencing of decisions (whether to skip, what to build, what to test, etc.) was scattered and sometimes duplicated

Technical Details

Note

See my worklog for this feature branch here: tasks/active/multi-arch-configure.md

The new code is structured as a sequence of stages:

  1. Parse inputs from github (triggers, labels) and git (files changed)
  2. Check "skip CI" gate to early return
  3. Decide which jobs to run and with what options (skip/use prebuilt/build)
  4. Decide which GPU targets/families to build (trigger type + labels --> per-platform GPU families)
  5. Expand per-platform build configuration data structures
  6. Write outputs to github outputs and step summary

It includes several ✨NEW✨ features:

As well as a few 🪦REMOVED 🪦 features:

  • No longer using determine_long_lived_branch() to change push behavior based on the branch name - if we enable the workflow on a branch we should run the same set of jobs
  • No run_functional_tests plumbing - this has not been added to multi-arch CI [yet?]

Comparison Metrics

📊 Feature Comparison

Feature Old New Change
Pipeline architecture Monolithic matrix_generator (295 lines) 6-step pipeline of pure functions Redesigned
Typing Untyped base_args dict 11 frozen dataclasses, JobAction enum New
Skip CI gate Buried in matrix_generator + main() should_skip_ci() Simplified
Target selection Interleaved with matrix expansion select_targets() Separated
Test type Post-hoc mutation in main() _determine_test_type() Separated
Matrix output Array of per-variant rows via strategy.matrix Single build_config JSON per platform Simplified
Prebuilt stages Boolean use_prebuilt_artifacts Per-stage dict[str, JobAction] on BuildRocmDecision More granular
Workflow contract Implicit (YAML ↔ Python drift possible) Contract tests extract fromJSON refs from YAML, assert against dataclass fields Validated
Step summary Limited information Markdown with skip reasons, per-family test table, non-default callouts Redesigned
setup.yml coupling 8+ env vars piped through YAML → Python Script reads GITHUB_EVENT_PATH directly Decoupled

🔎 Code Complexity

Around the same number of statements but with more comments and structure:

Metric Old configure_ci.py (main) New configure_multi_arch_ci.py + summary Remaining configure_ci.py
Lines 828 947 + 204 = 1,151 676
Statements 348 335 + 114 = 449 302
Functions 8 30 + 6 = 36 7
Classes/dataclasses 0 11 0

🧪 Test Coverage

Logic and dataclasses are tested step by step, inputs/outputs are pushed to the boundaries for easier testing.

Metric Old (main) New Delta
Test functions 33 (configure_ci_test) 47 (multi_arch) +14
Test lines 907 826 -81
Remaining configure_ci tests 27 -6 (multi-arch tests removed)
Statement coverage (main script) ~63% (configure_ci.py) 84% (configure_multi_arch_ci.py) +21pp
Uncovered code main(), multi-arch paths mixed with single-arch I/O boundary only (from_environ, write_outputs, main) cleaner boundary

Test Plan

  • New unit tests
    • Tests for each stage, using the dataclasses/enums/etc. that are passed between them
    • A few integration tests for the whole configure() pipeline
    • Tests for the workflow YAML files and their fromJSON usage
  • Manual testing
    • on pull_request, multi-arch CI should run by default
    • on push it should run with the same behavior as before
  • Watch multi-arch CI behavior for push events after merge

Test Result

Expected jobs ran on https://github.com/ROCm/TheRock/actions/runs/23773060560?pr=4123, configure output markdown looks as expected.

Submission Checklist

ScottTodd and others added 30 commits March 23, 2026 13:19
New script for multi-arch CI configuration, designed as a pipeline of
data transformations with typed dataclass interfaces between steps:

  1. Parse Inputs (CIInputs from GITHUB_EVENT_PATH)
  2. Check Skip CI (gate: skip-ci label, docs-only changes)
  3. Select Targets (trigger type + labels → GPU families)
  4. Decide Stages (changed files → rebuild/prebuilt per stage)
  5. Expand Matrix (families × variant → matrix entries)
  6. Write Outputs (JSON → GITHUB_OUTPUT)

This commit is the scaffold — dataclasses and pipeline shape with stub
implementations. Each step function has a TODO for Phase 2 logic. Tests
(19 passing) demonstrate the pattern for exercising each step.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The CI pipeline is a DAG of job groups (build-rocm, test-rocm,
build-rocm-python, build-pytorch, test-pytorch), not a flat list of
stages. This restructures the data model to match:

- JobGroupDecision base class (run/prebuilt/skip) with subclasses for
  groups that need extra details (BuildRocmDecision for per-stage
  granularity, TestRocmDecision for test type)
- JobDecisions with explicit named fields for each job group node
- decide_jobs() replaces decide_stages()
- CIOutputs carries JobDecisions instead of flat test_type/stage lists

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Rename internal field to is_ci_enabled (output key stays
  enable_build_jobs for workflow compat)
- Remove from __future__ import annotations per style guide
- Move inline imports to top of file
- Use named arguments for multi-param function calls
- Drop unused targets param from decide_jobs()
- Note that steps 3 and 4 are independent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Collapse 4 event type property tests into 1 mutual exclusivity check
- Extract _run_from_environ helper for GITHUB_EVENT_PATH test fixtures
  with doc link explaining the testing strategy
- Remove _make_inputs helper — inline CIInputs construction so each
  test's inputs are visible at the call site
- Use named args on expand_matrix test call
- Focus from_environ tests on what's unique per event type (PR labels,
  workflow_dispatch inputs, push before SHA)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Reorder pipeline steps: decide_jobs (3) before select_targets (4) so
target selection and matrix expansion are adjacent.

select_targets policy per trigger type:
- workflow_dispatch: explicit per-platform family lists from inputs
- pull_request: presubmit defaults + gfx*/run-all-archs-ci label opt-ins
- push: presubmit+postsubmit families
- schedule: all families
- Unknown trigger types raise ValueError

Input parsing moved to the boundary: CIInputs.linux_amdgpu_families is
now list[str] (parsed from comma-separated input in from_environ).
Unknown family names fail fast with ValueError.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Expand comments in select_targets to explain the design rationale:
trigger types form a progression from smallest (pull_request) to
broadest (schedule), with workflow_dispatch giving full manual control.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…matrix

Port the multi-arch matrix expansion logic from configure_ci.py's
generate_multi_arch_matrix into the new pipeline's expand_matrix function.
Groups families by build variant into a single MatrixEntry with per-family
JSON for downstream per-architecture job expansion.

A parity test confirms identical output to the old function for the same
inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes:
- Replace variant-grouping loop with a flat membership check — there's
  only ever one build_variant per workflow run
- Add expand_matrices() returning a MatrixExpansion dataclass for both
  platforms, with _expand_matrix_for_platform as a private helper
- Rename lookup_matrix → all_families for consistency with select_targets
- Log when a platform has no config for the requested build variant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add amdgpu_family_matrix_test.py with schema validation tests:
no duplicate family names per platform, required fields present,
build_variants non-empty.

This lets _expand_matrix_for_platform drop its defensive checks
(seen_families dedup, .get() defaults for required fields) and
index directly, relying on upstream validation from select_targets
and data invariant tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename MatrixEntry → BuildConfig, expand_matrices → expand_build_configs.
The helper returns BuildConfig | None instead of a 0-or-1-element list.
BuildConfigs holds linux/windows as Optional fields.

write_outputs now emits linux_build_config / windows_build_config as JSON
objects (or empty string when skipped) plus linux_build_enabled /
windows_build_enabled booleans, replacing the JSON array outputs that
existed only to support a matrix strategy we don't need yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test through the public expand_build_configs API, not the private
per-platform helper. Verify structural properties (schema, consistency
between dist_amdgpu_families and matrix_per_family_json, filtering
behavior) rather than specific data values from amdgpu_family_matrix.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Check skip-ci PR label first (takes priority), then delegate to
is_ci_run_required() from configure_ci_path_filters.py for
docs-only / no-files-changed detection. schedule and
workflow_dispatch always proceed (changed_files is None).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test_type determination to decide_jobs: quick (default), comprehensive
(schedule), full (submodule changes or test labels), with test_filter: PR
label override. Adopts the test type names from PR ROCm#3992.

Extract GitContext dataclass to separate git-derived data from CIInputs.
configure() is now fully pure — takes CIInputs + GitContext, no git calls.
main() is the only place that builds GitContext.from_repo(). Tests construct
GitContext directly.

Rename parameters to ci_inputs/git_context throughout for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace mutable test_type/reason with early returns in priority order:
test_filter label > test:* labels > schedule > submodule change > default.
Eliminates risk of fall-through overwrites between conditions.

Raise ValueError on unrecognized test_filter values instead of silently
ignoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each pipeline dataclass (CIInputs, GitContext, TargetSelection,
JobDecisions, BuildConfigs) now has a log() method for CI diagnostics.
configure() calls them after each step, replacing ad-hoc prints in
main() and step functions. Remove getattr loops in favor of explicit
field access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add setup_multi_arch.yml: checkout, call configure_multi_arch_ci.py
(reads GITHUB_EVENT_PATH directly, no env var pass-through chain),
compute package version.

Update multi_arch_ci.yml to call setup_multi_arch.yml instead of
setup.yml. Replace matrix strategy with direct fromJSON property
access on per-platform build config objects.

Route prebuilt_stages and baseline_run_id through the job graph model
(BuildRocmDecision.stage_decisions and .baseline_run_id) rather than
as raw pass-through strings. write_outputs derives the output values
from the structured decisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix stale test_type values in module docstring (smoke/full → quick/
  standard/comprehensive/full)
- Replace sys.exit(1) with raise RuntimeError in from_environ()
- Rename lookup_matrix → all_families in _filter_families_by_platform
  for consistency with the rest of the file
- Remove unused sys import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lowercase family names in _parse_comma_list and PR labels in
from_environ(). Matrix keys are lowercase (gfx94x) but workflow_dispatch
descriptions show uppercase (gfx94X) — normalizing at the input boundary
so internal functions don't need to care.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the minimal bullet list with structured markdown: test type
with reason, platform build config table (families, variant, artifact
group), job group decision table, and prebuilt stages when set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rich step summary with DAG visualization, per-family test runner
table, build-pytorch details, and non-default configuration callouts.
Moved to a separate file to keep the main script focused on pipeline
logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…torch

- Use GitHub alert syntax (> [!NOTE]) for non-default callouts
- Mechanical trigger line: "Trigger: `event` on `branch`, `variant` variant."
- "Runner Label" column header in test-rocm table
- Add "# Build graph" comment to DAG
- Remove build-pytorch section (not enough detail yet to be useful)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add === phase headers === to configure() so logs read as a narrative.
Print all CIInputs fields unconditionally (empty values are informative).
Add context to check_skip_ci ("Checking N files against path filters").
Handle empty-families case in summary ("No GPU families selected").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 7+ individual BuildConfig inputs on multi_arch_ci_linux.yml and
multi_arch_ci_windows.yml with a single build_config JSON string.
Downstream workflows unpack with fromJSON(inputs.build_config).field.

Rename matrix_per_family_json (double-encoded JSON string) to
per_family_info (native list). Eliminates the fragile
fromJSON(fromJSON(...)) pattern — matrix strategies now use
fromJSON(inputs.build_config).per_family_info directly.

Fix github_actions_utils → github_actions_api import after rebase.
Remove stale parity test (matrix data diverged on main).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that workflow YAML references to fromJSON(inputs.build_config)
only use fields that exist on the Python BuildConfig dataclass. Linux
checks exact match (all fields used), Windows checks no unknown fields
(build_pytorch not yet used there).

Also move imports to top of test file and update ci:skip/ci:run-all-archs
label names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass prebuilt stage info through expand_build_configs to
_expand_build_config_for_platform, setting fields at construction
time rather than mutating frozen dataclasses after the fact.

Removes 2 separate outputs from setup_multi_arch.yml (9 → 7).
Workflow YAML reads them from build_config via fromJSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ript

Multi-arch CI now uses configure_multi_arch_ci.py exclusively. Remove
generate_multi_arch_matrix, the multi_arch parameter from matrix_generator,
and the ci:run-multi-arch label check from configure_ci.py.

Add ci:run-multi-arch label gate to check_skip_ci in the new script:
pull_request triggers require the label to opt in, avoiding doubled CI
load during the transition. push/schedule/workflow_dispatch are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ScottTodd added a commit that referenced this pull request Mar 26, 2026
## Motivation

I'm adding new code that depends on
[`amdgpu_family_matrix.py`](https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/amdgpu_family_matrix.py)
in #4123 and I want these structural
invariants tested locally so I don't need to assert on them in the new
downstream code.

## Technical Details

We'll migrate over to
https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/new_amdgpu_family_matrix.py
soon, so I'm just adding a minimal set of tests for now.

## Test Plan

* Run the unit tests

## Test Results

Tests can fail as expected:
* `test_required_fields_present`
  ```diff
  λ git diff
diff --git a/build_tools/github_actions/amdgpu_family_matrix.py
b/build_tools/github_actions/amdgpu_family_matrix.py
  --- a/build_tools/github_actions/amdgpu_family_matrix.py
  +++ b/build_tools/github_actions/amdgpu_family_matrix.py
  @@ -174,7 +174,6 @@ amdgpu_family_info_matrix_nightly = {
           "windows": {
               "test-runs-on": "",
               "family": "gfx900",
  -            "fetch-gfx-targets": [],
               "build_variants": ["release"],
               "expect_pytorch_failure": True,
           },
  ```
  ```
(.venv) λ pytest
build_tools/github_actions/tests/amdgpu_family_matrix_test.py -vv
  ...

build_tools\github_actions\tests\amdgpu_family_matrix_test.py::TestFamilyMatrixInvariants::test_required_fields_present
- AssertionError: gfx900/windows missing required fields:
{'fetch-gfx-targets'}
  ```
* `test_no_duplicate_family_names_per_platform`
  ```diff
diff --git a/build_tools/github_actions/amdgpu_family_matrix.py
b/build_tools/github_actions/amdgpu_family_matrix.py
  index 63ad8c8..6eae4125 100644
  --- a/build_tools/github_actions/amdgpu_family_matrix.py
  +++ b/build_tools/github_actions/amdgpu_family_matrix.py
  @@ -146,6 +146,18 @@ amdgpu_family_info_matrix_presubmit = {
               "build_variants": ["release"],
           },
       },
  +    "gfx120x-all": {
  +        "linux": {
  +            # TODO(#2683): Re-enable label once stable
  +            # Label is linux-gfx120X-gpu-rocm
  +            "test-runs-on": "",
  +            "family": "gfx120X-all",
  +            "fetch-gfx-targets": ["gfx1200", "gfx1201"],
  +            "bypass_tests_for_releases": True,
  +            "build_variants": ["release"],
  +            "sanity_check_only_for_family": True,
  +        },
  +    },
   }
  ```
  ```
(.venv) λ pytest
build_tools/github_actions/tests/amdgpu_family_matrix_test.py -vv
  ...

build_tools\github_actions\tests\amdgpu_family_matrix_test.py::TestFamilyMatrixInvariants::test_no_duplicate_family_names_per_platform
- AssertionError: Duplicate family 'gfx120X-all' on linux: target
'gfx120x-all' and 'gfx120x'
  ```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Copy link
Copy Markdown
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good so far what i saw.

reviewed everything but:

build_tools/github_actions/configure_multi_arch_ci.py
build_tools/github_actions/tests/configure_multi_arch_ci_test.py

will do those tomorrow

Comment thread .github/workflows/multi_arch_ci.yml Outdated
Comment thread .github/workflows/setup_multi_arch.yml
Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py Outdated
Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py
return "\n".join(lines)


def _non_default_callouts(ci_inputs: CIInputs, outputs: CIOutputs) -> list[str]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i never heard callout in that context. maybe a few comments could help to set the expectation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "highlights" make more sense than "callouts"? These are any parts of the CI configuration that are different from what runs by "default", such as when a label was added to opt-in to some extra builds or tests. I figured that if we only showed tables with the computed values, it wouldn't be obvious what about those values is worth paying attention to.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping pong with claude suggests:
_config_overrides
_override_notes
_active_overrides

Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py
Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py
Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py Outdated
Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py
Comment thread build_tools/github_actions/configure_multi_arch_ci_summary.py
chiranjeevipattigidi pushed a commit that referenced this pull request Mar 27, 2026
## Motivation

Progress on #3399. I'm extending
the configuration scripts/workflows for multi-arch CI in
#4123 and the docs could use a
refresh.

## Technical Details

* Highlight the differences between what jobs run on different triggers
(I also have changes on #4123 that
will surface this in the step summary output)
* Elaborate on "CI" vs "Multi-Arch CI" setup and roadmap

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
@ScottTodd
Copy link
Copy Markdown
Member Author

Thanks for the feedback. We've been landing substantial changes to the old/current configure_ci.py and workflows as part of switching our default workflow from ci.yml to multi_arch_ci.yml. I'll sync this, resolve merge conflicts, then start addressing the comments next week.

Copy link
Copy Markdown
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewing: build_tools/github_actions/configure_multi_arch_ci.py
i would recommend splitting this in multiple files. you already have 5 distinct steps + the data classes.

still missing: test file

see also comment about running those workflows for release branches.

Comment thread build_tools/github_actions/configure_multi_arch_ci.py
Comment thread build_tools/github_actions/configure_multi_arch_ci.py Outdated
Comment thread build_tools/github_actions/configure_multi_arch_ci.py Outdated
Comment thread build_tools/github_actions/configure_multi_arch_ci.py
Comment thread build_tools/github_actions/configure_multi_arch_ci.py Outdated
Comment thread .github/workflows/multi_arch_ci.yml
Comment thread build_tools/github_actions/configure_multi_arch_ci.py
Comment thread build_tools/github_actions/configure_multi_arch_ci.py
Comment thread build_tools/github_actions/configure_multi_arch_ci.py
Comment thread build_tools/github_actions/configure_multi_arch_ci.py
ScottTodd and others added 11 commits March 30, 2026 13:07
Resolved conflicts and moved code for
* log links
* skip labels
Pass CIInputs to expand_build_configs instead of just build_variant,
giving it access to PR labels. When a test_runner:<kernel> label is
present (e.g. test_runner:oem), override the test runner for families
that have a matching test-runs-on-kernel entry in the family matrix.
Families without kernel support get their test runner cleared.

Also adds test_runner: to the step summary highlights section.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ScottTodd
Copy link
Copy Markdown
Member Author

i would recommend splitting this in multiple files. you already have 5 distinct steps + the data classes.

I think splitting files will make more sense once we start filling in new logic for the build and test jobs. As is, this is just porting the existing configure_ci.py logic to a new framework.

@ScottTodd
Copy link
Copy Markdown
Member Author

Addressed all comments. PTAL?

chiranjeevipattigidi pushed a commit that referenced this pull request Mar 31, 2026
## Motivation

I'm adding new code that depends on
[`amdgpu_family_matrix.py`](https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/amdgpu_family_matrix.py)
in #4123 and I want these structural
invariants tested locally so I don't need to assert on them in the new
downstream code.

## Technical Details

We'll migrate over to
https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/new_amdgpu_family_matrix.py
soon, so I'm just adding a minimal set of tests for now.

## Test Plan

* Run the unit tests

## Test Results

Tests can fail as expected:
* `test_required_fields_present`
  ```diff
  λ git diff
diff --git a/build_tools/github_actions/amdgpu_family_matrix.py
b/build_tools/github_actions/amdgpu_family_matrix.py
  --- a/build_tools/github_actions/amdgpu_family_matrix.py
  +++ b/build_tools/github_actions/amdgpu_family_matrix.py
  @@ -174,7 +174,6 @@ amdgpu_family_info_matrix_nightly = {
           "windows": {
               "test-runs-on": "",
               "family": "gfx900",
  -            "fetch-gfx-targets": [],
               "build_variants": ["release"],
               "expect_pytorch_failure": True,
           },
  ```
  ```
(.venv) λ pytest
build_tools/github_actions/tests/amdgpu_family_matrix_test.py -vv
  ...

build_tools\github_actions\tests\amdgpu_family_matrix_test.py::TestFamilyMatrixInvariants::test_required_fields_present
- AssertionError: gfx900/windows missing required fields:
{'fetch-gfx-targets'}
  ```
* `test_no_duplicate_family_names_per_platform`
  ```diff
diff --git a/build_tools/github_actions/amdgpu_family_matrix.py
b/build_tools/github_actions/amdgpu_family_matrix.py
  index 63ad8c8..6eae4125 100644
  --- a/build_tools/github_actions/amdgpu_family_matrix.py
  +++ b/build_tools/github_actions/amdgpu_family_matrix.py
  @@ -146,6 +146,18 @@ amdgpu_family_info_matrix_presubmit = {
               "build_variants": ["release"],
           },
       },
  +    "gfx120x-all": {
  +        "linux": {
  +            # TODO(#2683): Re-enable label once stable
  +            # Label is linux-gfx120X-gpu-rocm
  +            "test-runs-on": "",
  +            "family": "gfx120X-all",
  +            "fetch-gfx-targets": ["gfx1200", "gfx1201"],
  +            "bypass_tests_for_releases": True,
  +            "build_variants": ["release"],
  +            "sanity_check_only_for_family": True,
  +        },
  +    },
   }
  ```
  ```
(.venv) λ pytest
build_tools/github_actions/tests/amdgpu_family_matrix_test.py -vv
  ...

build_tools\github_actions\tests\amdgpu_family_matrix_test.py::TestFamilyMatrixInvariants::test_no_duplicate_family_names_per_platform
- AssertionError: Duplicate family 'gfx120X-all' on linux: target
'gfx120x-all' and 'gfx120x'
  ```

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Copy link
Copy Markdown
Contributor

@HereThereBeDragons HereThereBeDragons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks you. looks good.

only minor ideas for improvement, directly answered in the comments/your reply.

@ScottTodd ScottTodd merged commit 947d882 into ROCm:main Mar 31, 2026
14 of 15 checks passed
@ScottTodd ScottTodd deleted the multi-arch-configure branch March 31, 2026 17:24
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Mar 31, 2026
@ScottTodd
Copy link
Copy Markdown
Member Author

Test Plan

  • Watch multi-arch CI behavior for push events after merge

Looks good: https://github.com/ROCm/TheRock/actions/runs/23810551251.

  • Same set of targets, test types, etc. as on the pull request,
  • Trigger detected as "push"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:run-multi-arch Opt-in to running multi-arch CI on a pull request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants