[CI] Weekly CI and introducing a new amdgpu matrix generator #1732
Draft
HereThereBeDragons wants to merge 27 commits into
bd340d6 to 8dd03ac
This was referenced Nov 21, 2025
8dd03ac to 2c37cf6
7e9ba03 to 3077ce8
56cf283 to 91b09d4
3ea49b4 to dc95cae
@marbre @ScottTodd @geomin12 @jayhawk-commits just for info:
This would run in parallel until all workflows are transferred. Only the high-level workflows need to be adjusted. Successful cmake4 run (rebased on main from last Friday or this Monday?): https://github.com/ROCm/TheRock/actions/runs/21402078653
New layout of the matrix for the workflows:
dc95cae to d9b0814
87ee5fa to 45237ce
HereThereBeDragons added a commit that referenced this pull request on Feb 5, 2026
Move all functions related to CI path filtering, and to deciding from the modified paths whether the CI should run, into a separate file `configure_ci_path_filters.py`.

Aside from adjusting the descriptions, this also gives better names to the following functions:
- get_modified_paths -> get_git_modified_paths
- get_therock_submodule_paths -> get_git_submodule_paths
- should_ci_run_given_modified_paths -> is_ci_run_required

Part of the CI weekly progress (big picture #1732) so that the new and old CI configurators can share those functions.
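As a rough illustration of the kind of check the renamed `is_ci_run_required` performs, here is a minimal sketch. The skip patterns, the helper `is_path_skippable`, and the behavior for an empty path list are assumptions made for this example; they are not taken from `configure_ci_path_filters.py`.

```python
from fnmatch import fnmatch
from typing import Iterable

# Hypothetical patterns: changes that match only these do not need a CI run.
SKIPPABLE_PATH_PATTERNS = ["docs/*", "*.md", ".gitignore"]


def is_path_skippable(path: str) -> bool:
    """Return True if a change to `path` alone should not trigger CI."""
    return any(fnmatch(path, pattern) for pattern in SKIPPABLE_PATH_PATTERNS)


def is_ci_run_required(modified_paths: Iterable[str]) -> bool:
    """Return True if at least one modified path is not skippable.

    Assumption for this sketch: an empty list of paths means no CI run.
    """
    paths = list(modified_paths)
    if not paths:
        return False
    return any(not is_path_skippable(path) for path in paths)
```

Keeping the decision in a small function that only takes a list of paths is what would let both the old and the new configurator call it without sharing any other state.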
3f7408f to 0ba3996
HereThereBeDragons added a commit that referenced this pull request on Feb 12, 2026
This PR is part of enabling CI weekly (big picture PR #1732). For this, amdgpu_family needs to be refactored so that a specific GPU can be selected easily, without having to know which group (presubmit/postsubmit/nightlies) it belongs to.

The new layout puts all GPUs in a single dictionary `amdgpu_family_info_matrix_all`. Choices for pre/postsubmit and nightlies are now made via a list. `amdgpu_family_info_matrix_all` has a deeper hierarchy to better define which parameters belong to which step: build, test, release.
- Tests can now be turned off with a single bool flag. There is no need to comment out the runner.
- Default runner labels are "test", "benchmark" and "test-runs-on-multi-gpu". Further labels can be introduced, e.g. "oem", which in the future can overwrite the default runners.

This new layout is introduced in parallel to the old one. The old one stays unchanged so that the CI can move to the new layout gradually.
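To make the described layout more concrete, here is a minimal sketch of what such a dictionary and lookup could look like. Only the name `amdgpu_family_info_matrix_all`, the build/test/release split, the bool test flag, and the runner labels come from the description above. The GPU names, key names (`run_tests`, `runs_on`, `publish`), runner names, list names, and the helper `get_test_runner` are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical entries: one dictionary for all GPUs, split per platform and per step.
amdgpu_family_info_matrix_all: Dict[str, Dict[str, Any]] = {
    "example-gpu-a": {
        "linux": {
            "build": {"family": "example-gpu-a-family"},
            "test": {
                "run_tests": True,  # single flag to turn tests on or off
                "runs_on": {
                    "test": "example-a-runner",
                    "benchmark": "example-a-benchmark-runner",
                    "test-runs-on-multi-gpu": "example-a-multi-gpu-runner",
                    "oem": "example-a-oem-runner",  # optional extra label
                },
            },
            "release": {"publish": True},
        },
    },
    "example-gpu-b": {
        "linux": {
            "build": {"family": "example-gpu-b-family"},
            # Tests are disabled by the flag; the runner stays in place.
            "test": {"run_tests": False, "runs_on": {"test": "example-b-runner"}},
            "release": {"publish": False},
        },
    },
}

# Trigger groups are plain lists of GPU names (hypothetical contents).
presubmit_targets: List[str] = ["example-gpu-a"]
nightly_targets: List[str] = ["example-gpu-a", "example-gpu-b"]


def get_test_runner(
    gpu: str,
    platform: str,
    label: str = "test",
    override_label: Optional[str] = None,
) -> Optional[str]:
    """Return the runner for a GPU, or None if tests are turned off.

    If `override_label` (e.g. "oem") exists for this GPU, it replaces the default.
    """
    test_info = amdgpu_family_info_matrix_all[gpu][platform]["test"]
    if not test_info["run_tests"]:
        return None
    runners = test_info["runs_on"]
    if override_label and override_label in runners:
        return runners[override_label]
    return runners.get(label)
```

With this shape, selecting a GPU for a weekly run only requires its name; which trigger group it belongs to no longer matters for the lookup.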
…aken from the config
…s to run on for test tasks
…meters in high level workflows
…" and some more bug fixes
…le. rename some functions for better naming. adjust tests to reflect changes.
…ci_path_filters.py -> configure_ci_path_filters.py
b48ee61 to 0c1b080
Commit 8f69a63 (Mar 12, 2026): simplified the generator loop, changed flags to include their label, optimized `config()`, and made further code style improvements.
ScottTodd added a commit that referenced this pull request on Mar 31, 2026
This adds new `setup_multi_arch.yml` and `configure_multi_arch_ci_summary.py` configuration code for multi-arch CI.

## Motivation

The existing [`build_tools/github_actions/configure_ci.py`](https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/configure_ci.py) script and [`.github/workflows/setup.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/setup.yml) workflow were both tightly coupled to the non-multi-arch CI pipelines and multi-arch CI has some unique needs:

* #3399
* Different workflow I/O (single-arch CI has a matrix across each family, multi-arch CI has a single pipeline per platform)
* (Related) Setting alternate schedules, see #1732

I also judged that starting fresh would be easier, given architectural issues with the existing code:

* Mixed responsibilities in a 300 LOC `matrix_generator()` function
* Sequencing of decisions (whether to skip, what to build, what to test, etc.) was scattered and sometimes duplicated

## Technical Details

> [!NOTE]
> See my worklog for this feature branch here: [`tasks/active/multi-arch-configure.md`](https://github.com/ScottTodd/claude-rocm-workspace/blob/main/tasks/active/multi-arch-configure.md)

The new code is structured as a sequence of stages:

1. Parse inputs from github (triggers, labels) and git (files changed)
2. Check "skip CI" gate to early return
3. Decide which jobs to run and with what options (skip/use prebuilt/build)
4. Decide which GPU targets/families to build (trigger type + labels --> per-platform GPU families)
5. Expand per-platform build configuration data structures
6. Write outputs to github outputs and step summary

It includes several ✨NEW✨ features:

* Rendered markdown summarizing the configure output

  Before | After
  -- | --
  https://github.com/ROCm/TheRock/actions/runs/23509071878?pr=4142 <img width="586" height="440" alt="image" src="https://github.com/user-attachments/assets/d03be1d9-c9cc-4075-b06f-2e48eb80f47a" /> | https://github.com/ROCm/TheRock/actions/runs/23465566745?pr=4123 <img width="716" height="978" alt="image" src="https://github.com/user-attachments/assets/83dac546-4e0d-4351-98a5-2170c562317d" />

* A collapsed build graph instead of a matrix with a single build variant

  Before | After
  -- | --
  https://github.com/ROCm/TheRock/actions/runs/23509071878?pr=4142 <img width="967" height="430" alt="image" src="https://github.com/user-attachments/assets/ace99a26-2e21-4763-aff0-cf8ffea85ab1" /> | https://github.com/ROCm/TheRock/actions/runs/23465566745?pr=4123 <img width="1962" height="619" alt="image" src="https://github.com/user-attachments/assets/b5101768-5390-4adf-87de-c5a99c45799f" />

* Reading inputs via `GITHUB_EVENT_PATH` instead of explicit/verbose `github.event.inputs` plumbing
* Passing outputs via `build_config` JSON instead of explicit/verbose `artifact_group`, `matrix_per_family_json`, `dist_amdgpu_families`, etc. (to stay under input count limits and make cross-file maintenance hopefully easier)

As well as a few 🪦REMOVED 🪦 features:

* No longer using `determine_long_lived_branch()` to change `push` behavior based on the branch name - if we enable the workflow on a branch we should run the same set of jobs
* No `run_functional_tests` plumbing - this has not been added to multi-arch CI [yet?]

### Comparison Metrics

<details><summary>📊 Feature Comparison</summary>
<p>

| Feature | Old | New | Change |
|---------|-----|-----|--------|
| Pipeline architecture | Monolithic `matrix_generator` (295 lines) | 6-step pipeline of pure functions | Redesigned |
| Typing | Untyped `base_args` dict | 11 frozen dataclasses, `JobAction` enum | New |
| Skip CI gate | Buried in `matrix_generator` + `main()` | `should_skip_ci()` | Simplified |
| Target selection | Interleaved with matrix expansion | `select_targets()` | Separated |
| Test type | Post-hoc mutation in `main()` | `_determine_test_type()` | Separated |
| Matrix output | Array of per-variant rows via `strategy.matrix` | Single `build_config` JSON per platform | Simplified |
| Prebuilt stages | Boolean `use_prebuilt_artifacts` | Per-stage `dict[str, JobAction]` on `BuildRocmDecision` | More granular |
| Workflow contract | Implicit (YAML ↔ Python drift possible) | Contract tests extract `fromJSON` refs from YAML, assert against dataclass fields | Validated |
| Step summary | Limited information | Markdown with skip reasons, per-family test table, non-default callouts | Redesigned |
| `setup.yml` coupling | 8+ env vars piped through YAML → Python | Script reads `GITHUB_EVENT_PATH` directly | Decoupled |

</p>
</details>

<details><summary>🔎 Code Complexity</summary>
<p>

Around the same number of statements but with more comments and structure:

| Metric | Old `configure_ci.py` (main) | New `configure_multi_arch_ci.py` + summary | Remaining `configure_ci.py` |
|--------|-----|-----|-----|
| Lines | 828 | 947 + 204 = 1,151 | 676 |
| Statements | 348 | 335 + 114 = 449 | 302 |
| Functions | 8 | 30 + 6 = 36 | 7 |
| Classes/dataclasses | 0 | 11 | 0 |

</p>
</details>

<details><summary>🧪 Test Coverage</summary>
<p>

Logic and dataclasses are tested step by step, inputs/outputs are pushed to the boundaries for easier testing.

| Metric | Old (main) | New | Delta |
|--------|-----------|-----|-------|
| Test functions | 33 (configure_ci_test) | 47 (multi_arch) | +14 |
| Test lines | 907 | 826 | -81 |
| Remaining configure_ci tests | — | 27 | -6 (multi-arch tests removed) |
| Statement coverage (main script) | ~63% (configure_ci.py) | 84% (configure_multi_arch_ci.py) | +21pp |
| Uncovered code | `main()`, multi-arch paths mixed with single-arch | I/O boundary only (`from_environ`, `write_outputs`, `main`) | cleaner boundary |

</p>
</details>

## Test Plan

* New unit tests
  * Tests for each stage, using the dataclasses/enums/etc. that are passed between them
  * A few integration tests for the whole `configure()` pipeline
  * Tests for the workflow YAML files and their `fromJSON` usage
* Manual testing
  * on `pull_request`, multi-arch CI should run by default
  * on `push` it should run with the same behavior as before
* Watch multi-arch CI behavior for `push` events after merge

## Test Result

Expected jobs ran on https://github.com/ROCm/TheRock/actions/runs/23773060560?pr=4123, configure output markdown looks as expected.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
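The staged design described in the commit message above (parse inputs, skip gate, target selection, write outputs) could be sketched roughly as follows. This is not the code from `configure_multi_arch_ci.py`: the names `JobAction`, `should_skip_ci`, `select_targets`, `configure`, `write_outputs`, `build_config` and `GITHUB_EVENT_PATH` appear in the commit message, but every signature, dataclass field, label name, skip rule and family name below is an assumption made for illustration.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Mapping, Tuple


class JobAction(Enum):
    SKIP = "skip"
    USE_PREBUILT = "use_prebuilt"
    BUILD = "build"


@dataclass(frozen=True)
class CIInputs:  # hypothetical container for stage 1 results
    event_name: str
    labels: Tuple[str, ...]
    changed_files: Tuple[str, ...]


@dataclass(frozen=True)
class BuildConfig:  # hypothetical per-platform output
    platform: str
    action: JobAction
    families: Tuple[str, ...]


def parse_inputs(environ: Mapping[str, str], changed_files: Iterable[str]) -> CIInputs:
    """Stage 1: read the event payload that GitHub Actions writes to GITHUB_EVENT_PATH."""
    event = {}
    path = environ.get("GITHUB_EVENT_PATH")
    if path:
        with open(path, encoding="utf-8") as f:
            event = json.load(f)
    pr_labels = event.get("pull_request", {}).get("labels", [])
    labels = tuple(label["name"] for label in pr_labels)
    return CIInputs(environ.get("GITHUB_EVENT_NAME", ""), labels, tuple(changed_files))


def should_skip_ci(inputs: CIInputs) -> bool:
    """Stage 2 (assumed rule): skip on a 'skip-ci' label or when only .md files changed."""
    if "skip-ci" in inputs.labels:
        return True
    files = inputs.changed_files
    return bool(files) and all(name.endswith(".md") for name in files)


def select_targets(inputs: CIInputs) -> Tuple[str, ...]:
    """Stage 4 (assumed rule): scheduled runs build more families than other triggers."""
    if inputs.event_name == "schedule":
        return ("example-family-a", "example-family-b")
    return ("example-family-a",)


def configure(inputs: CIInputs, platform: str = "linux") -> BuildConfig:
    """Run the pure stages in order and return one config for the platform."""
    if should_skip_ci(inputs):
        return BuildConfig(platform, JobAction.SKIP, ())
    return BuildConfig(platform, JobAction.BUILD, select_targets(inputs))


def write_outputs(config: BuildConfig, environ: Mapping[str, str]) -> str:
    """Stage 6: emit a single `build_config` JSON value to GITHUB_OUTPUT if it is set."""
    payload = json.dumps(
        {
            "platform": config.platform,
            "action": config.action.value,
            "families": list(config.families),
        }
    )
    output_path = environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as f:
            f.write(f"build_config={payload}\n")
    return payload
```

Because only `parse_inputs` and `write_outputs` touch the environment, the stages in between can be unit tested by constructing `CIInputs` directly, which matches the "I/O boundary only" coverage note in the table above.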
This introduces a new weekly CI.
To make this manageable, larger changes to the amdgpu matrix generator were necessary. The amdgpu matrix generator can now create build, test and/or release target configs in a single run for both Windows and Linux. For pull requests, it keeps the existing behavior of dynamically deciding, based on the modified files, whether a target needs to be built.
In addition, `new_setup.yml` is used to provide the container image. It can either use the default TheRock CI image or accept a custom one.