
Use triton wheel no fork #2959

Merged
Dewei-Wang-sh merged 19 commits into main from use_triton_wheel_no_fork on Apr 30, 2026

Conversation

@mengfei-jiang (Contributor) commented Apr 29, 2026

Motivation

Currently, triton is either built from source in CI (the build-triton job) or relies on whatever version is pre-installed in the Docker base image. This is slow, fragile, and inconsistent across workflows. AMD now publishes pre-built amd-triton wheels for ROCm 7.0, 7.1, and 7.2, making source builds unnecessary. This PR centralizes triton installation into a single shared script that auto-detects the ROCm version and installs the matching amd-triton wheel, ensuring all CI workflows and local development use the same triton distribution.

Additionally, there were no tests verifying that triton operators work correctly with torch.compile. This PR adds 10 torch.compile compatibility tests to catch regressions early.

Technical Details

  • Replace triton source builds with amd-triton from AMD PyPI: Introduce a shared install_triton.sh script that auto-detects the ROCm version via rocm-core and installs the matching amd-triton wheel from https://pypi.amd.com/triton/rocm-{major}.{minor}.0/simple/. This eliminates the need to build triton from source in CI, removing the build-triton job from triton-test.yaml (~80 lines). A rough sketch of this logic is shown after this list.
  • Unify triton installation across all CI workflows: Add install_triton.sh to aiter-test.yaml, atom-test.yaml, sglang_downstream.yaml, and vllm_benchmark.yaml, ensuring all workflows use the same amd-triton version. The script uninstalls all conflicting triton variants (triton, pytorch-triton, pytorch-triton-rocm, triton-rocm, amd-triton) before installing.
  • Auto-install in develop mode: setup.py now calls install_triton.sh during python setup.py develop, so developers get amd-triton installed automatically.
  • Add torch.compile compatibility tests: 10 new test files under op_tests/triton_tests/torch_compile/ verifying that triton operators work correctly with torch.compile(backend="inductor", fullgraph=True). Covers activation, fused_mul_add, gemm, moe_routing, quantization (per-tensor/per-token), rmsnorm, rope, softmax, and topk. A minimal test sketch also follows this list.
  • Deduplicate _get_compiled helper: Extract the shared _get_compiled function into torch_compile/__init__.py so all test files import from a single location.
  • Add torch_compile test times to split_tests.sh: Include FILE_TIMES for the 10 new tests (~90s total) to enable proper shard balancing in CI.
  • Update README: Add Triton section documenting amd-triton installation with AMD PyPI index URLs and the install_triton.sh script.
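
For reference, a rough sketch of the install logic in Python, written the way the earlier inline setup.py version handled it. The pypi.amd.com index URL and the list of conflicting packages come from this PR; the dpkg-query call used to read the rocm-core version is an assumption, and the actual install_triton.sh may detect the version differently.

```python
import re
import subprocess
import sys

# Triton variants that conflict with amd-triton (from this PR's description).
CONFLICTING = [
    "triton", "pytorch-triton", "pytorch-triton-rocm",
    "triton-rocm", "amd-triton",
]

def detect_rocm_version():
    # Assumption: read the version of the installed rocm-core package;
    # install_triton.sh may use a different query.
    out = subprocess.check_output(
        ["dpkg-query", "-W", "--showformat=${Version}", "rocm-core"], text=True
    )
    major, minor = re.match(r"(\d+)\.(\d+)", out).groups()
    return major, minor

def install_amd_triton():
    major, minor = detect_rocm_version()
    index_url = f"https://pypi.amd.com/triton/rocm-{major}.{minor}.0/simple/"
    for pkg in CONFLICTING:
        # Uninstall may fail for packages that are not installed; ignore it.
        subprocess.call([sys.executable, "-m", "pip", "uninstall", "-y", pkg])
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "amd-triton",
         "--extra-index-url", index_url]
    )

if __name__ == "__main__":
    install_amd_triton()
```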
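
And a minimal sketch of the torch.compile compatibility test pattern. The _get_compiled helper, the inductor/fullgraph settings, the [float16, bfloat16] dtypes, and the relaxed bf16 tolerance all come from this PR; the operator under test is a stand-in, since the exact aiter kernel signatures are not reproduced here.

```python
import pytest
import torch

def _get_compiled(fn):
    # Shared helper (the PR places it in torch_compile/__init__.py):
    # reset dynamo so fullgraph=True does not trip the recompile limit,
    # then compile with the inductor backend.
    torch._dynamo.reset()
    return torch.compile(fn, backend="inductor", fullgraph=True)

@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16])
def test_op_matches_eager_under_torch_compile(dtype):
    x = torch.randn(128, 256, dtype=dtype, device="cuda")

    def op(t):
        # Placeholder for a triton operator; the real tests exercise the
        # aiter triton kernels (activation, rmsnorm, softmax, ...).
        return torch.nn.functional.gelu(t)

    eager_out = op(x)
    compiled_out = _get_compiled(op)(x)

    # bfloat16 has only a 7-bit mantissa, so tolerances are relaxed to 0.1.
    tol = 0.1 if dtype == torch.bfloat16 else 1e-3
    torch.testing.assert_close(compiled_out, eager_out, atol=tol, rtol=tol)
```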

Test Plan

  • triton-test.yaml passes without the build-triton job (amd-triton installed via install_triton.sh)
  • aiter-test.yaml standard and multi-GPU tests pass with the Install amd-triton step
  • atom-test.yaml, sglang_downstream.yaml, and vllm_benchmark.yaml CI workflows pass
  • All 10 torch_compile tests pass on MI300X and MI35X runners

Test Result

All of the above tests pass.

Submission Checklist

@mengfei-jiang mengfei-jiang requested a review from a team April 29, 2026 09:40
@github-actions (Contributor):

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

  • ci:triton-300x - Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
  • ci:sglang - SGLang integration tests
  • ci:atom - ATOM benchmark (DeepSeek-R1 + GPT-OSS)
  • ci:vllm - vLLM benchmark
  • ci:all - All of the above

Add labels via the sidebar or gh pr edit 2959 --add-label <label>

mengfei-jiang and others added 19 commits April 29, 2026 15:50
Update build-triton job to first attempt downloading a pre-built wheel
from rocm.frameworks-nightlies.amd.com, falling back to source build
only when the download fails. Also bump TRITON_COMMIT to d1660454.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Update wheel URL format from triton-3.7.0+amd.git<commit>
to triton-3.7.0+rocm7.2.0.git<commit> to match the actual
naming convention on the nightly server.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The server returns 403 when '+' is used literally in the URL.
Percent-encode it as %2B while keeping the local filename with '+'.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
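
(As a side note, a quick Python illustration of the encoding behaviour this commit describes; the wheel filename is made up to match the naming convention above.)

```python
from urllib.parse import quote

# Illustrative filename only, following triton-3.7.0+rocm7.2.0.git<commit>.
wheel = "triton-3.7.0+rocm7.2.0.gitd1660454-py3-none-any.whl"

# In the request URL the '+' must be percent-encoded as %2B ...
print(quote(wheel))  # triton-3.7.0%2Brocm7.2.0.gitd1660454-py3-none-any.whl
# ... while the file keeps its literal '+' when saved locally.
```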
The wheel files are under gfx942-gfx950/ not gfx942-gfx950/triton/.
The triton/ subdirectory is a PEP 503 index page whose links point
to ../  (the parent directory).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
- Add requirements-triton.txt with --extra-index-url for AMD PyPI
- Add pip install -r requirements-triton.txt in build_aiter_triton.sh
- Remove build-triton job from triton-test.yaml, use BUILD_TRITON=0
- Update README.md with Triton installation instructions

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Relax atol/rtol to 0.1 for bfloat16 due to lower precision (7-bit
mantissa). Add fullgraph=True to enforce full graph compilation
without eager fallback.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
- Change dtype params to [float16, bfloat16] across all torch_compile tests
- Add torch._dynamo.reset() to prevent recompile limit with fullgraph=True
- Relax tolerance for bf16 in fused_mul_add and activation tests (atol=0.1)

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Move triton dependency from a separate requirements-triton.txt (using
AMD PyPI index) to the standard amd-triton package on PyPI, added as
both a build and runtime dependency. This simplifies installation by
making `pip install -e .` handle triton automatically.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
amd-triton is now available on PyPI directly, so the extra index URL
for AMD PyPI is no longer needed.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
If amd-triton is not yet installed, pip uninstall returns non-zero
which would abort setup.py. The reinstall call is kept as check_call
to ensure amd-triton is always installed with the latest content.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
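
(A minimal sketch of the call/check_call distinction this commit describes; the exact pip arguments used by setup.py are assumptions.)

```python
import subprocess
import sys

# The uninstall may legitimately fail when amd-triton is not installed yet,
# so use subprocess.call and ignore the return code.
subprocess.call([sys.executable, "-m", "pip", "uninstall", "-y", "amd-triton"])

# The (re)install must succeed; check_call raises CalledProcessError on a
# non-zero exit status, aborting setup.py if the install fails.
subprocess.check_call([sys.executable, "-m", "pip", "install", "amd-triton"])
```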
Use requirements-triton.txt for triton installation instead of
embedding it in pyproject.toml/setup.py. The file now references
amd-triton from PyPI directly, no extra index URL needed.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Replace requirements-triton.txt with inline ROCm version detection
in setup.py and CI script. Uninstall all conflicting triton packages
(triton, pytorch-triton, pytorch-triton-rocm, triton-rocm, amd-triton)
before installing amd-triton with the correct --extra-index-url based
on the detected ROCm version.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Add ROCm version detection and amd-triton installation to atom-test,
vllm_benchmark, and sglang_downstream workflows before pip install -e .
Wrap amd-triton install in setup.py with try/except to avoid build
failure in PEP 517 isolated environments where pip is unavailable.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
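
(Similarly, a sketch of the try/except guard this commit describes; install_amd_triton is a hypothetical stand-in for the inline install code in setup.py.)

```python
import subprocess
import sys

def install_amd_triton():
    # Hypothetical helper standing in for the inline pip-based install.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "amd-triton"])

try:
    install_amd_triton()
except Exception as exc:
    # In PEP 517 isolated build environments pip may be unavailable;
    # skip the triton install rather than failing the whole build.
    print(f"Skipping amd-triton install: {exc}")
```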
Consolidate duplicated ROCm version detection and amd-triton
installation logic into .github/scripts/install_triton.sh. Update
all CI workflows (build_aiter_triton, atom-test, vllm_benchmark,
sglang_downstream) and README to call the shared script.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Replace inline ROCm version detection and amd-triton install code
in setup.py with a call to the shared install_triton.sh script.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…nstall amd-triton in aiter-test CI

- Move _get_compiled into torch_compile/__init__.py so all test files
  import from a single location
- Add FILE_TIMES for the 10 torch_compile tests to split_tests.sh
- Add Install amd-triton step in aiter-test.yaml for standard and
  multi-gpu test jobs

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
@mengfei-jiang force-pushed the use_triton_wheel_no_fork branch from 7d3a4db to 6bdadaa on April 29, 2026 at 15:57
@brunomazzottiamd (Contributor):

I've checked the new test files, LGTM! However, I don't have enough CI knowledge to comment on the scripts and workflows. Let's wait for an approval from the AITER CI team.

@brunomazzottiamd (Contributor):

Failures in Flash Attention Integration jobs are being addressed in #2695.

@brunomazzottiamd (Contributor):

Added the ci:triton-300x label to trigger execution of the Triton unit tests on gfx942 nodes.

@Dewei-Wang-sh (Contributor) left a comment:

overall, lgtm

@Dewei-Wang-sh (Contributor):

Motivation

Replace the triton dependency management with auto-detection of the ROCm version to install the correct amd-triton

However, this description is outdated and only covers one point; please update it accordingly. You need to clarify what this PR does and what it is for.

@Dewei-Wang-sh Dewei-Wang-sh requested a review from zhanglx13 April 30, 2026 01:38
@Dewei-Wang-sh Dewei-Wang-sh merged commit 77bda8d into main Apr 30, 2026
123 of 144 checks passed
@Dewei-Wang-sh Dewei-Wang-sh deleted the use_triton_wheel_no_fork branch April 30, 2026 03:01
gyohuangxin pushed a commit that referenced this pull request May 3, 2026
…2985)

PR #2959 introduced .github/scripts/install_triton.sh and added an
"Install amd-triton" step to aiter-test.yaml that calls the script
inside the docker container. The container's working directory is the
PR's checkout, so any PR opened or last synced before #2959 landed on
main does not contain the script and fails with:

  bash: line 1: ./.github/scripts/install_triton.sh: No such file
  ##[error]Process completed with exit code 127.

This blocks Standard Tests on every stale PR (e.g. #2969, all 9/10
shards failing), forcing authors to rebase just to get green CI.

Fix: in the Install amd-triton step, fall back to fetching the script
from the base ref via raw.githubusercontent.com when it is not present
in the runner workspace. Workflow files for PR events always come from
the base branch, so this stays consistent with the rest of the CI flow
and adds no security boundary crossing.

Applied symmetrically to the Standard Tests (1 GPU) and Multi-GPU
Tests (8 GPU) jobs. atom-test.yaml and sglang_downstream.yaml also
call the script after a fresh git clone of the PR sha and would
benefit from a similar fallback in a follow-up.
chun-wan pushed a commit that referenced this pull request May 4, 2026 (the same #2985 fix described above)
Liang-jianhao97 pushed a commit that referenced this pull request May 7, 2026 (commit message identical to the Motivation above)
Liang-jianhao97 pushed a commit that referenced this pull request May 7, 2026 (the same #2985 fix described above)