[AMD] Add rocm7.2.3 support by sogalin · Pull Request #26010 · sgl-project/sglang

sogalin · 2026-05-21T18:04:48Z

Motivation

Extend the AMD ROCm docker matrix to include the rocm/pytorch:rocm7.2.3_ubuntu22.04_py3.10_pytorch_release_2.9.1 base image so that MI300X / MI355X users can build sglang against the newer ROCm 7.2.3 toolchain. ROCm 7.2.3 bundles a different torch and Triton (triton==3.5.1+rocm7.2.3.gita272dfa8) than ROCm 7.2.0, and AITER kernels need a compatible Triton build carrying two upstream cherry-picks to run correctly on top of it. We also take the opportunity to clean up the existing rocm7.2.0 torch hot patch.

Modifications

1. New rocm7.2.3 stages

- New ARGs: BASE_IMAGE_942_ROCM723, BASE_IMAGE_950_ROCM723
- New FROM stages: gfx942-rocm723, gfx950-rocm723, each with:
  - BUILD_TRITON=1
  - AITER_COMMIT_DEFAULT synced with the upstream upgrade (32e1e6d76988... from [AMD] Upgrade AITER #25896)
- Usage examples added to the header comment block

2. Custom Triton build for rocm723 (gated by `case "${GPU_ARCH}" in *rocm723*`)

- Repo: https://github.com/ROCm/triton.git
- Commit: ba5c1517
- Cherry-picks (from triton-lang/triton, reachable in the ROCm fork):
  - 555d04f → triton-lang/triton#8991
  - dd998b6 → triton-lang/triton#9541

The rocm720 / rocm700 stages keep the existing legacy Triton build (triton-lang/triton @ 42270451…).

3. Unified torch METADATA hot patch (refactor)

Both rocm/pytorch:rocm7.2.0 and rocm/pytorch:rocm7.2.3 ship a pre-installed torch wheel whose METADATA hard-pins triton:
Requires-Dist: triton==3.5.1+rocm7.2.x.git...; platform_system == "Linux" ...

Since this Dockerfile replaces triton with a custom build (BUILD_TRITON=1), the pin makes pip check, and any later pip install, fail with a version conflict.
The previous solution (hack.py) read a .whl from /, extracted it, edited METADATA, re-zipped it, and reinstalled it with pip install --force --no-deps. That added ~3 minutes per build and left the 1.6 GB source wheel in /. It also does not work for rocm7.2.3, whose base image ships no wheel at / (only the installed torch).
This PR replaces the wheel round trip with a small hack_inplace.py that edits the installed torch-*.dist-info/METADATA to relax the pin to triton>=3.5.1 and blanks the matching RECORD row. Both rocm720 and rocm723 use it through a single case "${GPU_ARCH}" in *rocm720*|*rocm723*) branch.
Diff summary:

- Removed: ARG TORCH_ROCM_FILE, the wheel-based hack.py heredoc, and the wheel-based RUN flow
- Added: a single hack_inplace.py heredoc plus a unified case branch
- The rocm720 build now finishes the patch in <1s instead of ~3 min, and the source wheel is cleaned up (rm -f /torch-*.whl)
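For readers who want to see what an in-place patch of this kind involves, here is a minimal Python sketch. It is not the hack_inplace.py from this PR: the function names (relax_triton_pin, blank_record_row, patch_installed_torch) are hypothetical, and it assumes that "blanks the matching RECORD row" means clearing the hash and size fields of the METADATA entry so RECORD no longer carries a stale digest for the edited file.

```python
import re
from pathlib import Path

# Hypothetical sketch, not the PR's hack_inplace.py. Matches a hard pin such as
#   Requires-Dist: triton==3.5.1+rocm7.2.3.gita272dfa8; platform_system == "Linux"
# and captures the rest of the line so the environment marker is preserved.
TRITON_PIN = re.compile(r"^Requires-Dist: triton==[^;\s]+(.*)$")


def _split_eol(line: str) -> tuple[str, str]:
    """Split a line into its content and its trailing newline characters."""
    body = line.rstrip("\r\n")
    return body, line[len(body):]


def relax_triton_pin(metadata: str, floor: str = "3.5.1") -> str:
    """Rewrite `triton==<exact>` to `triton>=<floor>`, leaving other lines as-is."""
    out = []
    for line in metadata.splitlines(keepends=True):
        body, eol = _split_eol(line)
        match = TRITON_PIN.match(body)
        if match:
            body = f"Requires-Dist: triton>={floor}{match.group(1)}"
        out.append(body + eol)
    return "".join(out)


def blank_record_row(record: str, dist_info: str) -> str:
    """Clear hash and size of the METADATA row (assumed meaning of "blank")."""
    target = f"{dist_info}/METADATA"
    out = []
    for line in record.splitlines(keepends=True):
        body, eol = _split_eol(line)
        # RECORD rows are "path,hash,size"; a plain split suffices for this path.
        if body.split(",", 1)[0] == target:
            body = f"{target},,"
        out.append(body + eol)
    return "".join(out)


def patch_installed_torch(site_packages: Path) -> bool:
    """Patch the first torch-*.dist-info under site_packages; False if none found."""
    dist_infos = sorted(site_packages.glob("torch-*.dist-info"))
    if not dist_infos:
        return False
    dist = dist_infos[0]
    metadata = dist / "METADATA"
    patched = relax_triton_pin(metadata.read_text(encoding="utf-8"))
    metadata.write_text(patched, encoding="utf-8")
    record = dist / "RECORD"
    if record.exists():
        rows = blank_record_row(record.read_text(encoding="utf-8"), dist.name)
        record.write_text(rows, encoding="utf-8")
    return True
```

In an image build it would be pointed at whichever site-packages directory holds torch, for example patch_installed_torch(Path(sysconfig.get_paths()["platlib"])). Because only two small text files are rewritten, there is no extract / re-zip / reinstall round trip and no source wheel needs to exist at /.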

4. amd-smi case extended

rocm/pytorch:rocm7.2.3 does not pre-install amd-smi either (verified inside the base image), so the case is extended:

-      *rocm720*) \
+      *rocm720*|*rocm723*) \
         echo "ROCm (GPU_ARCH=${GPU_ARCH}): installing amd-smi"; \
         cd /opt/rocm/share/amd_smi && python3 -m pip install --no-cache-dir . ;;

5. libdrm-amdgpu case generalized

The pattern is generalized from rocm720 to rocm72, so the entire ROCm 7.2.x family bypasses the libdrm-amdgpu install (all 7.2.x base images already ship the packages). Behavior is unchanged for rocm720.
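The GPU_ARCH gating in sections 2, 4 and 5 relies on shell case glob matching. The Python sketch below emulates those case arms with fnmatch.fnmatchcase to show which branches a given stage name would hit. Assumptions: GPU_ARCH values are the stage names (modeled on gfx942-rocm723 / gfx950-rocm723; the rocm720 / rocm700 names used here are hypothetical), and the libdrm-amdgpu arm is written as *rocm72*. Each dictionary entry models a separate case statement, so one arch can hit several of them.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the Dockerfile's case arms. Each entry stands for a
# separate `case "${GPU_ARCH}" in ...` statement, evaluated independently.
# The *rocm72* form of the libdrm arm is an assumption.
CASE_ARMS = {
    "build custom ROCm Triton": "*rocm723*",
    "patch torch METADATA": "*rocm720*|*rocm723*",
    "pip-install amd-smi": "*rocm720*|*rocm723*",
    "skip libdrm-amdgpu install": "*rocm72*",
}


def case_matches(gpu_arch: str, arm: str) -> bool:
    """Emulate one shell case arm: '|'-separated globs, case-sensitive match."""
    return any(fnmatchcase(gpu_arch, glob) for glob in arm.split("|"))


def branches_for(gpu_arch: str) -> list[str]:
    """Return the names of every modeled case statement whose arm matches."""
    return [name for name, arm in CASE_ARMS.items() if case_matches(gpu_arch, arm)]


if __name__ == "__main__":
    # Hypothetical stage names, modeled on gfx942-rocm723 / gfx950-rocm723.
    for arch in ("gfx942-rocm723", "gfx950-rocm720", "gfx942-rocm700", "gfx942"):
        print(f"{arch:16} -> {branches_for(arch)}")
```

Note that case_matches("gfx942-rocm723", "rocm723") is False: a case arm without the surrounding * only matches the exact string, which is why the globbed *rocm723* form is needed for suffixed stage names.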

Accuracy Tests

SGLANG_AITER_MLA_PERSIST=1 AITER_MXFP4_MOE_SF=1 SGLANG_USE_AITER=1 SGLANG_INT4_WEIGHT=0 SGLANG_MOE_PADDING=1 SGLANG_SET_CPU_AFFINITY=1 SGLANG_ROCM_FUSED_DECODE_MLA=1 SGLANG_USE_ROCM700A=1 python3 -m sglang.launch_server --model-path /dockerx/data/DeepSeek-R1-MXFP4-Preview/ --tensor-parallel-size 8 --trust-remote-code --host 0.0.0.0 --port 8000 --log-requests --mem-fraction-static 0.95 --chunked-prefill-size 131072 --attention-backend aiter --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --speculative-draft-model-path /dockerx/data/DeepSeek-R1-NextN --max-running-requests 64 --disable-radix-cache --kv-cache-dtype fp8_e4m3
GSM8K: 0.942

python3 -m sglang.launch_server --model-path /data/amd/Kimi-K2.5-MXFP4/ --tensor-parallel-size 4 --trust-remote-code --mem-fraction-static 0.765 --disable-radix-cache --decode-attention-backend aiter --prefill-attention-backend aiter --kv-cache-dtype fp8_e4m3 --chunked-prefill-size 16384 --max-prefill-tokens 16384 --max-running-requests 1024 --cuda-graph-max-bs 1024 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-aiter-allreduce-fusion --host 127.0.0.1 --port 8888
GSM8K: 0.93

Speed Tests and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): ✅ Run #26244096516
Latest PR Test (Extra): ❌ Run #26244096405

gemini-code-assist · 2026-05-21T18:04:53Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Commit 8119abb: [AMD] Add rocm7.2.3 docker build

sogalin requested review from Fridge003, HaiShaw, ishandhanani, ispobock and yctseng0211 as code owners May 21, 2026 18:04

github-actions Bot added the amd label May 21, 2026

amd-bot mentioned this pull request May 22, 2026 in [CI Monitor] Daily Report - 2026-05-22 bingxche/sglang-ci-bot#80 (Open)

yctseng0211 mentioned this pull request May 22, 2026 in [AMD] WIP - rocm723 #26051 (Draft, 5 tasks)

amd-bot mentioned this pull request Jun 12, 2026 in [CI Monitor] Daily Report - 2026-06-12 bingxche/sglang-ci-bot#102 (Open)

sogalin wants to merge 1 commit into sgl-project:main from sogalin:update-rocm723
