[AMD] CI - Detect the aiter version and rebuild if needed #15460

Merged
HaiShaw merged 9 commits into sgl-project:main from yctseng0211:detect_aiter
Dec 23, 2025

@yctseng0211 (Collaborator) commented Dec 19, 2025

Motivation

https://github.com/sgl-project/sglang/actions/runs/20021784695/job/57446087825
Our AMD CI workflow always pulls the latest pre-built daily Docker image and then installs dependencies based on the PR code. However, this installation step does not rebuild or update AITER, which is already baked into the pre-built image.

As a result, when a PR upgrades the AITER version, the CI does not actually test against the new AITER, but instead tests against whatever version was pre-built into the daily image. This leads to a blind spot where AITER-related regressions cannot be caught at PR time.

Example:
In PR #14497, we upgraded AITER to v0.1.7.post5.
Our later manual tests found that this version introduces an accuracy regression in:
accuracy-test-2-gpu-amd (linux-mi325-gpu-2)

However, the CI for this PR incorrectly passed (see https://github.com/sgl-project/sglang/actions/runs/20021784695/job/57446087825), because it was still using the older AITER version bundled in the daily image, not the upgraded version specified in the PR.

Conclusion:
We need CI to verify whether the AITER version in the PR matches the one inside the pulled image. If they differ, CI must rebuild/install the correct AITER version inside the container before running tests.
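
The check described above can be sketched as follows. This is a minimal illustration, not the script added by this PR: the helper names (`normalize`, `installed_version`, `needs_rebuild`) are hypothetical, and it assumes the version pinned by the PR is already available as a string and that the installed package can be looked up by its distribution name.

```python
from importlib import metadata
from typing import Optional


def normalize(version: str) -> str:
    """Strip whitespace and a leading 'v' so 'v0.1.8' and '0.1.8' compare equal."""
    version = version.strip()
    return version[1:] if version[:1] in ("v", "V") else version


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The distribution name to pass for AITER is an assumption of this sketch.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def needs_rebuild(installed: Optional[str], required: str) -> bool:
    """True when the image's version is missing or differs from the PR's."""
    if installed is None:
        return True
    return normalize(installed) != normalize(required)
```

With illustrative values, `needs_rebuild("0.1.7.post5", "v0.1.8")` is true and `needs_rebuild("0.1.8", "v0.1.8")` is false. The comparison is exact string equality after normalization, so a build that reports a local or development suffix in its version would be treated as a mismatch and trigger a rebuild; that errs on the side of testing the version the PR asks for.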

cc: @bingxche @michael-amd @saienduri

Modifications

Accuracy Tests

Simulation - aiter version upgraded to v0.1.8

https://github.com/sgl-project/sglang/actions/runs/20368597241/job/58529505129

Simulation - aiter version unchanged

https://github.com/sgl-project/sglang/actions/runs/20388611486/job/58594282444?pr=15460

Benchmarking and Profiling

Checklist

@github-actions bot added the amd label Dec 19, 2025
@yctseng0211 marked this pull request as ready for review December 19, 2025 10:25
@yctseng0211 changed the title from "[AMD] CI detect aiter version and rebuild" to "[AMD] CI - Detect the aiter version and rebuild if needed" Dec 19, 2025
@michael-amd left a comment

LGTM

@HaiShaw HaiShaw merged commit e50f356 into sgl-project:main Dec 23, 2025
122 of 127 checks passed
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026