Revert #25690 to unblock LoRA Qwen3-8B CUDA graph capture on main by fzyzcjy · Pull Request #25743 · sgl-project/sglang

fzyzcjy · 2026-05-19T05:04:36Z

🤖 Opened autonomously by Claude Code acting on Tom's behalf. All the diagnostic work below — CI failure triage on #25647, the 9-probe bisect, this revert, and the /rerun-test request — was performed by the agent without human-in-the-loop edits. The @-mentions below are programmatic, not Tom's personal request; please push back if any conclusion is off.

This reverts #25690 ([Fix] Try to fix error caused by latest cutedsl packages — merged 2026-05-18 by @Fridge003 / @hnyls2002).

#25690 introduced a CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14) regression in the LoRA Qwen3-8B forward path during CUDA graph capture. Bisect evidence (9 probes, narrowed ba214ef3d3..a7b3ced334 to a single commit): #25647 (comment).
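The bisect mentioned above amounts to a binary search over an ordered commit range. A minimal sketch follows, assuming a linear history with a single pass-to-fail transition. The commit names and the `is_bad` predicate are hypothetical stand-ins for a real CI probe, not the tooling actually used on #25647.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return (first bad commit, number of probes).

    Assumes commits are ordered oldest to newest, the newest one is known
    bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: first bad index lies in [lo, hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if is_bad(commits[mid]):
            hi = mid        # culprit is mid or earlier
        else:
            lo = mid + 1    # culprit is strictly after mid
    return commits[lo], probes


# Hypothetical history: 300 commits, the failure is introduced at index 217.
history = [f"c{i:03d}" for i in range(300)]
culprit, probes = first_bad_commit(history, lambda c: int(c[1:]) >= 217)
print(culprit, probes)
```

Each probe halves the candidate range, so 9 probes are enough to isolate one commit in a range of up to 512.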

Failing test on main: test/registered/lora/test_lora_qwen3_8b_logprob_diff.py::TestLoRAQwen3_8BLogprobDiff::test_lora_qwen3_8b_logprob_accuracy (extra-a-test-1-gpu-large lane).

This PR's purpose is diagnostic: re-running the failing test via /rerun-test here should now PASS, confirming the revert restores main. It's not a merge candidate yet — the underlying motivation of #25690 (cutedsl wrapper/binding ABI mismatch with GPUModuleOp signature TypeError) still needs a proper fix from @Fridge003 / @hnyls2002.

cc @Fridge003 @hnyls2002

This reverts commit b79e4b1.


fzyzcjy · 2026-05-19T05:04:46Z

/rerun-test test/registered/lora/test_lora_qwen3_8b_logprob_diff.py

github-actions · 2026-05-19T05:05:06Z

🚀 1-gpu-h100 (1 test): ✅ View workflow run

cd test/ && python3 registered/lora/test_lora_qwen3_8b_logprob_diff.py

fzyzcjy · 2026-05-19T05:22:36Z

🤖 Posted autonomously by Claude Code acting on Tom's behalf. Result of the diagnostic /rerun-test on this revert PR.

Result: PASS ✅ — revert restores the LoRA Qwen3-8B CUDA path

/rerun-test test/registered/lora/test_lora_qwen3_8b_logprob_diff.py run: 26077407201 — success.

Combined with the negative-control PR #25744 which FAILS with the same CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14) fingerprint on plain upstream/main, this gives bidirectional evidence that b79e4b1e68 (#25690) is the root cause:

| PR | Branch | `/rerun-test` verdict |
| --- | --- | --- |
| #25743 (revert of #25690) | `tom/revert-25690-cutedsl` | PASS ✅ |
| #25744 (no-revert, plain main + 1-line touch) | `tom/probe-lora-bug-25690` | FAIL ❌ (same fingerprint) |

Bisect thread: #25647 (comment).

cc @Fridge003 @hnyls2002 — leaving this revert open as a starting point if you want to rebase a follow-up fix onto it; closing otherwise once you've seen this.

fzyzcjy · 2026-05-19T05:52:33Z

🤖 Posted autonomously by Claude Code acting on Tom's behalf. Second-run double-confirmation on the revert branch.

Second run: PASS ✅ — revert is reproducibly good

rerun-test run 2 on tom/revert-25690-cutedsl: 26078646438 — success. Matches first run 26077407201.

| Run | Result |
| --- | --- |
| #1 | PASS ✅ |
| #2 | PASS ✅ |

Two of two runs pass on the revert branch, so the pass is reproducible and unlikely to be a flake.

…l_dependency.sh

nvidia-cutlass-dsl[cu13] has additive PyPI extras: both -libs-base AND -libs-cu13 are installed together, writing to the same site-packages paths with conflicting content. This causes a GPUModuleOp TypeError at kernel-compile time (vllm-project/vllm#40082).

The correct libs package depends on the GPU family, not just CUDA version:

- Blackwell (IS_BLACKWELL=1, CU13): -libs-cu13 must win. It carries the sm_110 arch alias that the CUDA-12.9-built -libs-base wheel lacks. Fix: purge -libs-base, force-reinstall -libs-cu13.
- Non-Blackwell CU13 (H100, H200): -libs-base must win. Forcing only -libs-cu13 introduces a CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS regression in LoRA CUDA-graph capture (sgl-project#25743). Fix: purge -libs-cu13, force-reinstall -libs-base.
- Non-CU13: only -libs-base installed (no [cu13] extra), no conflict.

Add fix_cutlass_dsl_libs() called from main() after download_flashinfer_cache, mirroring the position of the original purge_cutlass_libs_base() from sgl-project#25690.

PR sgl-project#25576 bumped nvidia-cutlass-dsl[cu13] from 4.5.0 to 4.5.1. The bump exposed a latent file-level conflict between -libs-base and -libs-cu13 (both written by the additive [cu13] extra) as a hard GPUModuleOp TypeError on H100: -libs-cu13's pybind11 binding changed to the new MLIR-style ((operation: object)) without a matching bump to the Python wrapper in nvidia-cutlass-dsl, so loading -libs-cu13's .so makes the wrapper's old-style super().__init__() call fail.

Two changes:

1. Revert the version bump (4.5.1 -> 4.5.0). At 4.5.0 both .so files expose a compatible binding, so the same coexistence no longer crashes. This removes the active TypeError on H100 and on the CUDA-13 Docker image for non-Blackwell users.

2. Add fix_cutlass_dsl_libs() to ci_install_dependency.sh, called from main() after download_flashinfer_cache. The function picks the right libs package per GPU family even at 4.5.0 to avoid two independent regressions that the silent conflict could still hit:

   - Blackwell (IS_BLACKWELL=1, CU13): Purge -libs-base, force-reinstall -libs-cu13 so its files take precedence. -libs-base is CUDA-12.9-built and lacks the sm_110 arch alias that GB300/B200 need at cutlass import time.
   - Non-Blackwell CU13 (H100, H200): Purge -libs-cu13, force-reinstall -libs-base. -libs-cu13 carries a CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS regression in LoRA CUDA-graph capture on sm_90 (sgl-project#25743 / reverted by sgl-project#25756).
   - Non-CU13: no-op (only -libs-base ever installed).
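The per-GPU-family selection described above can be modelled as a small decision function. This is only a sketch of the decision table in Python. The real fix_cutlass_dsl_libs() lives in a shell script and its body is not shown here. The full package names are assumed from the "-libs-base" / "-libs-cu13" suffixes, and `LibsPlan` and `plan_cutlass_dsl_libs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: full package names inferred from the suffixes used in the text.
LIBS_BASE = "nvidia-cutlass-dsl-libs-base"
LIBS_CU13 = "nvidia-cutlass-dsl-libs-cu13"


@dataclass(frozen=True)
class LibsPlan:
    purge: Optional[str]      # package to uninstall, or None for a no-op
    reinstall: Optional[str]  # package to force-reinstall, or None for a no-op


def plan_cutlass_dsl_libs(is_cu13: bool, is_blackwell: bool) -> LibsPlan:
    """Pick which libs package should win, following the three cases above."""
    if not is_cu13:
        # No [cu13] extra, so only -libs-base is present: nothing to do.
        return LibsPlan(purge=None, reinstall=None)
    if is_blackwell:
        # Blackwell needs the files shipped by -libs-cu13.
        return LibsPlan(purge=LIBS_BASE, reinstall=LIBS_CU13)
    # Non-Blackwell CU13 (e.g. H100, H200): -libs-base must win.
    return LibsPlan(purge=LIBS_CU13, reinstall=LIBS_BASE)


for cu13, blackwell in [(True, True), (True, False), (False, False)]:
    print(cu13, blackwell, plan_cutlass_dsl_libs(cu13, blackwell))
```

Keeping the decision separate from the pip calls makes each case easy to check without a GPU runner.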

…-mix TypeError

nvidia-cutlass-dsl[cu13] has additive PyPI extras: both -libs-base and -libs-cu13 are installed and they ship intentionally-different content for the same site-packages paths:

    cutlass/_mlir/dialects/_gpu_ops_gen.py
    cutlass/_mlir/_mlir_libs/_cutlass_ir.cpython-*.so

Each wrapper .py is paired with a matching pybind11 .so. The two pairs use different MLIR Op constructor styles:

    -libs-base: super().__init__(self.build_generic(...))        (new-style)
    -libs-cu13: super().__init__(OPERATION_NAME, REGIONS, ...)   (old-style)

If install order leaves the .py from one wheel and the .so from the other (reproducible by mixing the wheel contents), the wrapper's super().__init__ call signature does not match what the loaded .so accepts and the runtime raises:

    TypeError: __init__(): incompatible function arguments.
      1. __init__(self, operation: object) -> None

surfacing at kernel-compile time on H100 CU13 CI runners during eagle / lora tests that go through flashinfer.rmsnorm_cute -> cute.compile.

Tested all 4 (.py, .so) combinations on an H200 devbox: only the mismatched '.py=cu13 + .so=base' fails, producing the exact CI TypeError byte-for-byte. Three combinations pass.

Fix: after install_sglang completes (with possibly mismatched state), force-reinstall -libs-cu13 last so both .py and .so come from the same wheel (BOTH-cu13 state). The version is parsed from pyproject.toml so this stays in sync with whatever nvidia-cutlass-dsl version the project pins. Skips for non-CU13 runners (no [cu13] extra, no conflict).

Verified on an H200 devbox:

1. TypeError fix: forced bad state, ran force_reinstall_cutlass_dsl_libs_cu13 -> smoke test went FAIL -> PASS, .so md5 changed from base's to cu13's.
2. LoRA regression check: ran test_lora_qwen3_8b_logprob_diff.py -> both subtests passed, KL divergence 2.8e-4 (threshold 5e-3). The fix does NOT re-trigger the CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS regression from sgl-project#25743.
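The failing combination described above (an old-style wrapper call against a binding whose constructor only accepts `operation`) can be imitated in pure Python. This is an illustrative stand-in, not the cutlass code: `BindingOp`, `OldStyleWrapper`, `NewStyleWrapper`, and `try_build` are hypothetical names, and the message Python produces differs from the pybind11 text quoted above.

```python
class BindingOp:
    """Stand-in for a binding whose constructor takes one `operation` object."""

    def __init__(self, operation: object) -> None:
        self.operation = operation


class NewStyleWrapper(BindingOp):
    """Builds the operation first, then passes the single object to the base."""

    def __init__(self, sym_name: str) -> None:
        super().__init__({"name": "gpu.module", "sym_name": sym_name})


class OldStyleWrapper(BindingOp):
    """Passes an operation name plus extra positional arguments to the base."""

    OPERATION_NAME = "gpu.module"
    REGIONS = 1

    def __init__(self, sym_name: str) -> None:
        # Three positional arguments against a one-argument constructor.
        super().__init__(self.OPERATION_NAME, self.REGIONS, sym_name)


def try_build(wrapper_cls: type) -> str:
    """Return 'ok' if the wrapper constructs, else the TypeError text."""
    try:
        wrapper_cls("kernels")
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


print("new-style wrapper:", try_build(NewStyleWrapper))
print("old-style wrapper:", try_build(OldStyleWrapper))
```

The wrapper and the binding must agree on the constructor signature, which is why the fix reinstalls one wheel last so that both files come from the same package.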


Revert "[Fix] Try to fix error caused by latest cutedsl packages (#25690)" (commit 5f93545)

This reverts commit b79e4b1.

fzyzcjy requested review from Fridge003, ispobock and merrymercy as code owners May 19, 2026 05:04

github-actions Bot added the dependencies Pull requests that update a dependency file label May 19, 2026

This was referenced May 19, 2026

Sandbox: verify full main CI is green on latest main (do not merge) #25647

Closed

Probe LoRA Qwen3-8B CUDA fail on plain main (negative control, NOT a fix) #25744

Closed

fzyzcjy closed this May 19, 2026

fzyzcjy mentioned this pull request May 19, 2026

[Fix] Try to fix error caused by latest cutedsl packages #25690

Merged

5 tasks

Fridge003 mentioned this pull request May 19, 2026

[Fix] Fix extra uninstall of cutlass packages #25756

Merged

5 tasks

Kangyan-Zhou mentioned this pull request May 21, 2026

[Revert] nvidia-cutlass-dsl[cu13] 4.5.1 -> 4.5.0 #25938

Merged

Kangyan-Zhou mentioned this pull request May 21, 2026

[CI] Force-reinstall nvidia-cutlass-dsl-libs-cu13 last to avoid wheel-mix TypeError #25958

Merged

fzyzcjy wants to merge 1 commit into main from tom/revert-25690-cutedsl
