
[PCG] fix piecewise cuda graph for Qwen3.5 #19220

Merged
ispobock merged 4 commits into sgl-project:main from zminglei:qwen3.5-pcg
Feb 26, 2026

Conversation

@zminglei
Collaborator

@zminglei zminglei commented Feb 24, 2026

Motivation

fix piecewise cuda graph for Qwen3.5

Modifications

  1. fix piecewise cuda graph for Qwen3.5
  2. Clean up the legacy gdn_with_output code, as it is no longer used.

Accuracy Tests

main:

SGLANG_ENABLE_JIT_DEEPGEMM=0 python -m sglang.launch_server --model-path /shared/public/elr-models/Qwen/Qwen3.5-397B-A17B-FP8 --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --enable-piecewise-cuda-graph

[2026-02-24 04:43:15 TP4] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/jobuser/zminglei/sglang/venv/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2755, in _dispatch_impl
    r = func(*args, **kwargs)
  File "/home/jobuser/zminglei/sglang/venv/lib/python3.10/site-packages/torch/_ops.py", line 841, in __call__
    return self._op(*args, **kwargs)
NotImplementedError: sgl_kernel::fp8_blockwise_scaled_mm: attempted to run this operator with Meta tensors, but there was no fake impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add a fake impl. Please see the following for next steps:  https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html

This PR:

SGLANG_ENABLE_JIT_DEEPGEMM=0 python -m sglang.launch_server --model-path /shared/public/elr-models/Qwen/Qwen3.5-397B-A17B-FP8 --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser qwen3 --enable-piecewise-cuda-graph

python benchmark/gsm8k/bench_sglang.py --data-path /shared/public/data/gsm8k/test.jsonl --port 8000 --num-questions 1319 --parallel 1319
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [01:56<00:00, 11.35it/s]
Accuracy: 0.948
Invalid: 0.008
Latency: 116.176 s
Output throughput: 1831.956 token/s

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@zminglei zminglei marked this pull request as ready for review February 24, 2026 05:51

@zminglei
Collaborator Author

zminglei commented Feb 24, 2026

/tag-and-rerun-ci again

):
    output = torch.empty_like(hidden_states)
    if forward_batch.forward_mode.is_extend() and get_forward_context() is not None:
        gdn_with_output(
Collaborator
why remove this branch?

Collaborator Author

@zminglei zminglei Feb 24, 2026

This branch existed for PCG purposes (most likely copied over from the earlier qwen3_next.py). It is no longer needed since we added split_op inside RadixLinearAttention, which serves the same purpose; see PR #17613.
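
For context, a rough sketch of the split-op idea (hypothetical illustration, not sglang's actual implementation): piecewise CUDA graph capture partitions the model's op sequence at designated splitting ops such as the attention, so each static segment between them can be captured as its own graph while the splitting op runs eagerly:

```python
# Hypothetical illustration of piecewise graph splitting; op names and the
# split_graph helper are made up for this sketch, not taken from sglang.
def split_graph(ops, splitting_ops):
    """Partition a flat op list into segments, isolating splitting ops."""
    pieces, current = [], []
    for op in ops:
        if op in splitting_ops:
            if current:
                pieces.append(current)
            # The splitting op stays outside any captured CUDA graph.
            pieces.append([op])
            current = []
        else:
            current.append(op)
    if current:
        pieces.append(current)
    return pieces

ops = ["embed", "mlp1", "linear_attn", "mlp2", "full_attn", "lm_head"]
pieces = split_graph(ops, {"linear_attn", "full_attn"})
print(pieces)
# [['embed', 'mlp1'], ['linear_attn'], ['mlp2'], ['full_attn'], ['lm_head']]
```

With the split performed inside the attention module itself, a manual extend-mode branch around the attention call (like the removed one above) becomes redundant.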

Collaborator

@Oasis-Git Oasis-Git left a comment

Maybe split the VL model change into another PR if possible.

@ispobock ispobock merged commit b3202fe into sgl-project:main Feb 26, 2026
631 of 679 checks passed
klhhhhh pushed a commit to klhhhhh/sglang that referenced this pull request Feb 26, 2026
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
lawrence-harmonic added a commit to lawrence-harmonic/sglang that referenced this pull request Mar 10, 2026