fix: piecewise_cuda_graph get correct qo_indptr #21452
Fridge003 merged 8 commits into sgl-project:main
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
/tag-and-rerun-ci
There is a performance regression with this PR.
Ran the tests locally on H100 with tp=1 and tp=8; the GSM8K test passes.
Oasis-Git
left a comment
In general the change is reasonable. Here are some suggestions for revision.
num_tokens = len(forward_batch.input_ids)
index = bisect.bisect_left(self.capture_num_tokens, num_tokens)
static_num_tokens = self.capture_num_tokens[index]
with enable_piecewise_cuda_graph(num_tokens=static_num_tokens):
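The bisect_left lookup above selects the smallest captured graph size that can hold the current batch; the batch is then padded up to that size. A minimal sketch of this selection rule (the capture sizes here are illustrative, not the actual values sglang captures):

```python
import bisect

# Hypothetical list of captured graph sizes, sorted ascending as in the PR.
capture_num_tokens = [1, 2, 4, 8, 16, 32]

def pick_static_num_tokens(num_tokens: int) -> int:
    # bisect_left returns the index of the smallest captured size
    # that is >= num_tokens; the batch is padded up to that size.
    index = bisect.bisect_left(capture_num_tokens, num_tokens)
    return capture_num_tokens[index]

print(pick_static_num_tokens(5))   # -> 8 (5 real tokens, 3 padding)
print(pick_static_num_tokens(8))   # -> 8 (exact fit, no padding)
```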
I think we can move num_tokens into the ForwardContext. Also, to skip the computation and the sync caused by item(), variables such as num_dummy_pages should be pre-calculated.
Hi, I took your suggestions and updated the code:
- Added self.num_tokens: Optional[int] = None field to ForwardContext
- Eliminated both .item() GPU-CPU syncs in the dummy-request block
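The point of eliminating the .item() calls is that each one forces a GPU-to-CPU synchronization; quantities like the number of dummy pages can instead be derived from CPU-side metadata that is already available. A hypothetical sketch of this pattern (all names here are illustrative, not the actual sglang implementation):

```python
def compute_dummy_request_meta(seq_lens_cpu, static_num_tokens, page_size):
    # seq_lens_cpu: per-request lengths kept as plain Python ints alongside
    # the GPU tensors, so no .item() sync is needed here.
    num_tokens = sum(seq_lens_cpu)
    pad_tokens = static_num_tokens - num_tokens
    # Pre-calculate num_dummy_pages on the host (ceiling division) instead
    # of reading it back from a GPU tensor inside the dummy-request block.
    num_dummy_pages = (pad_tokens + page_size - 1) // page_size
    return pad_tokens, num_dummy_pages

print(compute_dummy_request_meta([3, 4], 16, 4))  # -> (9, 3)
```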
Motivation
#21218
Modifications
For padding tokens, append a fake (bs+1)-th request with pad_tokens extend tokens whose KV indices all point to scratch slot 0. This makes qo_indptr[-1] == static_num_tokens without affecting the causal masks of the real requests.
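The qo_indptr fix above can be sketched as follows (plain-list sketch with illustrative names, not the actual sglang code; in the real implementation the fake request's KV indices would all point to scratch slot 0):

```python
def pad_qo_indptr(qo_indptr, static_num_tokens):
    # qo_indptr has length bs+1; qo_indptr[-1] is the number of real tokens.
    num_tokens = qo_indptr[-1]
    if num_tokens == static_num_tokens:
        return qo_indptr  # exact fit: no fake request needed
    # Append a fake (bs+1)-th request covering the padding tokens, so that
    # qo_indptr[-1] == static_num_tokens. Real requests' boundaries, and
    # hence their causal masks, are unchanged.
    return qo_indptr + [static_num_tokens]

qo_indptr = [0, 3, 7]                 # 2 real requests, 7 real tokens
print(pad_qo_indptr(qo_indptr, 8))    # -> [0, 3, 7, 8]
```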
Accuracy Tests
CUDA graph disabled (baseline):
python -m sglang.launch_server --model-path Qwen/Qwen3-14B --attention-backend flashinfer --disable-cuda-graph
python3 benchmark/gsm8k/bench_sglang.py --num-questions 100
100%|███████████████████████████████| 100/100 [00:05<00:00, 18.24it/s]
Accuracy: 0.950
Invalid: 0.000
Latency: 5.524 s
Output throughput: 2255.160 token/s
CUDA graph enabled:
python3 benchmark/gsm8k/bench_sglang.py --num-questions 100
100%|█████████████████████████████| 100/100 [00:04<00:00, 24.05it/s]
Accuracy: 0.940
Invalid: 0.000
Latency: 4.198 s
Output throughput: 2968.234 token/s
python3 benchmark/gsm8k/bench_sglang.py --num-questions 100
100%|███████████████████████████████████████████████████████| 100/100 [00:04<00:00, 24.82it/s]
Accuracy: 0.930
Invalid: 0.000
Latency: 4.035 s
Output throughput: 3180.599 token/s
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci