[WIP] Fix piecewise cuda graph recompile#13622

Closed
zminglei wants to merge 3 commits into sgl-project:main from zminglei:grok-piecewise

Conversation


@zminglei zminglei commented Nov 20, 2025

Motivation

Closes #13469

Modifications

Without piecewise CUDA graph:

python3 -m sglang.launch_server --model /shared/public/elr-models/xai-org/grok-2/ --tokenizer-path /shared/public/elr-models/xai-org/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton --disable-radix-cache

python benchmark/gsm8k/bench_sglang.py --data-path /shared/public/data/gsm8k/test.jsonl --num-questions 1319
Accuracy: 0.929
Invalid: 0.000
Latency: 133.680 s
Output throughput: 1111.674 token/s

python3 -m sglang.bench_serving --backend sglang --dataset-name random-ids --num-prompts 1 --random-input-len 1024 --random-output-len 1 --random-range-ratio 1 --tokenizer /shared/public/elr-models/xai-org/grok-2/tokenizer.tok.json
---------------Time to First Token----------------
Mean TTFT (ms):                          104.61    
Median TTFT (ms):                        104.61    
P99 TTFT (ms):                           104.61    

With piecewise CUDA graph:

python3 -m sglang.launch_server --model /shared/public/elr-models/xai-org/grok-2/ --tokenizer-path /shared/public/elr-models/xai-org/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton --enable-piecewise-cuda-graph --disable-radix-cache

python benchmark/gsm8k/bench_sglang.py --data-path /shared/public/data/gsm8k/test.jsonl --num-questions 1319
Accuracy: 0.934
Invalid: 0.001
Latency: 121.737 s
Output throughput: 1219.835 token/s

python3 -m sglang.bench_serving --backend sglang --dataset-name random-ids --num-prompts 1 --random-input-len 1024 --random-output-len 1 --random-range-ratio 1 --tokenizer /shared/public/elr-models/xai-org/grok-2/tokenizer.tok.json
---------------Time to First Token----------------
Mean TTFT (ms):                          79.50     
Median TTFT (ms):                        79.50     
P99 TTFT (ms):                           79.50    
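To summarize the two runs, the relative deltas can be computed directly from the numbers reported above (this small script only restates the benchmark figures; it introduces no new measurements):

```python
# Relative change between the runs without and with piecewise CUDA graph,
# using the figures copied from the benchmark output above.
def rel_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative = reduction)."""
    return (after - before) / before * 100

ttft = rel_change(104.61, 79.50)             # mean TTFT in ms -> ~-24.0%
latency = rel_change(133.680, 121.737)       # gsm8k latency in s -> ~-8.9%
throughput = rel_change(1111.674, 1219.835)  # output tokens/s -> ~+9.7%
```

So enabling piecewise CUDA graph cuts mean TTFT by roughly 24% and improves gsm8k output throughput by roughly 10% with comparable accuracy.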

Accuracy Tests

Benchmarking and Profiling

Checklist


assert not self._called, "SGLangBackend can only be called once"
# assert not self._called, "SGLangBackend can only be called once"
if self._called:
Collaborator
?
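The change under review swaps a one-shot assertion for a guard on `self._called`. A minimal sketch of that idempotent-call pattern, where a repeat call reuses the cached compilation result instead of recompiling (class and method names here are illustrative, not the actual SGLangBackend API):

```python
# Hypothetical sketch: a backend that compiles on first call and returns
# the cached artifact on later calls, rather than asserting it is only
# ever called once. This mirrors the guard discussed in the review.
class PiecewiseBackend:
    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._called = False
        self._compiled = None

    def __call__(self, graph_module):
        if self._called:
            # Reuse the cached result; do not trigger a recompile.
            return self._compiled
        self._called = True
        self._compiled = self._compile_fn(graph_module)
        return self._compiled

compile_calls = []
backend = PiecewiseBackend(
    lambda gm: compile_calls.append(gm) or f"compiled:{gm}"
)
first = backend("gm0")
second = backend("gm1")  # guard hit: returns the first compiled artifact
```

The design choice is that repeat invocations become cheap no-ops rather than errors, which is what avoids the recompile path this PR targets.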

@hebiao064 hebiao064 changed the title [WIP] Fix grok2 piecewise cuda graph [WIP] Fix piecewise cuda graph recompile Nov 20, 2025
@zminglei
Collaborator Author

Moved to this PR #13667

@zminglei zminglei closed this Dec 12, 2025


Development

Successfully merging this pull request may close these issues.

[Bug] Mixtral & Grok2 Piecewise CUDA Graph Accuracy Drop
