[PieceWise CUDA Graph] Support awq/gptq model in piecewise cudagraph #12518

BBuf · 2025-11-02T15:26:53Z

python3 -m sglang.launch_server --model-path Qwen/QwQ-32B-AWQ --tp 4 --host 0.0.0.0 --enable-piecewise-cuda-graph

100%|███████████████████████████████████████████████████████████████████████| 1319/1319 [00:45<00:00, 28.97it/s]
Accuracy: 0.680
Invalid: 0.000
Latency: 45.628 s
Output throughput: 9831.561 token/s

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 --tp 4 --host 0.0.0.0 --enable-piecewise-cuda-graph

100%|██████████████████████████████████████████████████████████████████████| 1319/1319 [00:09<00:00, 135.12it/s]
Accuracy: 0.733
Invalid: 0.005
Latency: 9.817 s
Output throughput: 17887.372 token/s

Oasis-Git

LGTM except for some minor modifications.

python/sglang/srt/layers/quantization/marlin_utils.py

FlamingoPg · 2025-11-06T04:16:23Z

@BBuf I see there are some fake registers in the code. Is this for torch compile?

BBuf · 2025-11-06T12:56:12Z

> @BBuf I see there are some fake registers in the code. Is this for torch compile?

Right, they register the quantized kernels for piecewise CUDA graph compilation.
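To illustrate what a "fake" registration is for, here is a toy, framework-free sketch. A tracing compiler needs each op's output shape and dtype without launching the real kernel, so a custom op is given a second, metadata-only implementation. The registry, the `register_op` decorator, the `TensorMeta` class, and the `toy::dequant_gemm` op name are all hypothetical and are not the sglang or PyTorch API. In PyTorch this role is played by `torch.library.register_fake`; the PR's actual registration code is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Shape/dtype description of a tensor; carries no data."""
    shape: Tuple[int, ...]
    dtype: str


# Hypothetical registry: op name -> (real kernel, fake implementation).
_OPS: Dict[str, Tuple[Callable, Callable]] = {}


def register_op(name: str, fake_impl: Callable) -> Callable:
    """Register a real kernel together with its metadata-only twin."""
    def decorator(real_impl: Callable) -> Callable:
        _OPS[name] = (real_impl, fake_impl)
        return real_impl
    return decorator


def dequant_gemm_fake(x: TensorMeta, w: TensorMeta) -> TensorMeta:
    # Shape inference only: (m, k) @ (k, n) -> (m, n). No arithmetic runs.
    m, k = x.shape
    k2, n = w.shape
    assert k == k2, "inner dimensions must match"
    return TensorMeta((m, n), x.dtype)


@register_op("toy::dequant_gemm", fake_impl=dequant_gemm_fake)
def dequant_gemm(x: List[List[float]], w: List[List[float]]) -> List[List[float]]:
    # Stand-in for the real quantized kernel: a plain matrix multiply.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def trace(name: str, *metas: TensorMeta) -> TensorMeta:
    """What a tracer does: call the fake implementation, never the kernel."""
    _, fake = _OPS[name]
    return fake(*metas)


def run(name: str, *args):
    """Eager execution: call the real kernel."""
    real, _ = _OPS[name]
    return real(*args)
```

Tracing `toy::dequant_gemm` with a (4, 8) input and an (8, 16) weight yields metadata for a (4, 16) result without executing the multiply, which is the information a graph compiler needs to plan buffers before capture.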

ispobock · 2025-11-09T15:29:11Z

test/srt/test_piecewise_cuda_graph.py

        self.assertGreaterEqual(metrics["score"], 0.90)


+class TestPiecewiseCudaGraphAWQ(CustomTestCase):


Could you update the estimated time in run_suite.py?

try to support awq/gptq model in piecewise cudagraph

5d92ce6

sglang-bot added the run-ci label Nov 2, 2025

support awq/gptq model

aa690a3

BBuf changed the title ~~try to support awq/gptq model in piecewise cudagraph~~ [PieceWise CUDA Graph] Support awq/gptq model in piecewise cudagraph Nov 3, 2025

support awq/gptq model

31865b9

BBuf marked this pull request as ready for review November 3, 2025 03:31

BBuf requested review from Edwardf0t1, FlamingoPg, Ying1123, ch-wan, hnyls2002, ispobock, merrymercy and zhyncs as code owners November 3, 2025 03:31

upd

ba2584b

Oasis-Git reviewed Nov 3, 2025

python/sglang/srt/layers/quantization/marlin_utils.py (outdated)

Oasis-Git mentioned this pull request Nov 3, 2025

Support piecewise cuda graph for MLA #11812

Open

lint

9079863

FlamingoPg self-assigned this Nov 6, 2025

ispobock mentioned this pull request Nov 7, 2025

[Feature] Roadmap for Prefill (piecewise) Cuda Graph #11490

Open

26 tasks

Merge branch 'main' into try_to_fix_piecewise_cuda_graph_awq_model

0d7cc12

ispobock requested a review from Fridge003 as a code owner November 9, 2025 12:26

Merge branch 'main' into try_to_fix_piecewise_cuda_graph_awq_model

8eff634

ispobock reviewed Nov 9, 2025

6 participants
