@BBuf BBuf commented Nov 2, 2025

python3 -m sglang.launch_server --model-path Qwen/QwQ-32B-AWQ --tp 4 --host 0.0.0.0 --enable-piecewise-cuda-graph

100%|███████████████████████████████████████████████████████████████████████| 1319/1319 [00:45<00:00, 28.97it/s]
Accuracy: 0.680
Invalid: 0.000
Latency: 45.628 s
Output throughput: 9831.561 token/s

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 --tp 4 --host 0.0.0.0 --enable-piecewise-cuda-graph

100%|██████████████████████████████████████████████████████████████████████| 1319/1319 [00:09<00:00, 135.12it/s]
Accuracy: 0.733
Invalid: 0.005
Latency: 9.817 s
Output throughput: 17887.372 token/s

@BBuf BBuf changed the title from "try to support awq/gptq model in piecewise cudagraph" to "[PieceWise CUDA Graph] Support awq/gptq model in piecewise cudagraph" Nov 3, 2025
@BBuf BBuf marked this pull request as ready for review November 3, 2025 03:31

@Oasis-Git Oasis-Git left a comment

LGTM, except for some minor modifications.

@FlamingoPg FlamingoPg self-assigned this Nov 6, 2025
@FlamingoPg

@BBuf I see there are some fake registrations in the code. Are these for torch.compile?

BBuf commented Nov 6, 2025

> @BBuf I see there are some fake registrations in the code. Are these for torch.compile?

Right. They register the quantized kernels for piecewise CUDA graph compilation.
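For readers unfamiliar with the term, here is a library-free toy sketch of what a fake registration provides. It is not SGLang's or PyTorch's API; in PyTorch this role is played by the fake-kernel registration in `torch.library`. Every name below (`TensorSpec`, `register_fake`, `toy::awq_gemm`, `trace_output`) is hypothetical, and the packed-weight layout is an assumption made only for illustration. The idea is that a compiler can learn an op's output shape and dtype without launching the real quantized kernel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Shape and dtype only: stands in for a real tensor while tracing."""
    shape: Tuple[int, ...]
    dtype: str


# Hypothetical registry: op name -> shape-only "fake" implementation.
FAKE_KERNELS: Dict[str, Callable[..., TensorSpec]] = {}


def register_fake(name: str):
    """Decorator that records a fake implementation under an op name."""
    def decorator(fn: Callable[..., TensorSpec]) -> Callable[..., TensorSpec]:
        FAKE_KERNELS[name] = fn
        return fn
    return decorator


@register_fake("toy::awq_gemm")
def awq_gemm_fake(x: TensorSpec, qweight: TensorSpec, pack_factor: int) -> TensorSpec:
    # Assumed layout: qweight is (in_features, out_features // pack_factor).
    # Only metadata is computed; no tensor data is read or written.
    rows, _in_features = x.shape
    _in_features_w, packed_cols = qweight.shape
    return TensorSpec((rows, packed_cols * pack_factor), x.dtype)


def trace_output(name: str, *args) -> TensorSpec:
    """Return the output spec of an op by calling its fake implementation."""
    if name not in FAKE_KERNELS:
        raise LookupError(f"no fake kernel registered for {name}")
    return FAKE_KERNELS[name](*args)


out = trace_output(
    "toy::awq_gemm",
    TensorSpec((8, 4096), "float16"),
    TensorSpec((4096, 512), "int32"),
    8,  # eight 4-bit values per int32, an assumption for this sketch
)
print(out)
```

An op with no fake implementation raises `LookupError` in this toy, which loosely mirrors why a custom kernel needs such a registration before a tracing compiler can handle it.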

@ispobock ispobock requested a review from Fridge003 as a code owner November 9, 2025 12:26
self.assertGreaterEqual(metrics["score"], 0.90)


class TestPiecewiseCudaGraphAWQ(CustomTestCase):

Could you update the estimated time in run_suite.py?
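As a rough sketch of what such an update might involve, assuming the suite file keeps a per-file time estimate in a small record type: the `TestFile` class, the file names, and the times below are hypothetical placeholders, not values taken from this PR or from the actual run_suite.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestFile:
    name: str
    estimated_time: float = 60.0  # seconds; hypothetical default


# Placeholder entries; real file names and estimates would come from the repo.
suite: List[TestFile] = [
    TestFile("test_example_a.py", 120),
    TestFile("test_example_b.py", 300),
]


def total_estimated_time(files: List[TestFile]) -> float:
    """Sum the per-file estimates, e.g. to budget a CI shard."""
    return sum(f.estimated_time for f in files)


def with_updated_estimate(files: List[TestFile], name: str, new_time: float) -> List[TestFile]:
    """Return a copy of the suite with one file's estimate replaced."""
    if all(f.name != name for f in files):
        raise KeyError(name)
    return [
        TestFile(f.name, new_time if f.name == name else f.estimated_time)
        for f in files
    ]


# Adding a test class to a file makes it slower, so its estimate is raised.
updated = with_updated_estimate(suite, "test_example_a.py", 200)
print(total_estimated_time(suite), total_estimated_time(updated))
```

Keeping the estimate in line with the file's real runtime matters when a runner uses these numbers to balance work across CI jobs.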
