[Bugfix] make moe_align_block_size compliable with cuda graph by jinzhen-lin · Pull Request #12036 · vllm-project/vllm

jinzhen-lin · 2025-01-14T11:51:14Z

moe_align_block_size is used by many MoE models (e.g. DeepSeek-V3), but it is currently not compatible with CUDA graphs. This PR fixes that.

Reference: sgl-project/sglang@77d1210 sgl-project/sglang@6e53051
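For readers unfamiliar with the op, here is a minimal pure-Python sketch of what an align-to-block-size step computes, as commonly described for fused MoE kernels: flattened token slots are grouped by expert and each group is padded to a multiple of the block size. This is not the code from this PR or from vLLM's CUDA kernel. The names `moe_align_block_size_ref` and `max_padded_len` are hypothetical, and using the flattened element count as the padding id is an assumption. The PR does not spell out which part of the op conflicts with graph capture, so the sketch only illustrates the semantics.

```python
from typing import List, Tuple


def moe_align_block_size_ref(
    topk_ids: List[List[int]], block_size: int, num_experts: int
) -> Tuple[List[int], List[int], int]:
    """Reference sketch (hypothetical helper, not the vLLM kernel)."""
    # Flatten the [num_tokens, top_k] expert assignments row by row.
    flat = [e for row in topk_ids for e in row]
    pad_id = len(flat)  # assumption: padding slots use an out-of-range index

    # Collect the flattened slot indices routed to each expert, in order.
    buckets: List[List[int]] = [[] for _ in range(num_experts)]
    for idx, expert in enumerate(flat):
        buckets[expert].append(idx)

    sorted_token_ids: List[int] = []
    expert_ids: List[int] = []
    for expert, bucket in enumerate(buckets):
        n_blocks = -(-len(bucket) // block_size)  # ceiling division
        padding = n_blocks * block_size - len(bucket)
        sorted_token_ids.extend(bucket + [pad_id] * padding)
        expert_ids.extend([expert] * n_blocks)  # one expert id per block

    return sorted_token_ids, expert_ids, len(sorted_token_ids)


def max_padded_len(numel: int, block_size: int, num_experts: int) -> int:
    """Worst case: every expert adds at most block_size - 1 padding slots."""
    return numel + num_experts * (block_size - 1)


topk_ids = [[1, 2, 3], [0, 1, 3], [0, 2, 3], [0, 1, 2]]
sorted_ids, expert_ids, total = moe_align_block_size_ref(topk_ids, 4, 4)
```

The padded length depends on the routing data, while a captured CUDA graph replays launches with fixed buffers and shapes. A data-independent bound such as `max_padded_len` is one way a fixed-size output buffer could be sized ahead of capture.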

github-actions · 2025-01-14T11:51:28Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

jinzhen-lin · 2025-01-14T11:54:55Z

With CUDA graphs enabled, the generation speed of DeepSeek-V3 (W4A16, 8×A100-80G, bs=1) increases from 5 tokens/s to 10 tokens/s.

mergify · 2025-01-19T09:30:54Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jinzhen-lin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

jinzhen-lin · 2025-01-20T12:37:14Z

Closing this PR because I have created a new PR with a better moe_align_block_size; see #12222.

fix moe_align_block_size

242dedf

jinzhen-lin added 5 commits January 14, 2025 20:01

fix format error

ac08db7

fix format error

314c1ae

fix format error

25db7c9

make buffer optional

f4cac8d

init buffer in kernel

83f0926

jinzhen-lin force-pushed the cuda_graph_moe_align_block_size branch from b3cba63 to 83f0926 on January 15, 2025 14:17

jinzhen-lin mentioned this pull request Jan 18, 2025

[Kernel] add triton fused moe kernel for gptq/awq #12185

Merged

fix format error

3b02123

mergify bot added the needs-rebase label Jan 19, 2025

jinzhen-lin closed this Jan 20, 2025

jinzhen-lin wants to merge 7 commits into vllm-project:main from jinzhen-lin:cuda_graph_moe_align_block_size
