[Model] Introduce CUDA Graph support for DeepSeek v3 #12204
houseroad wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Lu Fang <lufang@fb.com>
wow that's amazing!
@houseroad Before Woosuk takes a look at this PR, |
tlrmchlsmth left a comment
LGTM, and thanks for the fix! (I'm approving, but I don't have a system that can run DeepSeek v3, so I can't verify the fix end to end. The changes look good regardless.)
```diff
- if (num_experts >= 256) {
+ if (!use_shared_memory) {
```

Why was this change needed for the fix, BTW?
```diff
- const int32_t mem_tokens_cnts =
-     ((num_experts + 1) * num_experts) * sizeof(int32_t);
- const int32_t mem_cumsum = (num_experts + 1) * sizeof(int32_t);
- // allocate global memory
- int32_t* tokens_cnts;
- int32_t* cumsum;
- cudaMalloc(&tokens_cnts, mem_tokens_cnts);
- cudaMalloc(&cumsum, mem_cumsum);
+ torch::Tensor token_cnts =
+     torch::empty({(num_experts + 1) * num_experts},
+                  torch::TensorOptions()
+                      .dtype(torch::kInt)
+                      .device(topk_ids.device()));
+ torch::Tensor cumsum =
+     torch::empty({num_experts + 1}, torch::TensorOptions()
+                                         .dtype(torch::kInt)
+                                         .device(topk_ids.device()));
```
Makes sense to me: during CUDA graph capture, some operations, such as cudaMalloc, are unsafe, whereas allocations that go through the PyTorch caching allocator can be captured.
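To illustrate the pattern the diff above adopts, here is a minimal Python sketch of the same idea: size the workspace buffers with `torch.empty` (which goes through the caching allocator) instead of raw `cudaMalloc`. The `allocate_workspace` helper name is hypothetical, not from this PR, and the sketch uses a CPU device so it runs without a GPU; on a CUDA device the same calls are allocator-backed and graph-capture friendly.

```python
import torch

def allocate_workspace(num_experts: int, device: torch.device):
    # Hypothetical helper (not from the PR): pre-size the two workspace
    # buffers via torch.empty, mirroring the torch::empty calls in the
    # C++ diff, rather than calling cudaMalloc inside the kernel launch
    # path, which is unsafe during CUDA graph capture.
    token_cnts = torch.empty((num_experts + 1) * num_experts,
                             dtype=torch.int32, device=device)
    cumsum = torch.empty(num_experts + 1, dtype=torch.int32, device=device)
    return token_cnts, cumsum

# CPU device so the sketch runs anywhere; swap in torch.device("cuda")
# on a GPU system to exercise the caching allocator.
token_cnts, cumsum = allocate_workspace(8, torch.device("cpu"))
print(token_cnts.numel(), cumsum.numel())  # (8 + 1) * 8 = 72 and 8 + 1 = 9
```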
It is recommended to merge main, and use
I actually think PR #12222 has a better implementation of this optimization. Could you please help review, @houseroad?
Kudos to @jianyuh, who introduced CUDA graph support for DeepSeek v3. Overall throughput almost doubled in our testing.