Add fused top-K softmax kernel for MoE #2769
Conversation
@Yard1 @cadedaniel @pcmoritz Can any one of you review the PR?
Yes, happy to review! Thanks a lot for writing this :)
@pcmoritz Thanks!
Btw, I did a bit of benchmarking on this PR, and without touching any of the parameters introduced in the PR I'm already seeing a 1.5%-3.5% end-to-end latency improvement. The improvement is larger in the low-latency regime. Concretely, I tested with TP=2 on H100 with 1000 input and 50 output tokens on Mixtral. So it seems worth merging this even though the low-level kernel code is not easy to follow; most people can probably just treat it as a black box, so it shouldn't have a big impact on maintainability.
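For context, the setup described above can be reproduced roughly as follows with vLLM's offline Python API. This is a minimal sketch only; the model name, the dummy prompt construction, and the single-request measurement are assumptions, not the exact benchmark script behind the numbers quoted.

```python
# Rough end-to-end latency measurement matching the setup described above
# (Mixtral, TP=2, 1000 input tokens, 50 output tokens). A sketch, not the
# exact benchmark used for the quoted 1.5%-3.5% improvement.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=50, ignore_eos=True)

# 1000 dummy input tokens for one request; a real benchmark would use actual
# prompts and average over many iterations.
prompt_token_ids = [[0] * 1000]

start = time.perf_counter()
llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=params)
print(f"end-to-end latency: {time.perf_counter() - start:.3f} s")
```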
I will spend some more time trying to understand the implementation in topk_softmax_kernels.cu, but no need to block on that since it's mostly the upstream code from https://github.com/NVIDIA/TensorRT-LLM/blob/v0.7.1/cpp/tensorrt_llm/kernels/mixtureOfExperts/moe_kernels.cu, and in any case we should probably keep it close to upstream and not change it :)
@pcmoritz Thanks again for your review! Yes, I think we don't have to worry too much about the implementation details, at least for the moment, since I only made a minor change to the kernel.
Sounds good, the PR looks great :)
This PR ports a fused top-k softmax kernel from TensorRT-LLM v0.7.1.
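For reference, the routing step the fused kernel implements is a softmax over the per-token router logits followed by top-k expert selection. The sketch below expresses that computation unfused in PyTorch; the function name, the `renormalize` flag, and the shapes are illustrative, not the kernel's actual interface.

```python
# Unfused reference for the routing step the kernel fuses: softmax over the
# per-token router logits, then top-k expert selection. Names and the optional
# renormalization are illustrative, not the kernel's exact semantics.
import torch


def topk_softmax_reference(router_logits: torch.Tensor, top_k: int,
                           renormalize: bool = False):
    # router_logits: [num_tokens, num_experts]
    probs = torch.softmax(router_logits, dim=-1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(probs, top_k, dim=-1)
    if renormalize:
        # Some MoE models (e.g. Mixtral) renormalize the selected weights.
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids  # each [num_tokens, top_k]


# Example: route 4 tokens over 8 experts, picking the top 2 experts per token.
weights, ids = topk_softmax_reference(torch.randn(4, 8), top_k=2)
```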
TODO: