Add cutlass support for blackwell fp8 blockwise gemm #14383
simon-mo merged 7 commits into vllm-project:main
It looks like … I'm curious to compare it to the blockwise kernels I wrote in the blackwell-rebase-feb20 branch of deepinfra/vllm, since we had to change the scale factor to be fp8.
Thank you @wenscarl! Please correct the …
@tylertitsworth Can you please take a look?
tlrmchlsmth left a comment
Thanks for the contribution! A couple of comments, but looks good overall.
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8.cu
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh
Looks good to me now, thank you! Please merge in latest main to fix the …
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh
LucasWilkinson left a comment
LGTM now, thanks!
Head branch was pushed to by a user without write access
Signed-off-by: Shu Wang <shuw@nvidia.com>
This PR adds CUTLASS support for FP8 blockwise GEMM on Blackwell.
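For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a blockwise-scaled GEMM computes. It is not the CUTLASS kernel from this PR. The function name `blockwise_scaled_gemm`, the scale layout (one scale per row and K-block for `a`, one scale per K-block and N-block for `b`), and the block sizes are all assumptions made for illustration. Plain floats stand in for the FP8 values.

```python
def blockwise_scaled_gemm(a, b, a_scales, b_scales, block_k, block_n):
    """Reference blockwise-scaled matmul (hypothetical helper, illustration only).

    a:        M x K quantized values (floats stand in for fp8 here)
    b:        K x N quantized values
    a_scales: M x ceil(K / block_k), one scale per row per K-block (assumed layout)
    b_scales: ceil(K / block_k) x ceil(N / block_n), one scale per block (assumed layout)
    """
    m_dim, k_dim, n_dim = len(a), len(b), len(b[0])
    out = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            nb = n // block_n  # N-block index of this output column
            acc = 0.0
            for k0 in range(0, k_dim, block_k):
                kb = k0 // block_k  # K-block index
                k1 = min(k0 + block_k, k_dim)
                # Accumulate the unscaled partial dot product for this K-block.
                partial = sum(a[m][k] * b[k][n] for k in range(k0, k1))
                # Apply both block scales once per K-block, then accumulate.
                acc += a_scales[m][kb] * b_scales[kb][nb] * partial
            out[m][n] = acc
    return out


# Tiny example: 1 x 4 times 4 x 2, with K split into two blocks of size 2.
example = blockwise_scaled_gemm(
    [[1.0, 2.0, 3.0, 4.0]],
    [[1.0, 1.0]] * 4,
    [[0.5, 2.0]],
    [[1.0], [3.0]],
    block_k=2,
    block_n=2,
)
```

The point of the sketch is that scales are applied per K-block to partial sums rather than once to the final result, which is what distinguishes blockwise scaling from per-tensor scaling.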