[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files #23727
Code Review
This pull request refactors common CUDA kernel utilities for nvfp4 quantization into a new header file, csrc/quantization/fp4/nvfp4_utils.cuh. This is a good change that improves code reuse and maintainability. My review focuses on the new utility file, and I've identified a few areas for improvement to make the utilities safer for future use. Specifically, several functions designed for newer hardware architectures (SM100+) have fallback paths for older architectures that fail silently by returning 0 or nullptr. This could lead to hard-to-debug issues if these utilities are reused without the proper architecture guards. I've suggested using static_assert to cause a compile-time error in these cases, making any misuse immediately obvious.
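The compile-time guard suggested in this review can be sketched as follows. This is a minimal host-compilable illustration, not the code in nvfp4_utils.cuh: the function name `pack_pair_sketch` and the `Arch` template parameter are hypothetical, with `Arch` standing in for the role that `__CUDA_ARCH__` plays in device code.

```cpp
#include <cstdint>

// Hypothetical helper, not part of nvfp4_utils.cuh.
// `Arch` stands in for the value of __CUDA_ARCH__ (1000 corresponds to SM100).
template <int Arch>
constexpr uint32_t pack_pair_sketch(uint32_t lo, uint32_t hi) {
  // Fail the build instead of silently returning 0 on unsupported targets.
  static_assert(Arch >= 1000, "nvfp4 helpers require SM100 or newer");
  // Pack the low 16 bits of `lo` under the low 16 bits of `hi`.
  return (hi << 16) | (lo & 0xFFFFu);
}
```

Under these assumptions, instantiating `pack_pair_sketch<900>` would stop compilation with the assertion message rather than produce a zero at run time.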
@pavanimajety, @ProExpertProg is asking to remove the CUDA SM100 macros from these nvfp4 kernels. Do you think they are safe to remove? Thanks.
If we want to be conservative, we can add an #error macro that fires if the CUDA arch is too low.
I think it should be safe. Removed them.
Sorry for the delayed reply. As long as these files are not built for lower, unsupported architectures, we should be okay to remove them. We may also have to revisit these compilations for SM120+ builds.
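The conservative `#error` guard mentioned earlier in this thread might look like the sketch below. It is an assumption about how such a guard could be written, not what the PR merged: the message text and the `nvfp4_arch_guard_ok_sketch` marker are hypothetical. Note that a guard like this would break the build whenever the file is compiled for a lower architecture in a multi-arch build, which is the same condition raised in the reply above.

```cpp
// Sketch only: __CUDA_ARCH__ is defined by nvcc during device compilation
// passes, so this check is skipped in the host pass and in plain C++ builds.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 1000)
#error "nvfp4 kernels require SM100 or newer"
#endif

// Hypothetical marker so the translation unit has something to reference.
constexpr bool nvfp4_arch_guard_ok_sketch() { return true; }
```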
7465d77 to 29dbc72
This pull request has merge conflicts that must be resolved before it can be merged.
29dbc72 to 7794ce5
ProExpertProg left a comment
Looks good, just left a static_cast nit!
7794ce5 to 7eca643
Can you make sure to also revert #23959?
ProExpertProg left a comment
Another reinterpret_cast; I will just fix it directly.
I made #24121 as a workaround. If this lands first, I'll close it!
2cb160e to e897656
#24121 has been merged, so please revert those changes before merging this PR. Thanks!
e897656 to e52bbe9
7979552 to ec829a6
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
This reverts commit e66ed3e. Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
…ject#24121)" This reverts commit 2fd1a40. Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
This reverts commit 41be13dcbb179cc6fa78bf176069858dbe6087e9. Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Revert "fix pre-commit" This reverts commit ebac9cd0f6455a9f3580454a0b48ebaf95b1f4bc. Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
ec829a6 to ed01518
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
…utils for nvfp4 kernel source files (vllm-project#23727) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
…ithout nvfp4 Fixes the undefined symbol issue (vllm-project#23925 (comment)) by keeping the function definition of `cutlass_fp4_group_mm` even on non-nvfp4-supported platforms in `csrc/quantization/fp4/nvfp4_blockwise_moe_entry.cu`. Similar to vllm-project#23727.
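The idea of keeping an entry point defined on every platform can be sketched as below. This is a simplified illustration under stated assumptions: `group_mm_entry_sketch` and the `ENABLE_NVFP4_SKETCH` flag are hypothetical names, the real `cutlass_fp4_group_mm` takes tensor arguments, and raising a runtime error in the unsupported branch is an assumption about the fallback behavior rather than something the commit message states.

```cpp
#include <stdexcept>

// Hypothetical build flag and function name, used for illustration only.
int group_mm_entry_sketch(int x) {
#if defined(ENABLE_NVFP4_SKETCH)
  return x * 2;  // placeholder for dispatching to the real kernel
#else
  (void)x;  // unused when the kernel is not compiled in
  // The symbol still exists, so linking succeeds; callers get a clear
  // runtime error instead of an undefined-symbol failure at load time.
  throw std::runtime_error(
      "nvfp4 grouped GEMM was not compiled for this platform");
#endif
}
```

The design choice here is that only the function body is conditional, so code that references the symbol links on every platform and the failure is deferred to the call site.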
Purpose
csrc/quantization/fp4/nvfp4_quant_entry.cu
VLLM_DISPATCH_HALF_TYPES for the nvfp4_quant_kernels.cu and nvfp4_experts_quant.cu
Test Plan && Test Result
tests/kernels/quantization/test_silu_nvfp4_quant_fusion.py: 8 passed in 2.49s
tests/kernels/quantization/test_nvfp4_quant.py: 18 passed in 5.58s
tests/kernels/moe/test_nvfp4_moe.py: 180 passed in 30.85s