[BUGFIX] fix undefined silu_and_mul_nvfp4_quant #23929
simon-mo merged 3 commits into vllm-project:main
Conversation
Signed-off-by: hongchao <hongchao@msh.team>
Code Review
This pull request correctly identifies the need to restrict the compilation and usage of silu_and_mul_nvfp4_quant to specific CUDA architectures that support NVFP4. The changes in the C++ files appear to implement this correctly using preprocessor directives. However, there is a critical issue in the corresponding Python check within vllm/compilation/fix_functionalization.py. The device capability check uses an incorrect value, which will prevent this optimized kernel from being used on the intended hardware (e.g., Blackwell GPUs).
    elif current_platform.has_device_capability(
        100
    ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:
The check current_platform.has_device_capability(100) is incorrect. The has_device_capability method compares its integer argument with the major version of the CUDA compute capability. For Blackwell GPUs (SM 10.0), the major version is 10. Therefore, the check 10 >= 100 will always evaluate to false, preventing this code path from ever being taken on the intended hardware. This should be has_device_capability(10) to correctly target devices with compute capability 10.x and above.
    - elif current_platform.has_device_capability(
    -     100
    - ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:
    + elif current_platform.has_device_capability(
    +     10
    + ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:
An easier way to check here is `hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant")`.
I feel `hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant")` makes more sense.
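The `hasattr`-based guard suggested above can be sketched as follows. Since `torch.ops._C` is only populated with this op when the CUDA extension actually registers it, the example below uses a stand-in namespace object instead of importing torch; `pick_fused_op` is a hypothetical helper name, not vLLM code.

```python
from types import SimpleNamespace


def pick_fused_op(ops_namespace, op_name="silu_and_mul_nvfp4_quant"):
    """Return the fused op if the extension registered it, else None.

    Mirrors the review suggestion: guard on op presence rather than on
    device capability, so builds that compiled the kernel out (e.g. for
    pre-Blackwell architectures) never reference an undefined symbol.
    """
    if hasattr(ops_namespace, op_name):
        return getattr(ops_namespace, op_name)
    return None


# Stand-in for torch.ops._C on a build where the kernel was compiled in:
ops_with_kernel = SimpleNamespace(silu_and_mul_nvfp4_quant=lambda x: x)
# ...and on a build where it was compiled out:
ops_without_kernel = SimpleNamespace()

assert pick_fused_op(ops_with_kernel) is not None
assert pick_fused_op(ops_without_kernel) is None
```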
Related issue: #23916

@mgoin can you help take a look?

I confirmed that this PR fixes the issue for me.

I already have a fix in #23727. Like the other nvfp4 kernels, I added the function definition in csrc/quantization/fp4/nvfp4_quant_entry.cu so the function is always defined on the general code path. This should fix the issue in a cleaner way.
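The always-defined entry-point pattern described above (the symbol always exists; unsupported builds fail at call time instead of at link time) can be sketched in Python terms. This is a hypothetical transliteration, not the actual CUDA code: the real nvfp4_quant_entry.cu uses a compile-time preprocessor guard, for which a module-level flag stands in here, and `_silu_and_mul_nvfp4_quant_impl` is an assumed name for the guarded kernel.

```python
# Stands in for a compile-time guard such as #ifdef around the NVFP4 kernel.
KERNEL_COMPILED_IN = False


def silu_and_mul_nvfp4_quant_entry(x):
    """Entry point that is always defined, so callers never hit an
    undefined symbol; builds without the kernel fail loudly at call time.
    """
    if not KERNEL_COMPILED_IN:
        raise RuntimeError(
            "silu_and_mul_nvfp4_quant requires an NVFP4-capable build")
    # On NVFP4-capable builds, dispatch to the compiled kernel
    # (hypothetical name for illustration only):
    return _silu_and_mul_nvfp4_quant_impl(x)
```

With this pattern, capability or availability checks in Python become optional: calling the op on an unsupported build raises a clear error rather than crashing on a missing symbol.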
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: Richard Zou <zou3519@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: Richard Zou <zou3519@gmail.com> Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Fix undefined silu_and_mul_nvfp4_quant on Hopper or Ampere CUDA devices.
