Skip to content

[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files#23727

Merged
simon-mo merged 10 commits intovllm-project:mainfrom
elvischenv:elvischenv/nvfp4-utils
Sep 4, 2025
Merged

[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files#23727
simon-mo merged 10 commits intovllm-project:mainfrom
elvischenv:elvischenv/nvfp4-utils

Conversation

@elvischenv
Copy link
Contributor

@elvischenv elvischenv commented Aug 27, 2025

Purpose

  • Fixed the undefined symbol issue by keeping the function definition even on non-nvfp4-supported platforms in csrc/quantization/fp4/nvfp4_quant_entry.cu
  • Extracted common utils for nvfp4 kernel source files.
  • Removed unnecessary CUDA_ARCH checking for the nvfp4 kernel.
  • Switched to use VLLM_DISPATCH_HALF_TYPES for the nvfp4_quant_kernels.cu and nvfp4_experts_quant.cu

Test Plan && Test Result

tests/kernels/quantization/test_silu_nvfp4_quant_fusion.py
==== 8 passed in 2.49s ===
tests/kernels/quantization/test_nvfp4_quant.py
====== 18 passed in 5.58s =========
tests/kernels/moe/test_nvfp4_moe.py
======= 180 passed in 30.85s ======


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors common CUDA kernel utilities for nvfp4 quantization into a new header file, csrc/quantization/fp4/nvfp4_utils.cuh. This is a good change that improves code reuse and maintainability. My review focuses on the new utility file, and I've identified a few areas for improvement to make the utilities safer for future use. Specifically, several functions designed for newer hardware architectures (SM100+) have fallback paths for older architectures that fail silently by returning 0 or nullptr. This could lead to hard-to-debug issues if these utilities are reused without the proper architecture guards. I've suggested using static_assert to cause a compile-time error in these cases, making any misuse immediately obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavanimajety @ProExpertProg is asking for removing these cuda sm100 macros for these nvfp4 kernels. Do you think this is safe to remove? Thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to be conservative we can add an #error macro if the CUDA arch is too low

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be safe. Removed them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed reply. As long as these files are not built for lower unsupported architectures, we should be okay to remove it. We may also have to revisit these compilations for SM120+ builds

@elvischenv elvischenv force-pushed the elvischenv/nvfp4-utils branch 3 times, most recently from 7465d77 to 29dbc72 Compare August 29, 2025 16:35
@mergify
Copy link

mergify bot commented Aug 29, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @elvischenv.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just left a static_cast nit!

@elvischenv elvischenv force-pushed the elvischenv/nvfp4-utils branch from 7794ce5 to 7eca643 Compare September 2, 2025 14:12
@ProExpertProg
Copy link
Collaborator

Can you make sure to also revert #23959?

Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reinterpret_cast, will just fix directly

@MatthewBonanni
Copy link
Collaborator

MatthewBonanni commented Sep 2, 2025

I made #24121 as a workaround, if this lands first I'll close it!

@MatthewBonanni
Copy link
Collaborator

#24121 has been merged, so please revert those before merging this PR. Thanks!

@elvischenv elvischenv force-pushed the elvischenv/nvfp4-utils branch from e897656 to e52bbe9 Compare September 3, 2025 00:48
@mergify mergify bot added the ci/build label Sep 3, 2025
@elvischenv elvischenv force-pushed the elvischenv/nvfp4-utils branch 2 times, most recently from 7979552 to ec829a6 Compare September 3, 2025 00:53
@elvischenv elvischenv changed the title [Misc] Extract common utils for nvfp4 kernel source files [Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files Sep 3, 2025
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
elvischenv and others added 7 commits September 3, 2025 05:33
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
This reverts commit e66ed3e.

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
…ject#24121)"

This reverts commit 2fd1a40.

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
This reverts commit 41be13dcbb179cc6fa78bf176069858dbe6087e9.

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

Revert "fix pre-commit"

This reverts commit ebac9cd0f6455a9f3580454a0b48ebaf95b1f4bc.

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv elvischenv force-pushed the elvischenv/nvfp4-utils branch from ec829a6 to ed01518 Compare September 3, 2025 12:33
@ProExpertProg ProExpertProg enabled auto-merge (squash) September 3, 2025 15:07
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 3, 2025
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@simon-mo simon-mo disabled auto-merge September 4, 2025 21:25
@simon-mo simon-mo merged commit adc3ddb into vllm-project:main Sep 4, 2025
68 of 71 checks passed
@elvischenv elvischenv deleted the elvischenv/nvfp4-utils branch September 5, 2025 01:35
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…utils for nvfp4 kernel source files (vllm-project#23727)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
…utils for nvfp4 kernel source files (vllm-project#23727)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…utils for nvfp4 kernel source files (vllm-project#23727)

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
StrongerXi added a commit to StrongerXi/vllm that referenced this pull request Oct 2, 2025
…ithout nvfp4

Fixes the undefined symbol issue
(vllm-project#23925 (comment))
by keeping the function definition of `cutlass_fp4_group_mm` even on
non-nvfp4-supported platforms in `csrc/quantization/fp4/nvfp4_blockwise_moe_entry.cu`.

Similar to vllm-project#23727.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: _C.abi3.so: undefined symbol: _Z24silu_and_mul_nvfp4_quantRN2at6TensorES1_S1_S1_

5 participants