[WIP][Perf] Add FlashInfer CuTeDSL backend for NVFP4 GEMM on Blackwell by LopezCastroRoberto · Pull Request #39933 · vllm-project/vllm

LopezCastroRoberto · 2026-04-15T18:03:51Z

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

gemini-code-assist

Code Review

This pull request updates flashinfer-python to version 0.6.7 and introduces the cute-dsl backend for NVFP4 quantization. The changes include updates to the backend enum, weight preparation logic, and kernel tests. A critical feedback point notes that bypassing the flashinfer_mm_fp4 custom operator in favor of a direct library call will likely break CUDA graph capture; it is recommended to update the custom operator to accept the new parameters instead.

gemini-code-assist · 2026-04-15T18:07:37Z

+    from flashinfer import mm_fp4 as _flashinfer_mm_fp4
+
+    return _flashinfer_mm_fp4(
        a,
        b.t(),
        block_scale_a,
        block_scale_b.t(),
        alpha,
        out_dtype,
+        block_size=16,
        use_8x4_sf_layout=use_8x4_sf_layout,
        backend=backend,
+        use_nvfp4=True,
    )


Bypassing the vllm::flashinfer_mm_fp4 custom op by calling flashinfer.mm_fp4 directly will likely break CUDA graph capture, which is a key performance feature in vLLM. This can lead to performance regressions.

Instead of bypassing the custom op, please update its definition (and its fake implementation) to accept the use_nvfp4 parameter and pass it to the underlying flashinfer.mm_fp4 call. The custom op is defined in this same file, so it should be straightforward to modify.

After updating the custom op, you can call it from here like this:

    return flashinfer_mm_fp4(
        a,
        b.t(),
        block_scale_a,
        block_scale_b.t(),
        alpha,
        out_dtype,
        use_8x4_sf_layout=use_8x4_sf_layout,
        backend=backend,
        use_nvfp4=True,
    )
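The suggestion above implies that the custom op and its fake implementation must stay signature-compatible once `use_nvfp4` is added. A minimal sketch of a guard for that, using only the standard library, is shown below. The three `mm_fp4_*` functions are hypothetical stand-ins written for illustration; they are not vLLM's actual op definitions, and their parameter lists are assumed from the call shown in this comment.

```python
import inspect


def signatures_match(real, fake):
    """Return True if both callables take the same parameters, in the same
    order, with the same kinds and defaults (annotations are ignored)."""
    def params(fn):
        return [
            (p.name, p.kind, p.default)
            for p in inspect.signature(fn).parameters.values()
        ]
    return params(real) == params(fake)


# Hypothetical stand-in for the op body (assumption: the real op would
# forward these arguments to flashinfer.mm_fp4).
def mm_fp4_impl(a, b, block_scale_a, block_scale_b, alpha, out_dtype,
                use_8x4_sf_layout, backend, use_nvfp4):
    raise NotImplementedError("stand-in only")


# Hypothetical fake implementation: it only describes the output shape.
def mm_fp4_fake(a, b, block_scale_a, block_scale_b, alpha, out_dtype,
                use_8x4_sf_layout, backend, use_nvfp4):
    return (len(a), len(b[0]))


# Hypothetical stale fake that was not updated with use_nvfp4.
def mm_fp4_fake_stale(a, b, block_scale_a, block_scale_b, alpha, out_dtype,
                      use_8x4_sf_layout, backend):
    return (len(a), len(b[0]))


if __name__ == "__main__":
    print(signatures_match(mm_fp4_impl, mm_fp4_fake))
    print(signatures_match(mm_fp4_impl, mm_fp4_fake_stale))
```

A check like this could live in a unit test next to the op registration, so that a parameter added to one function but not the other fails early instead of surfacing during tracing or graph capture.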

mergify · 2026-04-15T18:58:43Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LopezCastroRoberto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-05-18T17:12:24Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LopezCastroRoberto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

init

6166473

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

LopezCastroRoberto requested review from WoosukKwon, mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners April 15, 2026 18:03

LopezCastroRoberto marked this pull request as draft April 15, 2026 18:03

gemini-code-assist Bot reviewed Apr 15, 2026

mergify Bot added ci/build nvidia labels Apr 15, 2026

github-project-automation Bot added this to NVIDIA Apr 15, 2026

mergify Bot added the needs-rebase label Apr 15, 2026

LopezCastroRoberto changed the title ~~[Perf] Add FlashInfer CuTeDSL backend for NVFP4 GEMM on Blackwell~~ [WIP][Perf] Add FlashInfer CuTeDSL backend for NVFP4 GEMM on Blackwell Apr 17, 2026

LopezCastroRoberto mentioned this pull request May 13, 2026

[Kernel][Performance] Add FlashInfer cutedsl NVFP4 GEMM backend #42235

Open

mergify Bot removed the needs-rebase label May 18, 2026

mergify Bot added the needs-rebase label May 18, 2026

LopezCastroRoberto wants to merge 1 commit into vllm-project:main from LopezCastroRoberto:perf/fp4_cute-dsl
