[ROCm][Perf] Enabled FP4Indexer for DSV4 by tjtanaa · Pull Request #42908 · vllm-project/vllm

tjtanaa · 2026-05-18T01:04:16Z

Purpose

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

gemini-code-assist

Code Review

This pull request introduces ROCm support for MXFP4 quantization within the DeepSeek V4 sparse indexer, adding specialized Triton kernels for paged MQA logits and implementing optimizations for trivial top-k scenarios. Feedback from the review identified high-severity issues in the new FP4 MQA kernels, specifically regarding shape mismatches in tl.dot_scaled operations that necessitate transposing the RHS scale tensor.

gemini-code-assist · 2026-05-18T01:09:48Z

+    scores = tl.dot_scaled(
+        q_packed,
+        q_scale,
+        "e2m1",
+        k_packed,
+        k_scale,
+        "e2m1",
+        lhs_k_pack=True,
+        rhs_k_pack=True,
+        out_dtype=tl.float32,
+    )


The tl.dot_scaled operation expects the RHS scale tensor to have a shape of (K_scaled, N) when rhs_k_pack=True. In this kernel, k_scale is loaded with shape (BLOCK_KV, 4), which corresponds to (N, K_scaled). This mismatch will likely lead to incorrect results or compilation errors. You should transpose k_scale before passing it to tl.dot_scaled.

Suggested change

-    scores = tl.dot_scaled(
-        q_packed,
-        q_scale,
-        "e2m1",
-        k_packed,
-        k_scale,
-        "e2m1",
-        lhs_k_pack=True,
-        rhs_k_pack=True,
-        out_dtype=tl.float32,
-    )
+    scores = tl.dot_scaled(
+        q_packed,
+        q_scale,
+        "e2m1",
+        k_packed,
+        tl.trans(k_scale),
+        "e2m1",
+        lhs_k_pack=True,
+        rhs_k_pack=True,
+        out_dtype=tl.float32,
+    )
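The layout change this comment asks for can be sketched in plain Python. This is only an illustration of the shape bookkeeping: `BLOCK_KV = 8` and the helper names `transpose` and `shape` are hypothetical, and the sketch does not confirm which scale layout `tl.dot_scaled` requires. That should be checked against the Triton version the kernel targets.

```python
def transpose(matrix):
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def shape(matrix):
    """Return (rows, cols) of a rectangular list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))


BLOCK_KV = 8   # hypothetical tile size, chosen only for illustration
K_SCALED = 4   # scale groups per KV row, taken from the (BLOCK_KV, 4) in the comment

# k_scale as the comment says it is loaded: one row per KV entry (N),
# one column per scale group (K_scaled).
k_scale = [[kv * 10 + g for g in range(K_SCALED)] for kv in range(BLOCK_KV)]

# The layout the comment asks for: (K_scaled, N).
k_scale_t = transpose(k_scale)

print(shape(k_scale))    # (N, K_scaled)
print(shape(k_scale_t))  # (K_scaled, N)
```

Transposition moves no values between groups: element `[kv][g]` of the loaded tile becomes element `[g][kv]` of the transposed one.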

gemini-code-assist · 2026-05-18T01:09:48Z

+        scores = tl.dot_scaled(
+            q_packed,
+            q_scale,
+            "e2m1",
+            k_packed,
+            k_scale,
+            "e2m1",
+            lhs_k_pack=True,
+            rhs_k_pack=True,
+            out_dtype=tl.float32,
+        )


Similar to the paged kernel, tl.dot_scaled here expects the RHS scale to be (K_scaled, N). Since k_scale is loaded as (BLOCK_KV, 4), it needs to be transposed to match the expected (4, BLOCK_KV) shape.

Suggested change

-        scores = tl.dot_scaled(
-            q_packed,
-            q_scale,
-            "e2m1",
-            k_packed,
-            k_scale,
-            "e2m1",
-            lhs_k_pack=True,
-            rhs_k_pack=True,
-            out_dtype=tl.float32,
-        )
+        scores = tl.dot_scaled(
+            q_packed,
+            q_scale,
+            "e2m1",
+            k_packed,
+            tl.trans(k_scale),
+            "e2m1",
+            lhs_k_pack=True,
+            rhs_k_pack=True,
+            out_dtype=tl.float32,
+        )
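For context on where the 4 in (BLOCK_KV, 4) may come from, here is a rough sketch of the MXFP4 shape arithmetic. MXFP4 stores 4-bit e2m1 values, two per byte, with one shared scale per group of 32 elements. The head dimension of 128 and `BLOCK_KV = 64` are assumptions made for illustration, not values stated in this PR, and `mxfp4_tile_shapes` is a hypothetical helper.

```python
MX_GROUP_SIZE = 32  # elements that share one scale in MXFP4
FP4_PER_BYTE = 2    # e2m1 is 4 bits wide, so two values fit in one byte


def mxfp4_tile_shapes(block_kv, head_dim):
    """Compute per-tile sizes for an MXFP4-quantized K tile.

    block_kv and head_dim are assumed values supplied by the caller.
    """
    if head_dim % MX_GROUP_SIZE != 0:
        raise ValueError("head_dim must be a multiple of the MX group size")
    k_scaled = head_dim // MX_GROUP_SIZE
    return {
        "packed_bytes_per_row": head_dim // FP4_PER_BYTE,
        "k_scale_loaded": (block_kv, k_scaled),       # (N, K_scaled)
        "k_scale_transposed": (k_scaled, block_kv),   # (K_scaled, N)
    }


# Assumed sizes: 128-element head dimension, 64 KV entries per tile.
shapes = mxfp4_tile_shapes(block_kv=64, head_dim=128)
print(shapes)
```

Under these assumptions each K row has 4 scale groups, which matches the (BLOCK_KV, 4) tile described in the review comment.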


mergify · 2026-05-23T10:31:28Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tjtanaa.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

tjtanaa added 8 commits May 15, 2026 21:56

fix deepseek v4 high concurrency issue (089ebd4)
sync with main (67128ce)
num_padded_tokens (08f73d3)
resolve merge conflict (23d33f1)
fix precommit (bd0004f)
add v1 support of fp4indexer (c0219ec)
optimized fp4indexer (fc065f3)
resolve merge conflict (8a3e4cd)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

mergify Bot added the rocm (Related to AMD ROCm) and v1 labels May 18, 2026

github-project-automation Bot added this to AMD May 18, 2026

github-project-automation Bot moved this to Todo in AMD May 18, 2026

cleanup (f557e35)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

gemini-code-assist Bot reviewed May 18, 2026


tjtanaa added 3 commits May 18, 2026 09:50

remove indices (5288358)
update comment (99fd8f9)
remove the use of torpk torch (490a7da)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

tjtanaa mentioned this pull request May 22, 2026

[Performance]: Deepseek-V4 Support and Optimization on ROCm Backend #41820

Open

22 tasks

maeehart mentioned this pull request May 22, 2026

[ROCm][DSv4] Functional fixes for DeepSeek V4 on MI300X (gfx942) #42893

Draft

mergify Bot added the needs-rebase label May 23, 2026

Fangzhou-Ai mentioned this pull request May 26, 2026

[ROCm][DeepSeek-V4] WIP: Enable CSA multistream decode #43718

Draft

github-actions Bot mentioned this pull request Jun 1, 2026

EleutherAI Alpha Digest — 2026-06-01 (2414 msgs) jaisong123/eleuther-digest#334

Open

tjtanaa wants to merge 12 commits into vllm-project:main from EmbeddedLLM:dsv4fp4indexer
