Temporary disable persistent topk for Hopper by zixi-qi · Pull Request #41605 · vllm-project/vllm

zixi-qi · 2026-05-04T04:41:52Z

Summary

Reverts commit a4debbd on releases/v0.20.1, which had reverted Temporary disable persistent topk #41442 ("Temporary disable persistent topk").
Restores the topk_tokens in (512, 1024, 2048) gate in vllm/model_executor/layers/sparse_attn_indexer.py so the persistent topk path is re-disabled for topk_tokens == 1024 on this release branch.

Test plan

Verified MTP=1 no longer hangs on hopper for deepseek v4 pro and gsm8k score as expected

✓ task=gsm8k | flexible-extract=0.9522 | strict-match=0.9530

AI assistance was used to prepare this revert PR (Claude Code). The change is a pure revert of a single-line gate; the submitter has reviewed every changed line.

Co-authored-by: Claude noreply@anthropic.com

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request updates the sparse_attn_indexer function in vllm/model_executor/layers/sparse_attn_indexer.py to remove 1024 from the supported topk_tokens values when executing on CUDA platforms. I have no feedback to provide.

This reverts commit a4debbd. Signed-off-by: zixi-qi <zixi@inferact.ai>

Keep `topk_tokens == 1024` on the persistent_topk path on Blackwell (SM10x), but disable it on Hopper and other CUDA archs so the original revert (vllm-project#41442) behavior is preserved there. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: zixi-qi <zixi@inferact.ai>

zixi-qi · 2026-05-04T05:02:37Z

@claude review

claude

Straightforward release-branch backport of the existing Blackwell gating pattern from main (d92dd26); safe to land.

Extended reasoning...

Overview

This PR modifies a single conditional in vllm/model_executor/layers/sparse_attn_indexer.py (5 lines added, 1 removed) that gates whether the persistent topk kernel is used. After the change, topk_tokens == 1024 only takes the persistent path on Blackwell (is_device_capability_family(100)); Hopper and other non-Blackwell CUDA devices fall back to the per-row decode path. This is a backport of the same gating that already lives on main (commit d92dd26 "Gate persistent topk for topk_tokens=1024 to Blackwell only").

Security risks

None. The change is a pure dispatch gate based on device compute capability — no auth, no user-input handling, no crypto, no IO. Worst case is a perf regression on Blackwell or a missed hang fix on Hopper.

Level of scrutiny

Low. The PR is a release-branch revert-of-revert restoring an upstream-validated pattern. is_device_capability_family is verified to exist at vllm/platforms/interface.py:348 and correctly checks the major version of the compute capability (100 → 10.x). The author also reports concrete validation: MTP=1 no longer hangs on Hopper for DeepSeek v4 Pro, with gsm8k flexible-extract 0.9522 / strict-match 0.9530.

Other factors

The diff is small and self-contained, the bug hunting system found no issues, and the equivalent change has already been merged to main. The author has explicitly noted that AI assistance was used and they reviewed every changed line.

claude Bot reviewed May 4, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 4, 2026

View reviewed changes

zixi-qi and others added 2 commits May 4, 2026 04:46

Revert "Revert "Temporary disable persistent topk (vllm-project#41442)""

26cc501

This reverts commit a4debbd. Signed-off-by: zixi-qi <zixi@inferact.ai>

zixi-qi force-pushed the revert-persistent-topk-v0.20.1 branch from 1e042d7 to d92dd26 Compare May 4, 2026 04:46

zixi-qi changed the title ~~Revert "Revert \"Temporary disable persistent topk (#41442)\""~~ Temporary disable persistent topk for Hopper May 4, 2026

zixi-qi marked this pull request as draft May 4, 2026 04:52

zixi-qi marked this pull request as ready for review May 4, 2026 05:02

zixi-qi requested review from ywang96 and zyongye May 4, 2026 05:03

claude Bot reviewed May 4, 2026

View reviewed changes

ywang96 merged commit 43a21e6 into vllm-project:releases/v0.20.1 May 4, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Temporary disable persistent topk for Hopper#41605

Temporary disable persistent topk for Hopper#41605
ywang96 merged 2 commits intovllm-project:releases/v0.20.1from
zixi-qi:revert-persistent-topk-v0.20.1

zixi-qi commented May 4, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

zixi-qi commented May 4, 2026

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zixi-qi commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

zixi-qi commented May 4, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zixi-qi commented May 4, 2026 •

edited

Loading