[Perf] Replace cudaMemsetAsync with in-kernel cleanup for persistent_topk #41748
LopezCastroRoberto wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Hi @LopezCastroRoberto, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Code Review
This pull request moves the zero-initialization of the RadixRowState workspace from a host-side cudaMemsetAsync to an in-kernel cleanup at the end of the persistent TopK kernel. This change aims to improve performance and support CUDA graph replays by avoiding stream-ordered host calls. However, the current in-kernel implementation introduces a performance bottleneck due to excessive memory fences in a loop and a race condition where the arrival_counter could be reset before all histogram zeros are globally visible. I have provided feedback on how to parallelize the zeroing and use a proper synchronization barrier.
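The review's two concerns could be addressed roughly as follows. This is a hedged sketch, not the PR's actual code: the type `RadixRowState` and the field names `histogram` and `arrival_counter` are illustrative, taken from the review's description rather than the diff. It shows the suggested parallelized zeroing (a block-stride loop instead of a serial loop with a fence per iteration) and a single `__threadfence()` before the counter reset, so the zeros are globally visible before any block can observe the counter as reset.

```cuda
// Hypothetical workspace layout, for illustration only.
struct RadixRowState {
    unsigned int histogram[256];
    unsigned int arrival_counter;
};

__device__ void cleanup_row_state(RadixRowState* state, int num_bins) {
    // All threads in the block cooperate on the zeroing, rather than
    // one thread looping with a memory fence on every iteration.
    for (int i = threadIdx.x; i < num_bins; i += blockDim.x) {
        state->histogram[i] = 0;
    }
    __syncthreads();  // every zero has been issued by this block

    if (threadIdx.x == 0) {
        // Make the zeroed histogram globally visible *before* the
        // counter reset, so a block that sees arrival_counter == 0
        // cannot race ahead and read stale histogram bins.
        __threadfence();
        atomicExch(&state->arrival_counter, 0u);
    }
}
```

The ordering matters: resetting `arrival_counter` acts as the release signal for the next kernel iteration, so the fence must sit between the last histogram store and that reset.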
@claude review

cc: @zyongye
Motivation
Replaces the per-call `cudaMemsetAsync` (PR #41444 and #41665) with an in-kernel cleanup at the end of `persistent_topk_kernel`, eliminating a 3-5 us overhead per launch (decode step). This is particularly relevant in low-latency scenarios and when `max_seq_len <= RADIX_THRESHOLD`, where the workspace is not required. Despite this, the persistent kernel currently invokes `cudaMemsetAsync` unconditionally, incurring unnecessary overhead.
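The host-side effect of the change can be sketched as below. This is illustrative only: the launcher name, kernel signature, and launch shape are assumptions, not the PR's actual code. The point is that the stream-ordered `cudaMemsetAsync` call disappears from the per-step path, because the kernel's tail leaves the `RadixRowState` workspace zeroed for the next launch; removing the host call is also what makes the launch sequence friendlier to CUDA graph capture and replay.

```cuda
// Hypothetical launcher, for illustration only.
void launch_persistent_topk(RadixRowState* states, size_t workspace_bytes,
                            dim3 grid, dim3 block, cudaStream_t stream) {
    // Before this PR: ~3-5 us spent here on every decode step.
    // cudaMemsetAsync(states, 0, workspace_bytes, stream);

    // After: no host-side memset; the kernel zeroes its own workspace
    // on the way out, so the next launch finds it clean.
    persistent_topk_kernel<<<grid, block, 0, stream>>>(states);
}
```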