
[bugfix] fix bug when top_logprobs=0 with spec decoding#30059

Merged
njhill merged 6 commits into vllm-project:main from realliujiaxu:fix-top-logprobs-0
Dec 12, 2025

Conversation

@realliujiaxu
Contributor

@realliujiaxu realliujiaxu commented Dec 4, 2025

Purpose

#26060 adds support for returning logprobs for v1 spec decoding. However, if top_logprobs is unset or set to 0, the server returns an error:

{"error":{"message":"list index out of range","type":"Internal Server Error","param":null,"code":500}}

The root cause is that the rejection sampler returns logprobs as None in this case. The fix follows the standard sampler's approach: even when top_logprobs=0, the sampled token's logprob should still be collected.
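The fix changes a truthiness check on sampling_metadata.max_num_logprobs to an explicit is not None check. A minimal, self-contained sketch of why that distinction matters (the class and function names here are illustrative stand-ins, not vLLM's actual code):

```python
# Minimal sketch of the bug and the fix; SamplingMetadata and the helper
# names below are illustrative, not vLLM's exact code.


class SamplingMetadata:
    def __init__(self, max_num_logprobs):
        # None => logprobs not requested at all;
        # 0 => logprobs requested, with 0 top alternatives per token.
        self.max_num_logprobs = max_num_logprobs


def needs_logprobs_buggy(meta: SamplingMetadata) -> bool:
    # Truthiness check: wrongly treats max_num_logprobs == 0 like None,
    # so no logprobs are gathered and downstream code fails with
    # "list index out of range".
    return bool(meta.max_num_logprobs)


def needs_logprobs_fixed(meta: SamplingMetadata) -> bool:
    # Explicit None check: top_logprobs=0 still collects the sampled
    # token's logprob, matching the standard sampler's behavior.
    return meta.max_num_logprobs is not None


print(needs_logprobs_buggy(SamplingMetadata(0)))  # False: logprobs skipped (the bug)
print(needs_logprobs_fixed(SamplingMetadata(0)))  # True: logprobs still collected
```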

Test Plan

Run server

vllm serve Qwen/Qwen3-8B \
  --speculative-config '{
    "model": "RedHatAI/Qwen3-8B-speculator.eagle3",
    "num_speculative_tokens": 3,
    "method": "eagle3"
  }'

Test with top_logprobs=0

curl --location 'http://127.0.0.1:8000/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "temperature": 0,
    "max_tokens": 10,
    "messages": [
        {
        "role": "user",
        "content": "who are you"
        }
    ],
    "logprobs": true,
    "top_logprobs": 0
    }'

Test Result

{"id":"chatcmpl-961ba782ebf17147","object":"chat.completion","created":1764852213,"model":"Qwen/Qwen3-8B","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\nOkay, the user asked \"who are","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":{"content":[{"token":"<think>","logprob":-2.3841855067985307e-07,"bytes":[60,116,104,105,110,107,62],"top_logprobs":[]},{"token":"\n","logprob":-1.0132738680113107e-05,"bytes":[10],"top_logprobs":[]},{"token":"Okay","logprob":-0.0039456626400351524,"bytes":[79,107,97,121],"top_logprobs":[]},{"token":",","logprob":-6.270212179515511e-05,"bytes":[44],"top_logprobs":[]},{"token":" the","logprob":-0.017913110554218292,"bytes":[32,116,104,101],"top_logprobs":[]},{"token":" user","logprob":-1.645074735279195e-05,"bytes":[32,117,115,101,114],"top_logprobs":[]},{"token":" asked","logprob":-0.19733361899852753,"bytes":[32,97,115,107,101,100],"top_logprobs":[]},{"token":" \"","logprob":-0.578165590763092,"bytes":[32,34],"top_logprobs":[]},{"token":"who","logprob":-0.0009124883217737079,"bytes":[119,104,111],"top_logprobs":[]},{"token":" are","logprob":-4.768360213347478e-06,"bytes":[32,97,114,101],"top_logprobs":[]}]},"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":11,"total_tokens":21,"completion_tokens":10,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively resolves a bug in speculative decoding where setting top_logprobs=0 would cause a server error. The fix, which changes the condition for checking sampling_metadata.max_num_logprobs from a truthiness check to an explicit is not None check, is correct and idiomatic. This ensures that logprobs are computed when top_logprobs is 0, aligning the rejection sampler's behavior with the standard sampler and preventing the reported server error. The change is minimal and precisely targets the bug.

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@njhill
Member

njhill commented Dec 4, 2025

Thank you @realliujiaxu! Would you be willing to add a simple test for this?

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@realliujiaxu
Contributor Author

realliujiaxu commented Dec 5, 2025

Thank you @realliujiaxu! Would you be willing to add a simple test for this?

Done. I've added a simple test, and tested it locally. Thanks! @njhill

Before the fix, the newly added unit test failed:

======================================================== short test summary info ========================================================
FAILED tests/v1/sample/test_logprobs.py::test_spec_decode_logprobs[model_setup0-raw_logits] - AssertionError: assert 10 == 0
FAILED tests/v1/sample/test_logprobs.py::test_spec_decode_logprobs[model_setup0-raw_logprobs] - AssertionError: assert 10 == 0
FAILED tests/v1/sample/test_logprobs.py::test_spec_decode_logprobs[model_setup0-processed_logits] - AssertionError: assert 10 == 0
FAILED tests/v1/sample/test_logprobs.py::test_spec_decode_logprobs[model_setup0-processed_logprobs] - AssertionError: assert 10 == 0

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@realliujiaxu realliujiaxu requested a review from njhill December 9, 2025 07:17
@realliujiaxu
Contributor Author

@njhill Can we merge this PR?

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 9, 2025
@njhill njhill enabled auto-merge (squash) December 9, 2025 17:22
@njhill
Member

njhill commented Dec 10, 2025

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
auto-merge was automatically disabled December 10, 2025 03:39

Head branch was pushed to by a user without write access

@realliujiaxu
Contributor Author

@njhill the test failure is fixed, please review again.

@njhill
Member

njhill commented Dec 10, 2025

Thanks @realliujiaxu ... re the test fix, it looks like you changed max_num_logprobs from 0 to None in the test itself. But it seems that having it set to 0 caused this error in the rejection sampler which seems wrong?

[2025-12-09T18:17:38Z]                 logits,
[2025-12-09T18:17:38Z]                 target_logits if self.is_processed_logprobs_mode else raw_target_logits,
[2025-12-09T18:17:38Z] >               bonus_sampler_output.logprobs_tensors.logprobs,
[2025-12-09T18:17:38Z]                 output_token_ids,
[2025-12-09T18:17:38Z]             )
[2025-12-09T18:17:38Z] E           AttributeError: 'NoneType' object has no attribute 'logprobs'

Member

@njhill njhill left a comment


Just blocking until answer to prior question is understood!

@realliujiaxu
Contributor Author

Thanks @realliujiaxu ... re the test fix, it looks like you changed max_num_logprobs from 0 to None in the test itself. But it seems that having it set to 0 caused this error in the rejection sampler which seems wrong?

[2025-12-09T18:17:38Z]                 logits,
[2025-12-09T18:17:38Z]                 target_logits if self.is_processed_logprobs_mode else raw_target_logits,
[2025-12-09T18:17:38Z] >               bonus_sampler_output.logprobs_tensors.logprobs,
[2025-12-09T18:17:38Z]                 output_token_ids,
[2025-12-09T18:17:38Z]             )
[2025-12-09T18:17:38Z] E           AttributeError: 'NoneType' object has no attribute 'logprobs'

(If I understand correctly, you are wondering why I changed the test, and whether having max_num_logprobs set to 0 would still cause an error in the rejection sampler.)

bonus_sampler_output.logprobs_tensors is None because mock_sampler_output stubs rejection_sampler.sampler's return value, with logprobs_tensors hardcoded to None:

def mock_sampler_output(
    rejection_sampler: RejectionSampler, bonus_token_ids: torch.Tensor
):
    # Stub the underlying sampler: always return the bonus tokens,
    # with logprobs_tensors hardcoded to None.
    rejection_sampler.sampler.return_value = SamplerOutput(
        sampled_token_ids=bonus_token_ids, logprobs_tensors=None
    )

If the actual sampler were executed, logprobs_tensors would not be None. Therefore, the rejection sampler code itself is not at fault; the test case should be modified instead. To make mock_sampler_output consistent with the actual execution result, there are two ways to modify the test:

  1. (The approach I’m currently taking) Set max_num_logprobs to None – the actual sampler execution will also produce logprobs_tensors = None.
  2. Modify mock_sampler_output by preparing a fake value for logprobs_tensors, following the same logic as the sampler.

Perhaps option 2 is more reasonable? Looking forward to your further advice. @njhill
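For reference, option 2 could look something like the following sketch. FakeLogprobsTensors and FakeSamplerOutput are hypothetical stand-ins for vLLM's real classes, and plain lists replace torch tensors for brevity:

```python
# Hypothetical sketch of option 2: give the mocked sampler a fabricated
# logprobs_tensors instead of None. The Fake* classes are stand-ins,
# not vLLM's actual types.
from dataclasses import dataclass
from typing import Optional
from unittest.mock import MagicMock


@dataclass
class FakeLogprobsTensors:
    # Illustrative stand-in for the sampler's logprobs container.
    logprobs: list
    logprob_token_ids: list


@dataclass
class FakeSamplerOutput:
    # Illustrative stand-in for the sampler output.
    sampled_token_ids: list
    logprobs_tensors: Optional[FakeLogprobsTensors]


def mock_sampler_output_with_logprobs(rejection_sampler, bonus_token_ids):
    # Instead of hardcoding logprobs_tensors=None, fabricate a value shaped
    # like what the real sampler would produce, so test code that reads
    # bonus_sampler_output.logprobs_tensors.logprobs no longer hits
    # AttributeError when max_num_logprobs == 0.
    fake = FakeLogprobsTensors(
        logprobs=[0.0 for _ in bonus_token_ids],
        logprob_token_ids=list(bonus_token_ids),
    )
    rejection_sampler.sampler.return_value = FakeSamplerOutput(
        sampled_token_ids=bonus_token_ids, logprobs_tensors=fake
    )


rejection_sampler = MagicMock()
mock_sampler_output_with_logprobs(rejection_sampler, [11, 22, 33])
out = rejection_sampler.sampler.return_value
print(out.logprobs_tensors.logprobs)  # [0.0, 0.0, 0.0], not None
```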

@njhill
Member

njhill commented Dec 11, 2025

Thanks @realliujiaxu, makes sense, I'm happy with whichever you think is best.

@realliujiaxu
Contributor Author

@njhill I think the current approach is simple and clear enough. Can we merge this PR?

@njhill njhill merged commit d2c919d into vllm-project:main Dec 12, 2025
44 checks passed
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…#30059)

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…#30059)

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1
