[Sampler] Support returning all logprobs or logits #21792
vllm-bot merged 8 commits into vllm-project:main
Conversation
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Code Review
This pull request introduces support for returning all logprobs or logits by setting logprobs=-1. The changes are well-contained and include updates to configuration, validation, and worker logic, along with a new test case.
My review identified a critical bug in the validation logic. The current implementation would incorrectly allow a request with logprobs=-1 even when the engine has a max_logprobs limit set, which could lead to unexpected resource consumption. I've provided a code suggestion to fix this validation logic. The rest of the implementation appears correct and effectively adds the desired functionality.
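A hedged sketch of the kind of cap check the review is describing; the function and argument names here are illustrative, not the actual vLLM code. The assumed convention is that `-1` means "full vocabulary" and should only be accepted when the engine itself was started uncapped (`max_logprobs=-1`):

```python
def validate_logprobs(requested_logprobs, max_logprobs):
    """Validate a request's `logprobs` against the engine-wide cap.

    Assumed convention: -1 means "return logprobs for the entire
    vocabulary" and is only permitted when the engine runs with
    max_logprobs=-1 (no cap). Otherwise -1 must be rejected, which is
    the case the review flagged as missing.
    """
    if requested_logprobs is None:
        return  # logprobs were not requested at all
    if max_logprobs == -1:
        return  # engine has no cap; any value, including -1, is allowed
    if requested_logprobs == -1 or requested_logprobs > max_logprobs:
        raise ValueError(
            f"Requested logprobs={requested_logprobs} exceeds the "
            f"engine limit max_logprobs={max_logprobs}."
        )
```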
cc @njhill
houseroad
left a comment
why do we want to return all the logits? This can easily run into OOM, right?
njhill
left a comment
This looks reasonable to me but would be good to get some more opinions.
why do we want to return all the logits? This can easily run into OOM, right?
@houseroad yes I think it could cause significant performance problems in a multi-user production scenario but imo it's ok to support this for experimental purposes given that it needs to be explicitly enabled via the config.
Yeah, this is not for production, but it's nice to provide an option for power users to get everything they want (e.g., for debugging).
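To make the OOM concern concrete, a rough back-of-the-envelope estimate; the vocabulary size, dtype, and batch numbers below are illustrative assumptions, not measurements from this PR:

```python
# Rough memory estimate for returning full-vocab logprobs on every token.
vocab_size = 128_256          # e.g. a Llama-3-style vocabulary (assumption)
bytes_per_value = 4           # float32
tokens_in_flight = 8 * 2048   # 8 concurrent requests x 2048 generated tokens

bytes_total = vocab_size * bytes_per_value * tokens_in_flight
print(f"{bytes_total / 1e9:.1f} GB")  # → 8.4 GB
```

At roughly 0.5 MB per token, full-vocab output dwarfs the usual top-k logprobs, which is why an explicit opt-in makes sense.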
houseroad
left a comment
Sure, I am fine with the opt-in solution then. :-)
vllm/config.py (outdated)
    """Maximum number of log probabilities to return when `logprobs` is
    specified in `SamplingParams`. The default value comes from the default
    for the OpenAI Chat Completions API. -1 means no cap."""
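A hedged usage sketch of the opt-in described by this docstring; the `--max-logprobs` flag name is inferred from the config field, and the model name is a placeholder — verify against `vllm serve --help` for your version:

```shell
# Opt in to uncapped logprobs at engine startup (-1 = no cap; can easily OOM).
vllm serve meta-llama/Llama-3.1-8B-Instruct --max-logprobs -1
```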
Let's add a comment to explicitly call out that -1 is likely to cause OOM?
sounds good, updated
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.

Purpose
Often, for numerical benchmarking and debugging purposes, we need to get all the logits or logprobs (controlled by `logprobs_mode`). This PR supports returning all of them, of `vocab_size` length, if `SamplingParams.logprobs=-1` and `ModelConfig.max_logprobs=-1`.

Test Plan
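For reference, "all logprobs" here means the log-softmax over the full vocabulary at each decoding step. A tiny dependency-free illustration with toy logits (this is not vLLM's implementation, just the math being exposed):

```python
import math

def full_vocab_logprobs(logits):
    """Numerically stable log-softmax over the entire vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Toy 3-token "vocabulary"; a real model would return vocab_size values.
logits = [2.0, 1.0, 0.1]
logprobs = full_vocab_logprobs(logits)
# Probabilities recovered from the logprobs sum to 1.
print(round(sum(math.exp(lp) for lp in logprobs), 6))  # → 1.0
```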
Test Result
Test passed
(Optional) Documentation Update