[Bugfix] Decode prompt text from token IDs upstream in renderer#37380
karanb192 wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request fixes an issue where the prompt was logged as None for token-only prompts by decoding the token IDs back to text. The implementation is correct, but my feedback focuses on improving error handling within the new logic. Specifically, I suggest logging a warning instead of silently ignoring exceptions during the decoding process to improve debuggability.
except Exception:
    pass
While catching a broad Exception is acceptable here to prevent crashing the logging logic, silently passing with pass can hide underlying issues with the tokenizer or the decoding process. It would be more informative to log a warning when an exception occurs. This will help in debugging cases where the prompt text is unexpectedly None in the logs.
except Exception as e:
    logger.warning(
        "Failed to decode token IDs for request %s to log prompt text. "
        "Error: %s", request_id, e)
cc @qandrew @bbrowning for Harmony encoding
Hi @karanb192, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
qandrew left a comment
Hi thanks for the fix!
I think the issue should be addressed further upstream; I noticed something similar a while ago when debugging the Responses API in addition to Chat Completions.
https://github.com/vllm-project/vllm/blob/main/vllm/renderers/base.py#L651
Would you mind changing those lines to the following and seeing if it works?
if prompt_text := prompt.get("prompt"):
    inputs["prompt"] = prompt_text
elif (tokenizer := self.get_tokenizer()) is not None:
    inputs["prompt"] = tokenizer.decode(prompt_token_ids)
if cache_salt := prompt.get("cache_salt"):
    inputs["cache_salt"] = cache_salt
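For reference, the fallback order in that suggestion can be sketched as a standalone function. `StubTokenizer` and `build_inputs` are stand-ins for the real renderer objects (`self.get_tokenizer()`, the `TokensPrompt` dict), assumed here only to make the logic runnable in isolation.

```python
class StubTokenizer:
    """Stand-in for a real tokenizer: renders token IDs as placeholder text."""
    def decode(self, token_ids):
        return " ".join(f"<tok{t}>" for t in token_ids)

def build_inputs(prompt, tokenizer):
    inputs = {"prompt_token_ids": prompt["prompt_token_ids"]}
    if prompt_text := prompt.get("prompt"):
        inputs["prompt"] = prompt_text  # user-supplied text wins, no decode
    elif tokenizer is not None:
        # Token-only prompt: recover the text by decoding.
        inputs["prompt"] = tokenizer.decode(prompt["prompt_token_ids"])
    if cache_salt := prompt.get("cache_salt"):
        inputs["cache_salt"] = cache_salt
    return inputs

# Token-only prompt: text is recovered by decoding.
print(build_inputs({"prompt_token_ids": [1, 2]}, StubTokenizer())["prompt"])
# With skip_tokenizer_init=True the tokenizer is None, so decode is skipped
# and no "prompt" key is populated.
print("prompt" in build_inputs({"prompt_token_ids": [1, 2]}, None))
```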
Actually, that code is on purpose, since users might pass token inputs directly and we want to avoid unnecessary detokenization. You can set
Move the prompt-text decode fix upstream into BaseRenderer._process_tokens, as suggested by @qandrew in PR review.

When models like gpt-oss-20b or Mistral tokenizer models render chat prompts directly to token IDs, the engine prompt dict only contains prompt_token_ids without a text prompt field. This caused "prompt: None" in RequestLogger debug output and left the prompt text unavailable for any downstream consumer.

Fix by decoding prompt_token_ids back to text in _process_tokens when the prompt text is not already present and a tokenizer is available. This ensures the prompt text is populated in engine inputs for all consumers (logging, responses API, etc.), not just the debug logger. When skip_tokenizer_init=True, self.tokenizer is None, so the decode is safely skipped.

Fixes vllm-project#37253

Signed-off-by: karanb192 <karan@example.com>
Ah, I see, it was introduced in #32863. Then maybe we don't need code changes and just need to set the TokenizeParams accordingly?
It would be even better to keep the text in this code path so we don't need to detokenize.
Just a note that the user's input text and the text we get from decoding the token IDs will be very different. The former is just whatever the user passed as content of messages, while the latter will be the actual textual representation of the Harmony-formatted tokens that get passed to the model. The latter is useful for debugging, but unnecessary and potentially performance-impacting if we do that unconditionally for every inference request, as it requires an otherwise unnecessary tokenizer.decode call.
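As a toy illustration of that distinction (this is not the real Harmony format, just an assumed stand-in chat template): the string a user sends and the string recovered by decoding the formatted prompt tokens are different things, which is why the decoded text helps debugging but cannot replace the original message content.

```python
def apply_toy_chat_template(messages):
    # Hypothetical template markers, only loosely inspired by chat formats;
    # the real rendered prompt depends on the model's actual template.
    return "".join(
        f"<|start|>{m['role']}<|message|>{m['content']}<|end|>"
        for m in messages)

user_text = "Hello!"
formatted = apply_toy_chat_template([{"role": "user", "content": user_text}])

print(formatted)
print(user_text == formatted)  # False: decoded prompt != user content
```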
I played around a bit with a debugger looking at the code. With Qwen3, when we hit preprocess_chat via Chat Completions, on this line the prompt already includes the debug string that we'd want to show later: https://github.com/vllm-project/vllm/blob/main/vllm/renderers/base.py#L497 So for this case we should just be able to extract that and show the debug log to the user directly. @karanb192 maybe GPT-OSS doesn't go from Chat Completion conversations -> prompt str -> token IDs (like Qwen3), but from Chat Completion conversations -> token IDs directly? 🤷
Summary
- When models like gpt-oss-20b or Mistral tokenizer models render chat prompts directly to token IDs, the engine prompt dict only contains prompt_token_ids without a text prompt field. This caused prompt: None in debug logs and left the prompt text unavailable for any downstream consumer (logging, responses API, etc.).
- The fix decodes the token IDs back to text upstream in BaseRenderer._process_tokens so that inputs["prompt"] is always populated when token IDs are available. When the tokenizer is not initialized (skip_tokenizer_init=True), self.tokenizer is None and the decode is safely skipped.
- An earlier revision patched OpenAIServing._log_inputs, which only fixed the logging symptom. The upstream fix ensures prompt text is available for all consumers.

Fixes #37253
Test plan
- Verified debug logs no longer show prompt: None (when prompt text is present, the elif is not reached)
- tokenizer.decode() is only called when prompt text is not already present in the TokensPrompt
- With skip_tokenizer_init=True, the decode is safely skipped (self.tokenizer is None)
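The checks above could be sketched as plain assertions against a stand-in for the decode fallback. The names here (process, StubTokenizer) are illustrative; the real tests would exercise BaseRenderer._process_tokens itself.

```python
class StubTokenizer:
    """Stand-in tokenizer returning fixed text for any token IDs."""
    def decode(self, token_ids):
        return "decoded text"

def process(prompt, tokenizer):
    # Re-statement of the fallback under test: decode only when prompt
    # text is absent and a tokenizer is available.
    inputs = dict(prompt)
    if not inputs.get("prompt") and tokenizer is not None:
        inputs["prompt"] = tokenizer.decode(inputs["prompt_token_ids"])
    return inputs

# Existing prompt text wins; decode is not reached.
assert process({"prompt": "hi", "prompt_token_ids": [1]},
               StubTokenizer())["prompt"] == "hi"
# Token-only prompts get their text back.
assert process({"prompt_token_ids": [1]},
               StubTokenizer())["prompt"] == "decoded text"
# skip_tokenizer_init=True -> tokenizer is None -> decode safely skipped.
assert "prompt" not in process({"prompt_token_ids": [1]}, None)
print("all checks passed")
```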