[Bugfix] Decode prompt text from token IDs upstream in renderer#37380
karanb192 wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request fixes an issue where the prompt was logged as None for token-only prompts by decoding the token IDs back to text. The implementation is correct, but my feedback focuses on improving error handling within the new logic. Specifically, I suggest logging a warning instead of silently ignoring exceptions during the decoding process to improve debuggability.
except Exception:
    pass
While catching a broad Exception is acceptable here to prevent crashing the logging logic, silently passing with pass can hide underlying issues with the tokenizer or the decoding process. It would be more informative to log a warning when an exception occurs. This will help in debugging cases where the prompt text is unexpectedly None in the logs.
except Exception as e:
    logger.warning(
        "Failed to decode token IDs for request %s to log prompt text. "
        "Error: %s", request_id, e)
cc @qandrew @bbrowning for Harmony encoding
Hi @karanb192, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
qandrew left a comment
Hi thanks for the fix!
I think the issue should be addressed further upstream; I noticed something similar a while ago when debugging the Responses API in addition to Chat Completions.
https://github.com/vllm-project/vllm/blob/main/vllm/renderers/base.py#L651
Would you mind changing those lines to the following and seeing if it works?
if prompt_text := prompt.get("prompt"):
    inputs["prompt"] = prompt_text
elif (tokenizer := self.get_tokenizer()) is not None:
    inputs["prompt"] = tokenizer.decode(prompt_token_ids)
if cache_salt := prompt.get("cache_salt"):
    inputs["cache_salt"] = cache_salt
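For reference, the fallback order in that suggestion can be sketched as a standalone function. `StubTokenizer` and `build_inputs` are stand-ins for the real renderer objects (`self.get_tokenizer()`, the `TokensPrompt` dict), assumed here only to make the logic runnable in isolation.

```python
class StubTokenizer:
    """Stand-in for a real tokenizer: renders token IDs as placeholder text."""
    def decode(self, token_ids):
        return " ".join(f"<tok{t}>" for t in token_ids)

def build_inputs(prompt, tokenizer):
    inputs = {"prompt_token_ids": prompt["prompt_token_ids"]}
    if prompt_text := prompt.get("prompt"):
        inputs["prompt"] = prompt_text  # user-supplied text wins, no decode
    elif tokenizer is not None:
        # Token-only prompt: recover the text by decoding.
        inputs["prompt"] = tokenizer.decode(prompt["prompt_token_ids"])
    if cache_salt := prompt.get("cache_salt"):
        inputs["cache_salt"] = cache_salt
    return inputs

# Token-only prompt: text is recovered by decoding.
print(build_inputs({"prompt_token_ids": [1, 2]}, StubTokenizer())["prompt"])
# With skip_tokenizer_init=True the tokenizer is None, so decode is skipped
# and no "prompt" key is populated.
print("prompt" in build_inputs({"prompt_token_ids": [1, 2]}, None))
```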
Actually, that code is on purpose, since users might pass token inputs directly and we want to avoid unnecessary detokenization. You can set
Move the prompt-text decode fix upstream into BaseRenderer._process_tokens, as suggested by @qandrew in PR review.

When models like gpt-oss-20b or Mistral tokenizer models render chat prompts directly to token IDs, the engine prompt dict only contains prompt_token_ids without a text prompt field. This caused "prompt: None" in RequestLogger debug output and left the prompt text unavailable for any downstream consumer.

Fix by decoding prompt_token_ids back to text in _process_tokens when the prompt text is not already present and a tokenizer is available. This ensures the prompt text is populated in engine inputs for all consumers (logging, responses API, etc.), not just the debug logger. When skip_tokenizer_init=True, self.tokenizer is None, so the decode is safely skipped.

Fixes vllm-project#37253

Signed-off-by: karanb192 <karan@example.com>
Ah, I see, it was introduced in #32863. Then maybe we don't need code changes and just need to set the TokenizeParams accordingly?
It would be even better to keep the text in this code path so we don't need to detokenize.
Just a note that the user's input text and the text we get from decoding the token IDs will be very different. The former is just whatever the user passed as content of messages, while the latter will be the actual textual representation of the Harmony-formatted tokens that get passed to the model. The latter is useful for debugging, but unnecessary and potentially performance-impacting if we do that unconditionally for every inference request, as it requires an otherwise unnecessary tokenizer.decode call.
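As a toy illustration of that distinction (this is not the real Harmony format, just an assumed stand-in chat template): the string a user sends and the string recovered by decoding the formatted prompt tokens are different things, which is why the decoded text helps debugging but cannot replace the original message content.

```python
def apply_toy_chat_template(messages):
    # Hypothetical template markers, only loosely inspired by chat formats;
    # the real rendered prompt depends on the model's actual template.
    return "".join(
        f"<|start|>{m['role']}<|message|>{m['content']}<|end|>"
        for m in messages)

user_text = "Hello!"
formatted = apply_toy_chat_template([{"role": "user", "content": user_text}])

print(formatted)
print(user_text == formatted)  # False: decoded prompt != user content
```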
I played around a bit with a debugger looking at the code. With Qwen3, when we hit preprocess_chat via Chat Completions, on this line the prompt already includes the debug string that we'd want to show later: https://github.com/vllm-project/vllm/blob/main/vllm/renderers/base.py#L497 So for this case we should just be able to extract that and show the debug log to the user directly. @karanb192 maybe GPT-OSS doesn't go from Chat Completion conversations -> prompt str -> token IDs (like Qwen3), but from Chat Completion conversations -> token IDs directly? 🤷
Summary
- When models like gpt-oss-20b or Mistral tokenizer models render chat prompts directly to token IDs, the engine prompt dict only contains prompt_token_ids without a text prompt field. This caused prompt: None in debug logs and left the prompt text unavailable for any downstream consumer (logging, responses API, etc.).
- The fix decodes the token IDs back to text upstream in BaseRenderer._process_tokens so that inputs["prompt"] is always populated when token IDs are available. When the tokenizer is not initialized (skip_tokenizer_init=True), self.tokenizer is None and the decode is safely skipped.
- An earlier revision patched OpenAIServing._log_inputs, which only fixed the logging symptom. The upstream fix ensures prompt text is available for all consumers.

Fixes #37253
Test plan
- Verified debug logs no longer show prompt: None (when prompt text is present, the elif is not reached)
- tokenizer.decode() is only called when prompt text is not already present in the TokensPrompt
- With skip_tokenizer_init=True, the decode is safely skipped (self.tokenizer is None)
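The checks above could be sketched as plain assertions against a stand-in for the decode fallback. The names here (process, StubTokenizer) are illustrative; the real tests would exercise BaseRenderer._process_tokens itself.

```python
class StubTokenizer:
    """Stand-in tokenizer returning fixed text for any token IDs."""
    def decode(self, token_ids):
        return "decoded text"

def process(prompt, tokenizer):
    # Re-statement of the fallback under test: decode only when prompt
    # text is absent and a tokenizer is available.
    inputs = dict(prompt)
    if not inputs.get("prompt") and tokenizer is not None:
        inputs["prompt"] = tokenizer.decode(inputs["prompt_token_ids"])
    return inputs

# Existing prompt text wins; decode is not reached.
assert process({"prompt": "hi", "prompt_token_ids": [1]},
               StubTokenizer())["prompt"] == "hi"
# Token-only prompts get their text back.
assert process({"prompt_token_ids": [1]},
               StubTokenizer())["prompt"] == "decoded text"
# skip_tokenizer_init=True -> tokenizer is None -> decode safely skipped.
assert "prompt" not in process({"prompt_token_ids": [1]}, None)
print("all checks passed")
```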