
[Bugfix] Decode prompt text from token IDs upstream in renderer #37380

Open
karanb192 wants to merge 1 commit into vllm-project:main from karanb192:fix/request-log-prompt-none

Conversation


@karanb192 karanb192 commented Mar 18, 2026

Summary

  • When using models like gpt-oss-20b via the Chat Completions API, the prompt is rendered directly to token IDs (via Harmony encoding), so the engine prompt only contains prompt_token_ids without a text prompt field. This caused prompt: None in debug logs and left the prompt text unavailable for any downstream consumer (logging, responses API, etc.).
  • Fix (per @qandrew's review suggestion): move the decode upstream into BaseRenderer._process_tokens so that inputs["prompt"] is always populated when token IDs are available. When the tokenizer is not initialized (skip_tokenizer_init=True), self.tokenizer is None and the decode is safely skipped.
  • This replaces the previous approach of decoding in OpenAIServing._log_inputs, which only fixed the logging symptom. The upstream fix ensures prompt text is available for all consumers.
  • Also fixes the same issue for Mistral tokenizer models whose chat templates return token IDs directly.

Fixes #37253
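The fallback logic described above can be sketched as a small, self-contained function (the tokenizer here is a toy stub and `build_inputs` is a hypothetical stand-in for the change in `BaseRenderer._process_tokens`, not the actual vLLM code):

```python
from typing import Optional

class StubTokenizer:
    """Toy stand-in for a real tokenizer; decode joins IDs for illustration."""
    def decode(self, token_ids: list[int]) -> str:
        return " ".join(f"<tok{i}>" for i in token_ids)

def build_inputs(prompt: dict, tokenizer: Optional[StubTokenizer]) -> dict:
    """Populate inputs['prompt'] from token IDs when no text prompt exists."""
    inputs: dict = {"prompt_token_ids": prompt["prompt_token_ids"]}
    if prompt_text := prompt.get("prompt"):
        # Text prompt already present (standard HF path): keep it as-is.
        inputs["prompt"] = prompt_text
    elif tokenizer is not None:
        # Token-only prompt (e.g. Harmony rendering): decode so downstream
        # consumers (logging, responses API) see the prompt text.
        inputs["prompt"] = tokenizer.decode(prompt["prompt_token_ids"])
    # With skip_tokenizer_init=True the tokenizer is None and the decode
    # is skipped, leaving inputs without a text prompt.
    return inputs
```

The elif is only reached for token-only prompts, which is why the standard HF path sees no extra decode cost.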

Test plan

  • Verify with a model that uses Harmony rendering (e.g., gpt-oss-20b) that the debug log now shows the decoded prompt text instead of None
  • Verify with a standard HF model (e.g., Gemma-3) that behavior is unchanged (prompt text was already available, so the elif is not reached)
  • Verify with Mistral tokenizer models that the decoded prompt now appears in logs
  • Confirm no performance regression — tokenizer.decode() is only called when prompt text is not already present in the TokensPrompt
  • Verify that when skip_tokenizer_init=True, the decode is safely skipped (self.tokenizer is None)

mergify bot added the frontend and bug (Something isn't working) labels on Mar 18, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue where the prompt was logged as None for token-only prompts by decoding the token IDs back to text. The implementation is correct, but my feedback focuses on improving error handling within the new logic. Specifically, I suggest logging a warning instead of silently ignoring exceptions during the decoding process to improve debuggability.

Comment on lines +835 to +836
except Exception:
pass

Severity: high

While catching a broad Exception is acceptable here to prevent crashing the logging logic, silently passing with pass can hide underlying issues with the tokenizer or the decoding process. It would be more informative to log a warning when an exception occurs. This will help in debugging cases where the prompt text is unexpectedly None in the logs.

            except Exception as e:
                logger.warning(
                    "Failed to decode token IDs for request %s to log prompt text. "
                    "Error: %s", request_id, e)
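The suggestion above amounts to a pattern like the following (a hedged sketch; `decode_for_log` and `FailingTokenizer` are hypothetical names for illustration, not vLLM code):

```python
import logging

logger = logging.getLogger(__name__)

def decode_for_log(tokenizer, token_ids, request_id):
    """Best-effort decode of token IDs for logging.

    Instead of a bare `except Exception: pass`, emit a warning so an
    unexpectedly None prompt in the logs can be traced to a decode failure.
    """
    try:
        return tokenizer.decode(token_ids)
    except Exception as e:
        logger.warning(
            "Failed to decode token IDs for request %s to log prompt text. "
            "Error: %s", request_id, e)
        return None

class FailingTokenizer:
    """Stub whose decode always raises, to exercise the warning path."""
    def decode(self, token_ids):
        raise ValueError("malformed token ids")

class WorkingTokenizer:
    """Stub whose decode succeeds, to exercise the happy path."""
    def decode(self, token_ids):
        return "decoded text"
```

Logging remains best-effort either way: the request is never failed over a decode error, but the cause is now visible.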

Member

DarkLight1337 commented Mar 18, 2026

cc @qandrew @bbrowning for Harmony encoding


mergify bot commented Mar 18, 2026

Hi @karanb192, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Contributor

@qandrew qandrew left a comment


Hi, thanks for the fix!

I think the issue here should be addressed further upstream; I had noticed something similar a while ago when debugging the Responses API in addition to Chat Completions.

https://github.com/vllm-project/vllm/blob/main/vllm/renderers/base.py#L651

would you mind changing those lines to this and see if it works?

if prompt_text := prompt.get("prompt"):
    inputs["prompt"] = prompt_text
elif (tokenizer := self.get_tokenizer()) is not None:
    inputs["prompt"] = tokenizer.decode(prompt_token_ids)
if cache_salt := prompt.get("cache_salt"):
    inputs["cache_salt"] = cache_salt

@DarkLight1337
Member

DarkLight1337 commented Mar 18, 2026

Actually, that code is on purpose, since users might pass token inputs directly and we want to avoid unnecessary detokenization. You can set TokenizeParams.needs_detokenization flag to force text inputs to appear even in that case.
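The flag-gated behavior described here can be sketched as follows (`TokenizeParams` below is a simplified stand-in for the real vLLM class, and `finalize_inputs` is a hypothetical helper, not the actual renderer code):

```python
from dataclasses import dataclass

@dataclass
class TokenizeParams:
    """Simplified stand-in; the real class carries many more fields."""
    needs_detokenization: bool = False

class StubTokenizer:
    """Toy tokenizer; decode produces a recognizable string."""
    def decode(self, token_ids):
        return "<decoded " + " ".join(map(str, token_ids)) + ">"

def finalize_inputs(prompt: dict, tokenizer, params: TokenizeParams) -> dict:
    inputs: dict = {"prompt_token_ids": prompt["prompt_token_ids"]}
    if text := prompt.get("prompt"):
        inputs["prompt"] = text
    elif params.needs_detokenization and tokenizer is not None:
        # Decode only when explicitly requested, so users who pass token
        # inputs directly don't pay for unnecessary detokenization.
        inputs["prompt"] = tokenizer.decode(prompt["prompt_token_ids"])
    return inputs
```

The design choice is to make detokenization opt-in per request rather than unconditional, keeping the token-passthrough path cheap.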

Move the prompt-text decode fix upstream into
BaseRenderer._process_tokens, as suggested by @qandrew in PR review.

When models like gpt-oss-20b or Mistral tokenizer models render chat
prompts directly to token IDs, the engine prompt dict only contains
prompt_token_ids without a text prompt field. This caused
"prompt: None" in RequestLogger debug output and left the prompt text
unavailable for any downstream consumer.

Fix by decoding prompt_token_ids back to text in _process_tokens when
the prompt text is not already present and a tokenizer is available.
This ensures the prompt text is populated in engine inputs for all
consumers (logging, responses API, etc.), not just the debug logger.

When skip_tokenizer_init=True, self.tokenizer is None, so the decode
is safely skipped.

Fixes vllm-project#37253

Signed-off-by: karanb192 <karan@example.com>
@qandrew
Contributor

qandrew commented Mar 18, 2026

Actually, that code is on purpose, since users might pass token inputs directly and we want to avoid unnecessary detokenization. You can set TokenizeParams.needs_detokenization flag to force text inputs to appear even in that case.

Ah, I see, it was introduced in #32863. Then maybe we don't need code changes and just need to set the TokenizeParams accordingly?

@DarkLight1337
Member

DarkLight1337 commented Mar 18, 2026

the prompt is rendered directly to token IDs (via Harmony encoding)

It would be even better to keep the text in this code so we don't need to detokenize

@bbrowning
Contributor

the prompt is rendered directly to token IDs (via Harmony encoding)

It would be even better to keep the text in this code so we don't need to detokenize

Just a note that the user's input text and the text we get from decoding the token ids will be very different. The former is just whatever the user passed as content of messages, while the latter will be the actual textual representation of the Harmony formatted tokens that get passed to the model. The latter is useful for debugging, but unnecessary and potentially performance-impacting if we do that unconditionally for every inference request as it requires an otherwise unnecessary tokenizer.decode call.
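To make that distinction concrete, here is a tiny illustration using the Qwen3-style rendering qandrew quotes later in this thread (the strings are examples, not vLLM output):

```python
# The user's message content vs. the text recovered by decoding the
# rendered (chat-template formatted) token IDs are different strings.
user_content = "Hello."
decoded_rendered_prompt = (
    "<|im_start|>user\nHello.<|im_end|>\n<|im_start|>assistant\n"
)

# The rendered prompt embeds the user text inside template markup, so the
# two serve different purposes: echoing input vs. debugging what the model
# actually saw.
assert user_content in decoded_rendered_prompt
assert user_content != decoded_rendered_prompt
```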

@qandrew
Contributor

qandrew commented Mar 18, 2026

I played around a bit with a debugger. With Qwen3, when we hit preprocess_chat via Chat Completions, prompt already includes the debug string that we'd want to show later, at this line: https://github.com/vllm-project/vllm/blob/main/vllm/renderers/base.py#L497

{'prompt': '<|im_start|>user\nHello.<|im_end|>\n<|im_start|>assistant\n'}

So for this case we should just be able to extract that and show it in the debug log directly.

@karanb192 maybe GPT-OSS doesn't go from Chat Completion conversations -> prompt str -> token IDs (like Qwen3 does), but from Chat Completion conversations -> token IDs directly? 🤷

Successfully merging this pull request may close these issues.

[Bug]: prompt is logged as None in RequestLogItem for gpt-oss-20b (Chat Completion API)
