[Frontend] Honor chat template for gpt-oss harmony (#23015)#30482

Open
ajayanto wants to merge 1 commit into vllm-project:main from ajayanto:harmony-chat-template

Conversation


@ajayanto ajayanto commented Dec 11, 2025

Issue

  • For gpt-oss models, --chat-template (and tokenizer/chat_template_kwargs) are ignored; prompts are rendered via Harmony instead of the normal chat-template pipeline.

Reason

  • When hf_config.model_type == "gpt_oss", vLLM sets use_harmony and routes requests through _make_request_with_harmony(...), which directly constructs Harmony system/developer/user messages and calls render_for_completion. That path never calls _preprocess_chat / apply_hf_chat_template, so any chat template settings are bypassed.

Summary

  • allow server-level chat_template to be applied even in Harmony (gpt-oss) paths for chat and responses
  • pass tokenizer into Harmony preprocessing and render via apply_hf_chat_template when provided, with safe fallback to Harmony default on errors
  • tighten typing/formatting to satisfy linters and mypy; keep existing Harmony stop tokens and tool handling unchanged
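The template-with-fallback behavior in the second bullet can be sketched as follows (a hypothetical outline, assuming a tokenizer exposing `apply_chat_template` and an existing Harmony renderer such as `render_for_completion`):

```python
# Hypothetical sketch of the fallback described above: try the server-level
# chat template first; on any rendering error, fall back to the existing
# Harmony default path and log a warning.
import logging

logger = logging.getLogger(__name__)

def render_with_template_or_harmony(conversation, tokenizer,
                                    chat_template, harmony_render):
    """`harmony_render` stands in for the existing Harmony path
    (render_for_completion); it is used whenever templating is absent
    or fails."""
    if chat_template is not None:
        try:
            return tokenizer.apply_chat_template(
                conversation, chat_template=chat_template)
        except Exception:
            logger.warning(
                "Chat template rendering failed; using Harmony default.")
    return harmony_render(conversation)
```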

Testing

  • pre-commit run -a
  • manual: gpt-oss-120b chat completion with custom --chat-template (no errors, template applied)


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +1803 to +1807
conversation: list[ConversationMessage] = []
for msg in request.messages:
    model_dump_fn = getattr(msg, "model_dump", None)
    if callable(model_dump_fn):
        conversation.append(model_dump_fn(exclude_none=True))

P1 Badge Preserve Harmony system/dev prompts when templating

When a chat_template is configured in the Harmony chat-completions path, the template rendering rebuilds conversation solely from request.messages and then replaces prompt_token_ids with the templated output. That block omits the Harmony-specific system and developer messages assembled just above (tool guidance, reasoning-effort instructions, etc.), so successful template rendering sends the model a prompt without those guardrails or tool metadata whenever a server chat template is enabled.
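The fix the reviewer implies can be sketched like this (names such as `harmony_prefix` and `build_templated_conversation` are hypothetical): prepend the Harmony-assembled system/developer messages before templating, instead of rebuilding the conversation from `request.messages` alone.

```python
# Hypothetical sketch of the reviewer's point: keep the Harmony system and
# developer messages (tool guidance, reasoning effort) at the front of the
# conversation that gets handed to the chat template.

def build_templated_conversation(harmony_prefix, request_messages):
    """harmony_prefix: the system/developer message dicts assembled by the
    Harmony path just above the quoted block."""
    conversation = list(harmony_prefix)  # preserve guardrails and tool metadata
    for msg in request_messages:
        model_dump_fn = getattr(msg, "model_dump", None)
        if callable(model_dump_fn):
            conversation.append(model_dump_fn(exclude_none=True))
        else:
            conversation.append(dict(msg))
    return conversation
```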


Comment on lines +603 to +607
if isinstance(request.input, str):
    conversation.append({"role": "user", "content": request.input})
elif isinstance(request.input, list):
    for item in request.input:
        if hasattr(item, "model_dump"):

P1 Badge Keep Harmony conversation history in responses templating

In the Harmony Responses path, the chat-template logic constructs conversation only from the current instructions and input before re-rendering, and that templated output overwrites the earlier prompt_token_ids. Because no prior turns or Harmony system/developer prompts from _construct_input_messages_with_harmony are included, any multi-turn Harmony conversation loses all previous context as soon as a server chat template is configured and the template renders successfully.
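For the Responses path, the analogous fix can be sketched as follows (a hypothetical outline; `prior_turns` stands in for the history assembled by `_construct_input_messages_with_harmony`): carry over the earlier turns rather than templating only the current instructions and input.

```python
# Hypothetical sketch of the reviewer's point for the Responses path:
# start from the prior Harmony turns so multi-turn context survives
# chat-template rendering.

def build_responses_conversation(prior_turns, instructions, request_input):
    conversation = list(prior_turns)  # multi-turn history must be retained
    if instructions:
        conversation.append({"role": "system", "content": instructions})
    if isinstance(request_input, str):
        conversation.append({"role": "user", "content": request_input})
    else:
        for item in request_input:
            dump = getattr(item, "model_dump", None)
            conversation.append(dump() if callable(dump) else dict(item))
    return conversation
```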



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to honor server-level chat templates for gpt-oss harmony models. The implementation correctly adds a fallback mechanism to the default Harmony rendering path if applying the custom chat template fails. However, I've identified a critical issue in both modified files where a wrapped tokenizer object is passed to a function expecting an unwrapped transformers tokenizer. This would lead to a runtime AttributeError and prevent the feature from working correctly for common tokenizer types. I have provided comments and suggestions to address this bug.

tools = [tool.model_dump() for tool in request.tools]

prompt_text = apply_hf_chat_template(
    tokenizer=tokenizer,

critical

The tokenizer object is a TokenizerLike wrapper, but apply_hf_chat_template expects an unwrapped transformers tokenizer. Passing the wrapper directly will cause an AttributeError at runtime for HfTokenizer instances.

This change attempts to unwrap it by accessing the hf_tokenizer attribute. While this fixes the issue for HfTokenizer, please be aware that this may still not work for other tokenizer types like MistralTokenizer, which are not directly compatible with apply_hf_chat_template. The broad except Exception block will catch this failure, but it means the feature won't work for those tokenizers, and a warning will be logged.

Suggested change:

- tokenizer=tokenizer,
+ tokenizer=getattr(tokenizer, "hf_tokenizer", tokenizer),
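The suggestion relies on `getattr`'s default argument: unwrap the tokenizer when an `hf_tokenizer` attribute exists, otherwise pass the object through unchanged. A standalone sketch (`HfTokenizerWrapper` is a stand-in, not a real vLLM class):

```python
# Minimal sketch of the unwrapping pattern in the suggested change:
# getattr with a default falls back to the original object when the
# wrapper attribute is absent.

class HfTokenizerWrapper:
    """Stand-in for a TokenizerLike wrapper around an HF tokenizer."""
    def __init__(self, hf_tokenizer):
        self.hf_tokenizer = hf_tokenizer

def unwrap_tokenizer(tokenizer):
    # Wrapped -> inner HF tokenizer; already unwrapped -> unchanged.
    return getattr(tokenizer, "hf_tokenizer", tokenizer)
```

As the reviewer notes, this handles `HfTokenizer` wrappers but not tokenizer types with no HF-compatible inner object, such as `MistralTokenizer`; those still fall through to the broad `except` fallback.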

tools = [tool.model_dump() for tool in request.tools]

prompt_text = apply_hf_chat_template(
    tokenizer=tokenizer,

critical

The tokenizer object is a TokenizerLike wrapper, but apply_hf_chat_template expects an unwrapped transformers tokenizer. Passing the wrapper directly will cause an AttributeError at runtime for HfTokenizer instances.

This change attempts to unwrap it by accessing the hf_tokenizer attribute. While this fixes the issue for HfTokenizer, please be aware that this may still not work for other tokenizer types like MistralTokenizer, which are not directly compatible with apply_hf_chat_template. The broad except Exception block will catch this failure, but it means the feature won't work for those tokenizers, and a warning will be logged.

Suggested change:

- tokenizer=tokenizer,
+ tokenizer=getattr(tokenizer, "hf_tokenizer", tokenizer),

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@ajayanto ajayanto force-pushed the harmony-chat-template branch 3 times, most recently from 2cba6a0 to cdb8cce on December 11, 2025 at 11:31
@mergify

mergify bot commented Dec 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ajayanto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 15, 2025
@ajayanto ajayanto force-pushed the harmony-chat-template branch from c26e024 to 3785730 on December 16, 2025 at 17:04
@mergify mergify bot removed the needs-rebase label Dec 16, 2025
@mergify

mergify bot commented Dec 19, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ajayanto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale (Over 90 days of inactivity) label on Mar 21, 2026

Labels

frontend · gpt-oss (Related to GPT-OSS models) · needs-rebase · stale (Over 90 days of inactivity)

Projects

Status: To Triage

Development

Successfully merging this pull request may close these issues.

2 participants