[Frontend] Honor chat template for gpt-oss harmony (#23015) #30482
ajayanto wants to merge 1 commit into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
conversation: list[ConversationMessage] = []
for msg in request.messages:
    model_dump_fn = getattr(msg, "model_dump", None)
    if callable(model_dump_fn):
        conversation.append(model_dump_fn(exclude_none=True))
```
Preserve Harmony system/dev prompts when templating
When a chat_template is configured in the Harmony chat-completions path, the template rendering rebuilds conversation solely from request.messages and then replaces prompt_token_ids with the templated output. That block omits the Harmony-specific system and developer messages assembled just above (tool guidance, reasoning-effort instructions, etc.), so successful template rendering sends the model a prompt without those guardrails or tool metadata whenever a server chat template is enabled.
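A minimal sketch of the fix this comment implies: seed the templated conversation with the Harmony system and developer messages before appending the request messages. The function and argument names below are assumptions for illustration, not the actual vLLM code.

```python
# Hypothetical sketch (not the actual vLLM implementation): seed the
# conversation with the Harmony system/developer messages so the chat
# template still sees the tool guidance and reasoning-effort instructions.
def build_template_conversation(system_msg, developer_msg, request_messages):
    conversation = [system_msg, developer_msg]  # keep the Harmony guardrails
    for msg in request_messages:
        model_dump_fn = getattr(msg, "model_dump", None)
        if callable(model_dump_fn):  # pydantic-style message objects
            conversation.append(model_dump_fn(exclude_none=True))
        else:  # plain dicts pass through unchanged
            conversation.append(msg)
    return conversation
```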
```python
if isinstance(request.input, str):
    conversation.append({"role": "user", "content": request.input})
elif isinstance(request.input, list):
    for item in request.input:
        if hasattr(item, "model_dump"):
```
Keep Harmony conversation history in responses templating
In the Harmony Responses path, the chat-template logic constructs conversation only from the current instructions and input before re-rendering, and that templated output overwrites the earlier prompt_token_ids. Because no prior turns or Harmony system/developer prompts from _construct_input_messages_with_harmony are included, any multi-turn Harmony conversation loses all previous context as soon as a server chat template is configured and the template renders successfully.
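A hedged sketch of what preserving history could look like: start from the already-constructed Harmony messages rather than an empty list. The `prior_messages` parameter and the function name are assumptions for illustration.

```python
# Hypothetical sketch: extend the messages already built by
# _construct_input_messages_with_harmony instead of rebuilding from scratch,
# so earlier turns and system/developer prompts survive templating.
def build_responses_conversation(prior_messages, request_input):
    conversation = list(prior_messages)  # keep prior turns and guardrails
    if isinstance(request_input, str):
        conversation.append({"role": "user", "content": request_input})
    elif isinstance(request_input, list):
        for item in request_input:
            dump = getattr(item, "model_dump", None)
            conversation.append(dump(exclude_none=True) if callable(dump) else item)
    return conversation
```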
Code Review
This pull request aims to honor server-level chat templates for gpt-oss harmony models. The implementation correctly adds a fallback mechanism to the default Harmony rendering path if applying the custom chat template fails. However, I've identified a critical issue in both modified files where a wrapped tokenizer object is passed to a function expecting an unwrapped transformers tokenizer. This would lead to a runtime AttributeError and prevent the feature from working correctly for common tokenizer types. I have provided comments and suggestions to address this bug.
```python
tools = [tool.model_dump() for tool in request.tools]

prompt_text = apply_hf_chat_template(
    tokenizer=tokenizer,
```
The tokenizer object is a TokenizerLike wrapper, but apply_hf_chat_template expects an unwrapped transformers tokenizer. Passing the wrapper directly will cause an AttributeError at runtime for HfTokenizer instances.
This change attempts to unwrap it by accessing the hf_tokenizer attribute. While this fixes the issue for HfTokenizer, please be aware that this may still not work for other tokenizer types like MistralTokenizer, which are not directly compatible with apply_hf_chat_template. The broad except Exception block will catch this failure, but it means the feature won't work for those tokenizers, and a warning will be logged.
```diff
- tokenizer=tokenizer,
+ tokenizer=getattr(tokenizer, "hf_tokenizer", tokenizer),
```
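The suggested pattern can be sketched as an unwrap-then-fallback helper. Because the exact vLLM signature of `apply_hf_chat_template` is not shown in this review, the renderer is injected as a callable below; everything here is an illustrative assumption.

```python
# Sketch of the unwrap-with-fallback pattern from the review. The renderer is
# injected as a callable; the real apply_hf_chat_template signature may differ.
def render_with_template(tokenizer, conversation, chat_template, render_fn):
    hf_tok = getattr(tokenizer, "hf_tokenizer", tokenizer)  # unwrap HfTokenizer
    try:
        return render_fn(tokenizer=hf_tok, conversation=conversation,
                         chat_template=chat_template)
    except Exception:
        # e.g. MistralTokenizer is not HF-compatible; signal the caller to
        # fall back to the default Harmony rendering path.
        return None
```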
(The same issue and suggested fix apply to the second modified file, which contains an identical `apply_hf_chat_template` call.)
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default. Instead, only a small, essential subset of CI tests runs automatically to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging, for example by adding the `ready` label to the PR.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Force-pushed from 2cba6a0 to cdb8cce
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ajayanto <ajay.anto@gmail.com>
Force-pushed from c26e024 to 3785730
This pull request has merge conflicts that must be resolved before it can be merged.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!