[Frontend] Honor chat template for gpt-oss harmony (#23015) #30482
ajayanto wants to merge 1 commit into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
conversation: list[ConversationMessage] = []
for msg in request.messages:
    model_dump_fn = getattr(msg, "model_dump", None)
    if callable(model_dump_fn):
        conversation.append(model_dump_fn(exclude_none=True))
```
Preserve Harmony system/dev prompts when templating
When a chat_template is configured in the Harmony chat-completions path, the template rendering rebuilds conversation solely from request.messages and then replaces prompt_token_ids with the templated output. That block omits the Harmony-specific system and developer messages assembled just above (tool guidance, reasoning-effort instructions, etc.), so successful template rendering sends the model a prompt without those guardrails or tool metadata whenever a server chat template is enabled.
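A minimal sketch of the fix this comment implies: seed the templated conversation with the Harmony system and developer messages before appending the request messages. The function and argument names below are assumptions for illustration, not the actual vLLM code.

```python
# Hypothetical sketch (not the actual vLLM implementation): seed the
# conversation with the Harmony system/developer messages so the chat
# template still sees the tool guidance and reasoning-effort instructions.
def build_template_conversation(system_msg, developer_msg, request_messages):
    conversation = [system_msg, developer_msg]  # keep the Harmony guardrails
    for msg in request_messages:
        model_dump_fn = getattr(msg, "model_dump", None)
        if callable(model_dump_fn):  # pydantic-style message objects
            conversation.append(model_dump_fn(exclude_none=True))
        else:  # plain dicts pass through unchanged
            conversation.append(msg)
    return conversation
```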
```python
if isinstance(request.input, str):
    conversation.append({"role": "user", "content": request.input})
elif isinstance(request.input, list):
    for item in request.input:
        if hasattr(item, "model_dump"):
```
Keep Harmony conversation history in responses templating
In the Harmony Responses path, the chat-template logic constructs conversation only from the current instructions and input before re-rendering, and that templated output overwrites the earlier prompt_token_ids. Because no prior turns or Harmony system/developer prompts from _construct_input_messages_with_harmony are included, any multi-turn Harmony conversation loses all previous context as soon as a server chat template is configured and the template renders successfully.
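A hedged sketch of what preserving history could look like: start from the already-constructed Harmony messages rather than an empty list. The `prior_messages` parameter and the function name are assumptions for illustration.

```python
# Hypothetical sketch: extend the messages already built by
# _construct_input_messages_with_harmony instead of rebuilding from scratch,
# so earlier turns and system/developer prompts survive templating.
def build_responses_conversation(prior_messages, request_input):
    conversation = list(prior_messages)  # keep prior turns and guardrails
    if isinstance(request_input, str):
        conversation.append({"role": "user", "content": request_input})
    elif isinstance(request_input, list):
        for item in request_input:
            dump = getattr(item, "model_dump", None)
            conversation.append(dump(exclude_none=True) if callable(dump) else item)
    return conversation
```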
Code Review
This pull request aims to honor server-level chat templates for gpt-oss harmony models. The implementation correctly adds a fallback mechanism to the default Harmony rendering path if applying the custom chat template fails. However, I've identified a critical issue in both modified files where a wrapped tokenizer object is passed to a function expecting an unwrapped transformers tokenizer. This would lead to a runtime AttributeError and prevent the feature from working correctly for common tokenizer types. I have provided comments and suggestions to address this bug.
```python
tools = [tool.model_dump() for tool in request.tools]

prompt_text = apply_hf_chat_template(
    tokenizer=tokenizer,
```
The tokenizer object is a TokenizerLike wrapper, but apply_hf_chat_template expects an unwrapped transformers tokenizer. Passing the wrapper directly will cause an AttributeError at runtime for HfTokenizer instances.
This change attempts to unwrap it by accessing the hf_tokenizer attribute. While this fixes the issue for HfTokenizer, please be aware that this may still not work for other tokenizer types like MistralTokenizer, which are not directly compatible with apply_hf_chat_template. The broad except Exception block will catch this failure, but it means the feature won't work for those tokenizers, and a warning will be logged.
```diff
- tokenizer=tokenizer,
+ tokenizer=getattr(tokenizer, "hf_tokenizer", tokenizer),
```
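The suggested pattern can be sketched as an unwrap-then-fallback helper. Because the exact vLLM signature of `apply_hf_chat_template` is not shown in this review, the renderer is injected as a callable below; everything here is an illustrative assumption.

```python
# Sketch of the unwrap-with-fallback pattern from the review. The renderer is
# injected as a callable; the real apply_hf_chat_template signature may differ.
def render_with_template(tokenizer, conversation, chat_template, render_fn):
    hf_tok = getattr(tokenizer, "hf_tokenizer", tokenizer)  # unwrap HfTokenizer
    try:
        return render_fn(tokenizer=hf_tok, conversation=conversation,
                         chat_template=chat_template)
    except Exception:
        # e.g. MistralTokenizer is not HF-compatible; signal the caller to
        # fall back to the default Harmony rendering path.
        return None
```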
(The same issue and suggested fix apply to the second modified file, which contains an identical `apply_hf_chat_template` call.)
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default. Instead, only a small, essential subset of CI tests runs automatically to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging, for example by adding the `ready` label to the PR.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Force-pushed from 2cba6a0 to cdb8cce
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ajayanto <ajay.anto@gmail.com>
Force-pushed from c26e024 to 3785730
This pull request has merge conflicts that must be resolved before it can be merged.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!