[Fix] pass chat_template_kwargs to get_system_message in gpt-oss#30873
[Fix] pass chat_template_kwargs to get_system_message in gpt-oss#30873seunggil1 wants to merge 4 commits intovllm-project:mainfrom
Conversation
Signed-off-by: seunggil1 <ksgg1navercom@gmail.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Code Review
This pull request aims to pass chat_template_kwargs to get_system_message for gpt-oss models. However, the current implementation introduces a critical bug that will cause a TypeError at runtime. The get_system_message function does not accept arbitrary keyword arguments, and the proposed change can lead to passing unexpected or duplicate arguments. I've provided a comment with a suggested fix to make this change safe.
|
Seems like a duplicate of #30247 |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: seung <38664481+seunggil1@users.noreply.github.com>
|
Hi @seunggil1, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
Signed-off-by: seunggil1 <ksgg1navercom@gmail.com>
I missed that PR—it wasn't there when I first checked, so it must be recent. Thanks for pointing it out! Since the objective is the same, I'll take a quick look at the implementation differences. If I find that the approaches are identical or if this PR is redundant, I'll go ahead and close it. Thanks again! |
|
This pull request has merge conflicts that must be resolved before it can be |
Purpose
This PR addresses the issue where
chat_template_kwargs(specificallymodel_identity) is ignored when usinggpt-ossmodels.Currently,
gpt-ossmodels use a custom request handling method_make_request_with_harmonyinstead of the standardapply_hf_chat_template. Inside this method, theget_system_messagefunction is called without passingchat_template_kwargs. As a result, themodel_identityparameter defaults to a hardcoded value ("You are ChatGPT..."), preventing users from dynamically customizing the system prompt per request.This fix passes
chat_template_kwargstoget_system_messageinvllm/entrypoints/openai/serving_chat.py, allowing users to override the default model identity and other parameters supported by the harmony library.This fix implements the solution discussed in #23975 (comment)
Partially Fixes #23015
Fixes #23975
Test Plan
Environment:
vllm/vllm-openai:v0.12.0)Start the vLLM server with a
gpt-ossmodel (e.g.,gpt-oss-20b):Send a chat completion request with a custom
model_identityinchat_template_kwargs:Test Result
Before Fix:
The model ignores the
model_identityand responds with the default identity.{ "object": "chat.completion", "created": 1765981514, "model": "vllm/gpt-oss-20b", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "I’m ChatGPT, a large language model created by OpenAI. I’m designed to understand and generate natural language, so I can help answer questions, brainstorm ideas, explain concepts, draft text, and more. I don’t have personal experiences or feelings—just a lot of patterns learned from text. How can I assist you today?", "refusal": null, "annotations": null, "audio": null, "function_call": null, "tool_calls": [], "reasoning": "The user asks \"Who are you?\" We need to respond as ChatGPT. The user might want a brief introduction. We should mention that we are ChatGPT, a large language model trained by OpenAI, etc. Also mention that we can help with many tasks. We should keep it concise.", "reasoning_content": "The user asks \"Who are you?\" We need to respond as ChatGPT. The user might want a brief introduction. We should mention that we are ChatGPT, a large language model trained by OpenAI, etc. Also mention that we can help with many tasks. We should keep it concise." }, "logprobs": null, "finish_reason": "stop", "stop_reason": null, "token_ids": null } ], "usage": { "prompt_tokens": 73, "total_tokens": 211, "completion_tokens": 138, "prompt_tokens_details": null }, }After Fix:
The model correctly adopts the
model_identityprovided inchat_template_kwargs.{ "id": "chatcmpl-9fb8cfe7a67f083a", "object": "chat.completion", "created": 1765981838, "model": "vllm/gpt-oss-20b", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "I’m your custom AI assistant—an advanced language model built on OpenAI’s GPT‑4 architecture, fine‑tuned to help you with information, ideas, and tasks. I’m here to answer questions, offer suggestions, and chat about almost anything you’re curious about. Just let me know what you need!", "refusal": null, "annotations": null, "audio": null, "function_call": null, "tool_calls": [], "reasoning": "We need to respond as the custom AI assistant. The user asks \"Who are you?\" We should answer in a friendly manner, describing ourselves as a custom AI assistant. Probably mention that we are a language model trained by OpenAI, but customized. We should keep it concise.", "reasoning_content": "We need to respond as the custom AI assistant. The user asks \"Who are you?\" We should answer in a friendly manner, describing ourselves as a custom AI assistant. Probably mention that we are a language model trained by OpenAI, but customized. We should keep it concise." }, "logprobs": null, "finish_reason": "stop", "stop_reason": null, "token_ids": null } ], "usage": { "prompt_tokens": 66, "total_tokens": 196, "completion_tokens": 130, "prompt_tokens_details": null }, }Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.