[Frontend] Merge developer instructions into tool block for Harmony prompt rendering #34951

Open
beom115 wants to merge 4 commits into vllm-project:main from beom115:feature/harmony_chat_developer_msg

beom115 commented Feb 20, 2026

Purpose

The current Harmony chat implementation renders the system/developer instructions and the tool definitions in two separate <|start|>developer blocks. However, models that follow the OpenAI gpt-oss-20b/120b chat template (chat_template.jinja) expect a single developer block containing both the # Instructions and # Tools sections.

This PR modifies _make_request_with_harmony to:

  • Extract the initial system or developer message.
  • Merge it into the same developer block as the tool definitions under the # Instructions header.

This ensures compatibility with models that require a unified developer message block.
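
The two steps above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `split_leading_instructions` and `build_developer_content` are hypothetical helper names, messages are assumed to be plain dicts with string content, and the tool definitions are assumed to be already rendered to text.

```python
from typing import Any, Optional


def split_leading_instructions(
    messages: list[dict[str, Any]],
) -> tuple[Optional[str], list[dict[str, Any]]]:
    """Return (instructions, remaining_messages).

    Hypothetical helper: only a leading system/developer message with
    plain-string content is extracted; anything else is left untouched.
    """
    if messages and messages[0].get("role") in ("system", "developer"):
        content = messages[0].get("content")
        if isinstance(content, str):
            return content, messages[1:]
    return None, messages


def build_developer_content(
    instructions: Optional[str], tools_text: Optional[str]
) -> str:
    """Join both sections into one body, using the headers from the PR text."""
    sections = []
    if instructions:
        sections.append(f"# Instructions\n\n{instructions}")
    if tools_text:
        sections.append(f"# Tools\n\n{tools_text}")
    return "\n\n".join(sections)


chat = [
    {"role": "developer", "content": "Please Call Tool."},
    {"role": "user", "content": "What's the weather like in New York?"},
]
instructions, remaining = split_leading_instructions(chat)
print(build_developer_content(instructions, "..."))
```

Returning the remaining messages separately keeps the extracted message from being rendered a second time as its own developer block.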

Test Plan

import json
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get weather information for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "Name city and state"},
            },
            "required": ["location"],
        },
    }
}]

messages = [
    {"role": "developer", "content": "Please Call Tool."},
    {"role": "user", "content": "What's the weather like in New York?"}
]

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=messages,
    tools=tools,
)

Test Result

Before (Redundant blocks):

<|start|>system<|message|>...<|end|>
<|start|>developer<|message|># Tools\n\n...<|end|>
<|start|>developer<|message|>Please Call Tool.<|end|>
<|start|>user<|message|>...<|end|>

After (Merged block - Correct for gpt-oss):

<|start|>system<|message|>...<|end|>
<|start|>developer<|message|># Instructions\n\nPlease Call Tool.\n\n# Tools\n\n...<|end|>
<|start|>user<|message|>...<|end|>
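
A quick way to check a rendered prompt for the redundant block is to count developer block openers. The sketch below uses the two renderings shown above as literal strings (with the same `...` placeholders); `count_developer_blocks` is a hypothetical name, and how the rendered prompt is obtained from the server is left out.

```python
DEVELOPER_START = "<|start|>developer"

# The two renderings from the test result, with "..." placeholders kept as-is.
BEFORE = (
    "<|start|>system<|message|>...<|end|>"
    "<|start|>developer<|message|># Tools\n\n...<|end|>"
    "<|start|>developer<|message|>Please Call Tool.<|end|>"
    "<|start|>user<|message|>...<|end|>"
)
AFTER = (
    "<|start|>system<|message|>...<|end|>"
    "<|start|>developer<|message|># Instructions\n\nPlease Call Tool.\n\n# Tools\n\n...<|end|>"
    "<|start|>user<|message|>...<|end|>"
)


def count_developer_blocks(prompt: str) -> int:
    """Count how many developer blocks a rendered Harmony prompt opens."""
    return prompt.count(DEVELOPER_START)


print(count_developer_blocks(BEFORE), count_developer_blocks(AFTER))  # prints: 2 1
```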

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…Harmony

Signed-off-by: 전상범 <beom115@naver.com>
…r Harmony

Signed-off-by: 전상범 <beom115@naver.com>
gemini-code-assist bot (Contributor) left a comment

Code Review

The pull request correctly identifies the need to merge system/developer instructions into the tool block for Harmony-based models to ensure compatibility with specific chat templates. However, the current implementation has a critical logic flaw that leads to the loss of user-provided instructions when the VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS environment variable is enabled. Additionally, the code uses unsafe dictionary access on message objects which could lead to runtime errors if non-dictionary types are encountered.


chat_messages = request.messages
merged_instructions: str | None = None
if chat_messages and chat_messages[0]["role"] in ("system", "developer"):

high

Accessing ["role"] directly is unsafe here. ChatCompletionMessageParam is a union that includes OpenAIHarmonyMessage (a class), and even for dict-based messages, it is safer to use .get("role") to avoid potential KeyError or TypeError. Given that harmony_utils.py explicitly handles non-dict inputs, this part should be equally robust to prevent crashes if objects are passed programmatically.
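
One way to make the role lookup tolerant of both shapes is a small accessor. This is a sketch under assumptions: `get_role` is a hypothetical helper, and the attribute layout of non-dict messages (a `role` attribute, or an `author` object carrying `role`, possibly as an enum) is assumed rather than taken from the library.

```python
from typing import Any, Optional


def get_role(message: Any) -> Optional[str]:
    """Best-effort role lookup for dict-based and object-based messages."""
    if isinstance(message, dict):
        role = message.get("role")
    else:
        # Assumed layouts: message.role, or message.author.role.
        role = getattr(message, "role", None)
        if role is None:
            role = getattr(getattr(message, "author", None), "role", None)
    # Unwrap enum-like roles; plain strings have no .value and pass through.
    role = getattr(role, "value", role)
    return role if isinstance(role, str) else None


class FakeObjectMessage:
    """Stand-in for an object-style message, used only for this example."""

    role = "developer"


print(get_role({"role": "system"}), get_role({}), get_role(FakeObjectMessage()))
```

With such an accessor, the check becomes `get_role(chat_messages[0]) in ("system", "developer")`, which returns `False` instead of raising when the role is missing.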

Comment on lines +1963 to 1968
  if request.tools or merged_instructions:
      dev_msg = get_developer_message(
-         tools=request.tools if should_include_tools else None  # type: ignore[arg-type]
+         instructions=merged_instructions,
+         tools=request.tools if should_include_tools else None,  # type: ignore[arg-type]
      )
      messages.append(dev_msg)

high

This implementation causes data loss when VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS is enabled. The get_developer_message function in harmony_utils.py ignores the instructions argument if that environment variable is set. Since get_system_message was already called (at line 1946) without these instructions, the user-provided system/developer message content will be completely discarded from the final prompt. To fix this, the extraction logic should be moved before the get_system_message call so instructions can be passed to the appropriate block based on the environment configuration.
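
The suggested ordering can be sketched as a routing decision made before either block is built. This is an illustration only: `route_instructions` is a hypothetical helper, the boolean stands in for the VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS setting, and the real signatures of get_system_message and get_developer_message are not reproduced here.

```python
from typing import Optional


def route_instructions(
    user_instructions: Optional[str],
    instructions_in_system: bool,
) -> tuple[Optional[str], Optional[str]]:
    """Return (system_instructions, developer_instructions).

    Exactly one side receives the user-provided text, so it cannot be
    dropped regardless of how the flag is set.
    """
    if user_instructions is None:
        return None, None
    if instructions_in_system:
        return user_instructions, None
    return None, user_instructions


# Extract first, then decide which block receives the text.
system_part, developer_part = route_instructions("Please Call Tool.", True)
print(system_part, developer_part)
```

Deciding the destination up front means the system block and the developer block are each built once, with the instructions already assigned to one of them.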

Labels

frontend gpt-oss Related to GPT-OSS models

Projects

Status: To Triage
