15 changes: 12 additions & 3 deletions vllm/entrypoints/openai/chat_completion/serving.py
```diff
@@ -1960,15 +1960,24 @@ def _make_request_with_harmony(
     )
     messages.append(sys_msg)

+    chat_messages = request.messages
+    merged_instructions: str | None = None
+    if chat_messages and chat_messages[0]["role"] in ("system", "developer"):
```
**Contributor (severity: high):**

Accessing `["role"]` directly is unsafe here. `ChatCompletionMessageParam` is a union that includes `OpenAIHarmonyMessage` (a class), and even for dict-based messages it is safer to use `.get("role")` to avoid a potential `KeyError` or `TypeError`. Given that `harmony_utils.py` explicitly handles non-dict inputs, this code should be equally robust so it does not crash when message objects are passed programmatically.

```diff
+        content = chat_messages[0].get("content")
+        if isinstance(content, str):
+            merged_instructions = content
+            chat_messages = chat_messages[1:]

     # Add developer message.
-    if request.tools:
+    if request.tools or merged_instructions:
         dev_msg = get_developer_message(
-            tools=request.tools if should_include_tools else None  # type: ignore[arg-type]
+            instructions=merged_instructions,
+            tools=request.tools if should_include_tools else None,  # type: ignore[arg-type]
         )
         messages.append(dev_msg)
```
**Contributor (severity: high), commenting on lines +1972 to 1977:**

This implementation causes data loss when `VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS` is enabled. The `get_developer_message` function in `harmony_utils.py` ignores the `instructions` argument when that environment variable is set, and since `get_system_message` was already called (at line 1946) without these instructions, the user-provided system/developer message content is silently discarded from the final prompt. To fix this, move the extraction logic before the `get_system_message` call so the instructions can be passed to the appropriate block based on the environment configuration.
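The reordering the comment proposes could look roughly like this. All helper names here (`build_system_message`, `build_developer_message`, the `instructions_in_system` flag) are hypothetical simplifications of the `harmony_utils.py` helpers and the env-var check, shown only to illustrate the control flow:

```python
# Hedged sketch of the proposed reordering: extract the leading system/developer
# instructions FIRST, then route them to whichever block the config selects.
# All helper names are invented stand-ins, not the real harmony_utils API.

def build_system_message(instructions=None):
    return {"type": "system", "instructions": instructions}


def build_developer_message(instructions=None, tools=None):
    return {"type": "developer", "instructions": instructions, "tools": tools}


def assemble(chat_messages, tools=None, instructions_in_system=False):
    # Step 1: pull instructions out of a leading system/developer message.
    merged_instructions = None
    if chat_messages and chat_messages[0].get("role") in ("system", "developer"):
        content = chat_messages[0].get("content")
        if isinstance(content, str):
            merged_instructions = content
            chat_messages = chat_messages[1:]

    # Step 2: build the prompt, placing the instructions where the config says.
    messages = []
    if instructions_in_system:
        # Mirrors VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS being set: the
        # developer block would drop instructions, so give them to the system block.
        messages.append(build_system_message(instructions=merged_instructions))
        if tools:
            messages.append(build_developer_message(tools=tools))
    else:
        messages.append(build_system_message())
        if tools or merged_instructions:
            messages.append(build_developer_message(
                instructions=merged_instructions, tools=tools))
    return messages, chat_messages
```

Because extraction happens before either block is built, the instructions survive regardless of which block the configuration routes them to.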


```diff
     # Add user message.
-    messages.extend(parse_chat_inputs_to_harmony_messages(request.messages))
+    messages.extend(parse_chat_inputs_to_harmony_messages(chat_messages))

     # Render prompt token ids.
     prompt_token_ids = render_for_completion(messages)
```