
fix: normalize messages before chat template application #240

Merged
Thump604 merged 3 commits into waybarrios:main from Thump604:fix/normalize-messages
Apr 11, 2026

Conversation

@Thump604
Collaborator

Summary

  • Add _normalize_messages() in server.py: it maps non-standard roles (developer -> system, per the OpenAI Responses API) and merges consecutive same-role messages
  • Apply it in all four request paths before apply_chat_template: the create_chat_completion MLLM and LLM paths, create_anthropic_message, and _stream_anthropic_messages
  • Fixes crashes caused by the developer role (the Qwen3.5 template rejects unknown roles) and by consecutive same-role messages (e.g. OpenCode sends [system, system, user, user])

Split from #224 for easier review. The other parts of #224 are in #NEW-hybrid-batching and #NEW-scheduler.

Behavior

Merging happens only when both adjacent messages have string content. Messages with list content (multimodal image/video payloads) are left as-is to preserve attachments.
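The PR diff itself is not shown here, but the described behavior can be sketched as follows. This is a minimal illustration, not the upstream implementation: the names `ROLE_MAP` and `normalize_messages` are illustrative, and it assumes messages are plain dicts with "role" and "content" keys.

```python
# Illustrative sketch of the normalization described in this PR.
# "developer" -> "system" follows the OpenAI Responses API role mapping.
ROLE_MAP = {"developer": "system"}


def normalize_messages(messages):
    """Map non-standard roles and merge consecutive same-role messages.

    Merging only happens when both adjacent messages have plain string
    content; list content (multimodal payloads) is left untouched.
    """
    normalized = []
    for msg in messages:
        role = ROLE_MAP.get(msg.get("role"), msg.get("role"))
        content = msg.get("content")
        if (normalized
                and normalized[-1]["role"] == role
                and isinstance(normalized[-1]["content"], str)
                and isinstance(content, str)):
            # Join adjacent same-role string messages with a blank line,
            # matching the "\n\n" separator noted in the review.
            normalized[-1]["content"] += "\n\n" + content
        else:
            normalized.append({"role": role, "content": content})
    return normalized
```

Under this sketch, `[system, system, user, user]` collapses to `[system, user]`, and a message whose content is a list (e.g. image parts) always starts a new entry rather than being merged.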

Test plan

  • Multi-turn conversation with role: "developer" messages does not crash
  • OpenCode [system, system, user, user] format normalizes to [system, user]
  • Well-formed alternating messages pass through unchanged
  • Multimodal messages with list content are not mangled

Add _normalize_messages() to server.py and call it in all request paths
before apply_chat_template. Maps non-standard roles (developer -> system,
per OpenAI Responses API) and merges consecutive same-role messages.

Fixes agent crashes from:
- OpenAI Responses API sending role="developer" (unrecognized by Qwen3.5 template)
- OpenCode sending [system, system, user, user] (rejected by alternating-role templates)

Applied in create_chat_completion (both MLLM and LLM paths),
create_anthropic_message, and _stream_anthropic_messages.

@janhilgard left a comment


Reviewed the diff. Clean implementation — _normalize_messages() correctly maps developer -> system and merges consecutive same-role messages with \n\n separator. Good guard on list content (multimodal payloads preserved). All 4 request paths covered (MLLM, LLM, Anthropic, Anthropic streaming). Tests are thorough — edge cases for None content, multimodal, and 3+ consecutive messages.

This fixes real crashes we see with OpenCode and agent frameworks that send [system, system, user, user] or developer role messages.

@Thump604
Collaborator Author

Thump604 commented Apr 9, 2026

Status ping — this PR has been open 7 days with no review activity. I kept the scope intentionally narrow: normalize messages before apply_chat_template so out-of-order or consecutive same-role messages do not break Qwen 3.5's chat template. This is the same message-normalization issue I hit in production earlier. The branch is mergeable on current main (d19a8d3d) and CI is green. Flagging for visibility.

@janhilgard
Collaborator

Already approved on my side. CI green, branch mergeable, scope is minimal and well-contained. @waybarrios — this one is ready to go whenever you get a chance.

@Thump604 Thump604 merged commit 9fe4d3f into waybarrios:main Apr 11, 2026
7 checks passed
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Apr 11, 2026
Incorporates 53 upstream commits including:
- O(1) state-machine reasoning parser (PR waybarrios#234)
- Resumable model download (PR waybarrios#77)
- Block-aware prefix cache (PR waybarrios#217)
- Message normalization (PR waybarrios#240)
- Full sampling params (PR waybarrios#258)
- ThinkRouter for Anthropic streaming
- 22 new test files
- License file, docs updates

Conflict resolution: preserved production features
(frequency_penalty conversion, tool markup safety nets,
openai_to_anthropic import) while adopting upstream
improvements (Gemma4 parser rewrite, cleaner logging,
_model_name in streaming chunks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
