
fix: normalize messages before chat template application #240

Merged
Thump604 merged 3 commits into waybarrios:main from Thump604:fix/normalize-messages
Apr 11, 2026

Conversation

@Thump604
Collaborator

Summary

  • Add _normalize_messages() in server.py: it maps non-standard roles (developer -> system, per the OpenAI Responses API) and merges consecutive same-role messages
  • Apply it in all four request paths before apply_chat_template: the create_chat_completion MLLM and LLM paths, create_anthropic_message, and _stream_anthropic_messages
  • Fixes crashes caused by the developer role (the Qwen3.5 template rejects unknown roles) and by consecutive same-role messages (e.g. OpenCode sends [system, system, user, user])

Split from #224 for easier review. The other parts of #224 are in #NEW-hybrid-batching and #NEW-scheduler.

Behavior

Merging happens only when both adjacent messages have string content. Messages with list content (multimodal image/video payloads) are left as-is to preserve attachments.
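The PR diff itself is not shown here, but the described behavior can be sketched as follows. This is a minimal illustration, not the upstream implementation: the names `ROLE_MAP` and `normalize_messages` are illustrative, and it assumes messages are plain dicts with "role" and "content" keys.

```python
# Illustrative sketch of the normalization described in this PR.
# "developer" -> "system" follows the OpenAI Responses API role mapping.
ROLE_MAP = {"developer": "system"}


def normalize_messages(messages):
    """Map non-standard roles and merge consecutive same-role messages.

    Merging only happens when both adjacent messages have plain string
    content; list content (multimodal payloads) is left untouched.
    """
    normalized = []
    for msg in messages:
        role = ROLE_MAP.get(msg.get("role"), msg.get("role"))
        content = msg.get("content")
        if (normalized
                and normalized[-1]["role"] == role
                and isinstance(normalized[-1]["content"], str)
                and isinstance(content, str)):
            # Join adjacent same-role string messages with a blank line,
            # matching the "\n\n" separator noted in the review.
            normalized[-1]["content"] += "\n\n" + content
        else:
            normalized.append({"role": role, "content": content})
    return normalized
```

Under this sketch, `[system, system, user, user]` collapses to `[system, user]`, and a message whose content is a list (e.g. image parts) always starts a new entry rather than being merged.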

Test plan

  • Multi-turn conversation with role: "developer" messages does not crash
  • OpenCode [system, system, user, user] format normalizes to [system, user]
  • Well-formed alternating messages pass through unchanged
  • Multimodal messages with list content are not mangled

Add _normalize_messages() to server.py and call it in all request paths
before apply_chat_template. Maps non-standard roles (developer -> system,
per OpenAI Responses API) and merges consecutive same-role messages.

Fixes agent crashes from:
- OpenAI Responses API sending role="developer" (unrecognized by Qwen3.5 template)
- OpenCode sending [system, system, user, user] (rejected by alternating-role templates)

Applied in create_chat_completion (both MLLM and LLM paths),
create_anthropic_message, and _stream_anthropic_messages.

@janhilgard left a comment


Reviewed the diff. Clean implementation — _normalize_messages() correctly maps developer -> system and merges consecutive same-role messages with \n\n separator. Good guard on list content (multimodal payloads preserved). All 4 request paths covered (MLLM, LLM, Anthropic, Anthropic streaming). Tests are thorough — edge cases for None content, multimodal, and 3+ consecutive messages.

This fixes real crashes we see with OpenCode and agent frameworks that send [system, system, user, user] or developer role messages.

@Thump604
Collaborator Author

Thump604 commented Apr 9, 2026

Status ping — this PR has been open 7 days with no review activity. I kept the scope intentionally narrow: normalize messages before apply_chat_template so out-of-order or consecutive same-role messages do not break Qwen 3.5's chat template. This is the same message-normalization issue I hit in production earlier. The branch is mergeable on current main (d19a8d3d) and CI is green. Flagging for visibility.

@janhilgard
Collaborator

Already approved on my side. CI green, branch mergeable, scope is minimal and well-contained. @waybarrios — this one is ready to go whenever you get a chance.

@Thump604 Thump604 merged commit 9fe4d3f into waybarrios:main Apr 11, 2026
7 checks passed
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Apr 11, 2026
Incorporates 53 upstream commits including:
- O(1) state-machine reasoning parser (PR waybarrios#234)
- Resumable model download (PR waybarrios#77)
- Block-aware prefix cache (PR waybarrios#217)
- Message normalization (PR waybarrios#240)
- Full sampling params (PR waybarrios#258)
- ThinkRouter for Anthropic streaming
- 22 new test files
- License file, docs updates

Conflict resolution: preserved production features
(frequency_penalty conversion, tool markup safety nets,
openai_to_anthropic import) while adopting upstream
improvements (Gemma4 parser rewrite, cleaner logging,
_model_name in streaming chunks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
