feat: BatchedEngine parity — MTP routing, normalization, SpecPrefill by Thump604 · Pull Request #203 · waybarrios/vllm-mlx

Thump604 · 2026-03-22T12:27:27Z

Replaces #192 (rebased against main after merge of #180, #97, #127).

Ports SimpleEngine features to BatchedEngine for continuous batching mode:

Per-request MTP routing: text-only → TextModel with MTP speculative decoding, media → MLLM. Zero-copy weight sharing from VLM backbone.
message_utils.py: Shared _normalize_messages() — maps developer→system, merges consecutive same-role messages, hoists system to position [0]. Required for Qwen 3.5 templates that reject malformed sequences.
SpecPrefill: Draft model lifecycle, CLI arg wiring, per-request API in BatchedEngine.
System KV cache: ChatML boundary detection, hash-based snapshot/restore.
Tests: MTP routing, TextModel construction, speculative decoding, smoke test.

Context

PR #180 (SpecPrefill) merged SimpleEngine support. This PR extends the same features to BatchedEngine, which is the production path for continuous batching mode.

Test plan

Start with --continuous-batching --enable-mtp --mllm
Text-only request routes to TextModel+MTP
Media request routes to MLLM
SpecPrefill activates on long prompts
System prompt cached across turns
_normalize_messages prevents template crashes on malformed input

Port SimpleEngine features to BatchedEngine for continuous batching: - Per-request MTP routing: text-only → TextModel (MTP), media → MLLM - message_utils.py: shared _normalize_messages (developer→system, merge consecutive same-role, hoist system to [0]) - SpecPrefill config + draft model lifecycle in BatchedEngine - System KV cache with ChatML boundary detection Replaces PR waybarrios#192 (rebased against main after merge of waybarrios#180, waybarrios#97).

Thump604 · 2026-03-23T00:21:47Z

Superseded by #204 (memory-aware scheduler includes BatchedEngine parity + admission control).

Thump604 mentioned this pull request Mar 22, 2026

feat: memory-aware admission controller for multi-user serving #204

Closed

6 tasks

Thump604 closed this Mar 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: BatchedEngine parity — MTP routing, normalization, SpecPrefill#203

feat: BatchedEngine parity — MTP routing, normalization, SpecPrefill#203
Thump604 wants to merge 1 commit intowaybarrios:mainfrom
Thump604:feat/batched-engine-parity-v2

Thump604 commented Mar 22, 2026

Uh oh!

Thump604 commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Thump604 commented Mar 22, 2026

Context

Test plan

Uh oh!

Thump604 commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant