Add mid-batch extend for text-only MLLM requests#263

Closed
janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:feat/mllm-mid-batch-extend
Conversation

@janhilgard
Collaborator

Summary

  • The MLLM batch generator previously started new batches only when num_active == 0, serializing all concurrent requests. A short text-only request had to wait for a long request to finish generating all of its tokens.
  • Text-only requests (no images or videos) can now join an active batch mid-generation via MLLMBatch.extend(). Multimodal requests still wait for batch completion due to vision encoding shape constraints.
  • This reuses the existing MLLMBatch.extend() and BatchKVCache.extend() methods; no new infrastructure is needed.
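The scheduling decision described above can be sketched as follows. MLLMBatch.extend() and BatchKVCache.extend() are real methods named in this PR, but the Request/Batch classes, their field names, and the admit() helper here are hypothetical stand-ins for illustration, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    # Hypothetical request shape; only the text-only check matters here.
    rid: str
    images: list = field(default_factory=list)
    videos: list = field(default_factory=list)

    @property
    def is_text_only(self):
        return not self.images and not self.videos

@dataclass
class Batch:
    active: list = field(default_factory=list)

    def extend(self, req):
        # Stands in for MLLMBatch.extend(), which grows the batch
        # (and its BatchKVCache) while generation is in flight.
        self.active.append(req)

def admit(batch, pending):
    """Admit pending requests; return those that must keep waiting."""
    waiting = []
    for req in pending:
        if not batch.active:
            # Idle: any request may start a fresh batch.
            batch.active.append(req)
        elif req.is_text_only:
            # Text-only: join the active batch mid-generation.
            batch.extend(req)
        else:
            # Multimodal: vision encoding shape constraints mean it
            # waits until the current batch drains (num_active == 0).
            waiting.append(req)
    return waiting
```

The key point is the middle branch: before this PR, every request fell into the "waiting" path whenever a batch was active.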

Before

Request 1 (long):  |===prefill===|=========generation (2000 tokens)=========|
Request 2 (short): |                    waiting...                           |===prefill+gen===|

After

Request 1 (long):  |===prefill===|=========generation (2000 tokens)=========|
Request 2 (short):        |==prefill+gen==| done!

Tested on Qwen3.5-35B-A3B (--mllm --continuous-batching)

  • Long request: 53.8s (2000 max_tokens)
  • Short request: 1.0s (joined mid-batch, finished 48.8s earlier)
  • Both outputs correct, no artifacts

Test plan

  • Concurrent text-only requests: short request joins active batch and finishes independently
  • Output correctness: both requests produce coherent text
  • Multimodal requests still wait for num_active == 0 (no regression)
  • Edge case: _process_prompts returns None (handled, no extend called)
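The last bullet's edge case can be sketched like this. The name _process_prompts comes from the PR; its behavior here and the surrounding helper are illustrative assumptions, not the actual code:

```python
def _process_prompts(requests):
    # Stand-in: assume processing can return None when there is
    # nothing to prefill (the edge case named in the test plan).
    return requests or None

def try_extend(batch, requests):
    """Extend the batch only if prompt processing produced something."""
    processed = _process_prompts(requests)
    if processed is None:
        return False          # no extend called; batch is untouched
    batch.extend(processed)
    return True

class FakeBatch:
    # Minimal test double recording what was extended.
    def __init__(self):
        self.extended = []

    def extend(self, reqs):
        self.extended.extend(reqs)
```

The guard ensures a None result short-circuits before MLLMBatch.extend() is ever reached.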

🤖 Generated with Claude Code

Previously, MLLM batch generator only started new batches when
num_active == 0, serializing concurrent requests. A short text-only
request had to wait for a long request to finish all its tokens.

Now text-only requests (no images/videos) can join an active batch
mid-generation via MLLMBatch.extend(). Multimodal requests still
wait for batch completion due to vision encoding shape constraints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard
Collaborator Author

@Thump604 Superseded — mid-batch extend for text-only MLLM requests is already in main via #278. Closing.

@janhilgard janhilgard closed this Apr 11, 2026