Fix chunked prefill crash with mlx-lm >= 0.31.0 #156
omprashantjain wants to merge 1 commit into waybarrios:main
Conversation
mlx-lm PR ml-explore/mlx-lm#911 ("Better caching in the server") added a `prompt_checkpoints` field to the prompt tuples returned by `BatchGenerator.unprocessed_prompts`. This causes the `zip(*batch_prompts)` unpacking in `_chunked_next()` to fail with a `ValueError` when using `--continuous-batching`, since the code expects exactly 6 elements but now receives 7. Add a `*_extra` catch-all to the tuple unpacking so it gracefully handles any additional fields from upstream mlx-lm changes. This is backward-compatible: older mlx-lm versions that return 6 elements still work fine (the `*_extra` will simply be empty). Fixes waybarrios#155
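The failure mode described above can be sketched in isolation (the tuple contents below are hypothetical stand-ins, not real mlx-lm objects):

```python
# Sketch of why the unpack breaks: zip(*batch_prompts) yields one
# sequence per tuple field, so a 7-field prompt tuple produces 7
# sequences for only 6 assignment targets.
batch_prompts = [
    # (uid, tokens, max_tokens, cache, sampler, logits_processor, prompt_checkpoints)
    ("req-0", [1, 2, 3], 64, None, None, None, []),
    ("req-1", [4, 5], 32, None, None, None, []),
]

try:
    uids, toks, maxes, caches, samplers, procs = zip(*batch_prompts)
except ValueError as e:
    print(e)  # e.g. "too many values to unpack (expected 6)"

# The catch-all absorbs the extra field(s) regardless of mlx-lm version:
uids, toks, maxes, caches, samplers, procs, *_extra = zip(*batch_prompts)
```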
@janhilgard could you review this PR?
janhilgard
left a comment
LGTM — clean, minimal, correct fix.
Verified completeness: There are 3 sites in scheduler.py where batch_prompts tuples are unpacked:
| Site | Line | Pattern | Status |
|---|---|---|---|
| A | ~361 | `len(p[1])` — index-based access | Safe |
| B | ~367 | `for _uid, _toks, *_ in batch_prompts` — already has catch-all | Safe |
| C | ~396 | `zip(*batch_prompts)` — hardcoded 6 vars | Fixed by this PR |
This is the only vulnerable site. The *_extra catch-all is idiomatic Python and provides forward compatibility — works with both mlx-lm < 0.31.0 (6 fields) and >= 0.31.0 (7+ fields).
The new prompt_checkpoints field is for mlx-lm's internal agentic cache management and can be safely ignored by vllm-mlx, which handles prompt processing end-to-end.
No changes needed to pyproject.toml — the existing mlx-lm>=0.30.5 constraint is correct since the fix is backward compatible.
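The backward/forward-compatibility claim can be checked directly; this sketch uses hypothetical placeholder tuples, with the same field order the scheduler expects:

```python
# The same catch-all unpack works for 6-field tuples (mlx-lm < 0.31.0)
# and 7-field tuples (>= 0.31.0): *_extra is simply empty in the old case.
def unpack(batch_prompts):
    uids, toks, maxes, caches, samplers, procs, *_extra = zip(*batch_prompts)
    return uids, _extra

old_style = [("req-0", [1, 2], 16, None, None, None)]      # 6 fields
new_style = [("req-0", [1, 2], 16, None, None, None, [])]  # 7 fields

_, extra_old = unpack(old_style)
_, extra_new = unpack(new_style)
print(len(extra_old), len(extra_new))  # 0 1
```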
Adds *_rest to zip(*batch_prompts) unpacking to handle 7-element tuples.
This is already fixed on main. PR #183 landed the same unpack fix using an explicit `_prompt_checkpoints` variable:

```python
# current main (scheduler.py lines 388-396)
(
    uids,
    inputs_raw,
    max_tokens_list,
    caches,
    samplers,
    logits_processors,
    _prompt_checkpoints,
) = zip(*batch_prompts)
```

PR #183 also fixed the
Summary

Fixes #155 — chunked prefill crashes with mlx-lm >= 0.31.0 because `zip(*batch_prompts)` unpacking expects 6 tuple elements but now receives 7.

Root Cause

mlx-lm PR ml-explore/mlx-lm#911 ("Better caching in the server") added a `prompt_checkpoints` field to the prompt tuples returned by `BatchGenerator.unprocessed_prompts`. The hardcoded 6-variable unpacking in `scheduler.py:_chunked_next()` then fails with a `ValueError`.

Fix

Add a `*_extra` catch-all to the tuple unpacking:

```python
(
    uids,
    inputs_raw,
    max_tokens_list,
    caches,
    samplers,
    logits_processors,
    *_extra,  # ← new: absorb extra fields from mlx-lm
) = zip(*batch_prompts)
```

This is backward compatible: older mlx-lm versions that return 6 elements still work (`*_extra` is empty).

Testing

Tested on two Mac Studio Ultras (M2 Ultra, 192GB) with:

- vllm-mlx==0.2.6, mlx-lm==0.31.1, mlx==0.31.1
- --continuous-batching enabled
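A regression check for the unpack itself could look like the sketch below. The helper name and plain-assert style are illustrative only (not part of the repo's test suite); the tuple widths mirror what `BatchGenerator.unprocessed_prompts` returns before and after mlx-lm 0.31.0:

```python
# Hypothetical regression check: the catch-all unpack must tolerate
# 6-field, 7-field, and any wider prompt tuples.
def chunked_unpack(batch_prompts):
    (
        uids,
        inputs_raw,
        max_tokens_list,
        caches,
        samplers,
        logits_processors,
        *_extra,
    ) = zip(*batch_prompts)
    return uids, _extra

for n_fields in (6, 7, 8):
    prompt = tuple(range(n_fields))
    uids, extra = chunked_unpack([prompt])
    assert uids == (0,)                 # first field survives unchanged
    assert len(extra) == n_fields - 6   # surplus fields land in _extra
print("ok")
```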