
Fix chunked prefill crash with mlx-lm >= 0.31.0#156

Closed
omprashantjain wants to merge 1 commit into waybarrios:main from omprashantjain:fix/chunked-prefill-mlx-lm-031

Conversation

@omprashantjain

Summary

Fixes #155 — chunked prefill crashes with mlx-lm >= 0.31.0 because zip(*batch_prompts) unpacking expects 6 tuple elements but now receives 7.

Root Cause

mlx-lm PR ml-explore/mlx-lm#911 ("Better caching in the server") added a prompt_checkpoints field to the prompt tuples returned by BatchGenerator.unprocessed_prompts. The hardcoded 6-variable unpacking in scheduler.py:_chunked_next() then fails with a ValueError.
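A minimal sketch of the failure mode (field values are hypothetical; only the tuple arity matters). With 7-element prompt tuples, the hardcoded 6-name unpacking raises a ValueError:

```python
# mlx-lm < 0.31.0 returned 6-element prompt tuples; >= 0.31.0 appends
# prompt_checkpoints as a 7th field.
old_prompt = ("uid-1", [1, 2, 3], 256, None, None, None)  # 6 fields
new_prompt = old_prompt + (None,)                         # + prompt_checkpoints

try:
    # the pattern in _chunked_next(): exactly 6 targets
    uids, inputs_raw, max_tokens, caches, samplers, procs = zip(*[new_prompt])
except ValueError:
    # "too many values to unpack" — zip(*...) now yields 7 columns
    print("ValueError raised")
```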

Fix

Add a *_extra catch-all to the tuple unpacking:

(
    uids,
    inputs_raw,
    max_tokens_list,
    caches,
    samplers,
    logits_processors,
    *_extra,                 # ← new: absorb extra fields from mlx-lm
) = zip(*batch_prompts)

This is:

  • Forward-compatible — handles any future fields mlx-lm might add
  • Backward-compatible — older mlx-lm versions that return 6 elements work fine (*_extra is empty)
  • Minimal — 1 line added, no behavioral changes
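The compatibility claims above can be sketched directly (tuple contents are hypothetical placeholders): the starred target absorbs zero or more trailing columns, so both arities unpack cleanly.

```python
def unpack(batch_prompts):
    # same pattern as the fix: 6 named targets plus a catch-all
    uids, inputs_raw, max_toks, caches, samplers, procs, *_extra = zip(*batch_prompts)
    return uids, _extra

# 6-field tuples (older mlx-lm): *_extra is empty
uids, extra = unpack([("uid-1", [1], 64, None, None, None)])
assert extra == []

# 7-field tuples (mlx-lm >= 0.31.0): *_extra absorbs the extra column
uids, extra = unpack([("uid-1", [1], 64, None, None, None, [])])
assert len(extra) == 1
```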

Testing

Tested on two Mac Studio Ultras (M2 Ultra, 192GB) with:

  • vllm-mlx==0.2.6, mlx-lm==0.31.1, mlx==0.31.1
  • Models: Qwen3.5-35B-A3B-8bit, Qwen3-14B
  • --continuous-batching enabled
  • 30 concurrent heavy requests (2048 max_tokens each) — all succeeded
  • Aggregate throughput: ~416 tok/s across concurrent requests
  • Stable over 48+ hours of continuous operation

@waybarrios waybarrios requested a review from janhilgard March 12, 2026 19:50
@waybarrios
Owner

@janhilgard could you review this PR?

@waybarrios waybarrios added bug Something isn't working UNDER REVIEW labels Mar 12, 2026
Collaborator

@janhilgard janhilgard left a comment


LGTM — clean, minimal, correct fix.

Verified completeness: There are 3 sites in scheduler.py where batch_prompts tuples are unpacked:

  Site  Line  Pattern                                            Status
  A     ~361  len(p[1]) — index-based access                     Safe
  B     ~367  for _uid, _toks, *_ in batch_prompts — catch-all   Safe
  C     ~396  zip(*batch_prompts) — hardcoded 6 vars             Fixed by this PR

This is the only vulnerable site. The *_extra catch-all is idiomatic Python and provides forward compatibility — works with both mlx-lm < 0.31.0 (6 fields) and >= 0.31.0 (7+ fields).

The new prompt_checkpoints field is for mlx-lm's internal agentic cache management and can be safely ignored by vllm-mlx, which handles prompt processing end-to-end.

No changes needed to pyproject.toml — the existing mlx-lm>=0.30.5 constraint is correct since the fix is backward compatible.
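The three access patterns enumerated above can be checked against a 7-field tuple (field values are hypothetical); only the hardcoded 6-name unpacking breaks:

```python
p = ("uid-1", [1, 2, 3], 256, None, None, None, [])  # 7 fields, mlx-lm >= 0.31.0
batch_prompts = [p]

# Site A: index-based access ignores trailing fields entirely.
assert len(p[1]) == 3

# Site B: the existing starred catch-all already tolerates extra fields.
for _uid, _toks, *_ in batch_prompts:
    assert _uid == "uid-1"

# Site C: a hardcoded 6-name unpacking is the only vulnerable pattern.
try:
    a, b, c, d, e, f = zip(*batch_prompts)
except ValueError:
    print("only site C fails")
```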

seanpianka added a commit to seanpianka/vllm-mlx that referenced this pull request Mar 14, 2026
Adds *_rest to zip(*batch_prompts) unpacking to handle 7-element tuples.
@waybarrios
Owner

This is already fixed on main. PR #183 landed the same unpack fix using an explicit _prompt_checkpoints variable instead of *_extra:

# current main (scheduler.py line 388-396)
(
    uids,
    inputs_raw,
    max_tokens_list,
    caches,
    samplers,
    logits_processors,
    _prompt_checkpoints,
) = zip(*batch_prompts)

PR #183 also fixed the _make_cache() signature change (3 args in mlx-lm 0.31.0). Closing as redundant.

@waybarrios waybarrios closed this Mar 31, 2026

Successfully merging this pull request may close these issues.

Chunked prefill crashes with mlx-lm >= 0.31.0: zip unpacking expects 6 values, gets 7