Fix chunked prefill crash with mlx-lm >= 0.31.0 #156
omprashantjain wants to merge 1 commit into waybarrios:main
Conversation
mlx-lm PR ml-explore/mlx-lm#911 ("Better caching in the server") added a `prompt_checkpoints` field to the prompt tuples returned by `BatchGenerator.unprocessed_prompts`. This causes the `zip(*batch_prompts)` unpacking in `_chunked_next()` to fail with a `ValueError` when using `--continuous-batching`, since the code expects exactly 6 elements but now receives 7. Add a `*_extra` catch-all to the tuple unpacking so it gracefully handles any additional fields from upstream mlx-lm changes. This is backward-compatible: older mlx-lm versions that return 6 elements still work fine (the `*_extra` will simply be empty). Fixes waybarrios#155
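The failure mode described above can be sketched in isolation (the tuple contents below are hypothetical stand-ins, not real mlx-lm objects):

```python
# Sketch of why the unpack breaks: zip(*batch_prompts) yields one
# sequence per tuple field, so a 7-field prompt tuple produces 7
# sequences for only 6 assignment targets.
batch_prompts = [
    # (uid, tokens, max_tokens, cache, sampler, logits_processor, prompt_checkpoints)
    ("req-0", [1, 2, 3], 64, None, None, None, []),
    ("req-1", [4, 5], 32, None, None, None, []),
]

try:
    uids, toks, maxes, caches, samplers, procs = zip(*batch_prompts)
except ValueError as e:
    print(e)  # e.g. "too many values to unpack (expected 6)"

# The catch-all absorbs the extra field(s) regardless of mlx-lm version:
uids, toks, maxes, caches, samplers, procs, *_extra = zip(*batch_prompts)
```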
@janhilgard could you review this PR?
janhilgard
left a comment
LGTM — clean, minimal, correct fix.
Verified completeness: There are 3 sites in scheduler.py where batch_prompts tuples are unpacked:
| Site | Line | Pattern | Status |
|---|---|---|---|
| A | ~361 | `len(p[1])` — index-based access | Safe |
| B | ~367 | `for _uid, _toks, *_ in batch_prompts` — already has catch-all | Safe |
| C | ~396 | `zip(*batch_prompts)` — hardcoded 6 vars | Fixed by this PR |
This is the only vulnerable site. The *_extra catch-all is idiomatic Python and provides forward compatibility — works with both mlx-lm < 0.31.0 (6 fields) and >= 0.31.0 (7+ fields).
The new prompt_checkpoints field is for mlx-lm's internal agentic cache management and can be safely ignored by vllm-mlx, which handles prompt processing end-to-end.
No changes needed to pyproject.toml — the existing mlx-lm>=0.30.5 constraint is correct since the fix is backward compatible.
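The backward/forward-compatibility claim can be checked directly; this sketch uses hypothetical placeholder tuples, with the same field order the scheduler expects:

```python
# The same catch-all unpack works for 6-field tuples (mlx-lm < 0.31.0)
# and 7-field tuples (>= 0.31.0): *_extra is simply empty in the old case.
def unpack(batch_prompts):
    uids, toks, maxes, caches, samplers, procs, *_extra = zip(*batch_prompts)
    return uids, _extra

old_style = [("req-0", [1, 2], 16, None, None, None)]      # 6 fields
new_style = [("req-0", [1, 2], 16, None, None, None, [])]  # 7 fields

_, extra_old = unpack(old_style)
_, extra_new = unpack(new_style)
print(len(extra_old), len(extra_new))  # 0 1
```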
Adds *_rest to zip(*batch_prompts) unpacking to handle 7-element tuples.
This is already fixed on main. PR #183 landed the same unpack fix using an explicit `_prompt_checkpoints` variable:

```python
# current main (scheduler.py lines 388-396)
(
    uids,
    inputs_raw,
    max_tokens_list,
    caches,
    samplers,
    logits_processors,
    _prompt_checkpoints,
) = zip(*batch_prompts)
```

PR #183 also fixed the
Summary

Fixes #155 — chunked prefill crashes with mlx-lm >= 0.31.0 because `zip(*batch_prompts)` unpacking expects 6 tuple elements but now receives 7.

Root Cause

mlx-lm PR ml-explore/mlx-lm#911 ("Better caching in the server") added a `prompt_checkpoints` field to the prompt tuples returned by `BatchGenerator.unprocessed_prompts`. The hardcoded 6-variable unpacking in `scheduler.py:_chunked_next()` then fails with a `ValueError`.

Fix

Add a `*_extra` catch-all to the tuple unpacking:

```python
(
    uids,
    inputs_raw,
    max_tokens_list,
    caches,
    samplers,
    logits_processors,
    *_extra,  # ← new: absorb extra fields from mlx-lm
) = zip(*batch_prompts)
```

This is backward compatible: older mlx-lm versions that return 6 elements still work (`*_extra` is empty).

Testing

Tested on two Mac Studio Ultras (M2 Ultra, 192GB) with:

- vllm-mlx==0.2.6, mlx-lm==0.31.1, mlx==0.31.1
- --continuous-batching enabled
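A regression check for the unpack itself could look like the sketch below. The helper name and plain-assert style are illustrative only (not part of the repo's test suite); the tuple widths mirror what `BatchGenerator.unprocessed_prompts` returns before and after mlx-lm 0.31.0:

```python
# Hypothetical regression check: the catch-all unpack must tolerate
# 6-field, 7-field, and any wider prompt tuples.
def chunked_unpack(batch_prompts):
    (
        uids,
        inputs_raw,
        max_tokens_list,
        caches,
        samplers,
        logits_processors,
        *_extra,
    ) = zip(*batch_prompts)
    return uids, _extra

for n_fields in (6, 7, 8):
    prompt = tuple(range(n_fields))
    uids, extra = chunked_unpack([prompt])
    assert uids == (0,)                 # first field survives unchanged
    assert len(extra) == n_fields - 6   # surplus fields land in _extra
print("ok")
```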