fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple) #183
Conversation
mlx-lm 0.31.0 added `prompt_checkpoints` support, changing the `BatchGenerator.insert()` tuple from 6 elements to 7. This causes `ValueError: too many values to unpack (expected 6)` in `_chunked_next` when processing any request.

Changes:
- scheduler.py line ~395: unpack 7 values (add `_prompt_checkpoints`)
- scheduler.py line ~406: pass `max_kv_size=None` to `_make_cache()` (signature changed in mlx-lm 0.31.0 to require 3 args)

Tested on Mac Mini M4 Pro 64GB with:
- mlx-lm 0.31.0
- mlx 0.31.1
- Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit
- vllm-mlx 0.2.5 (this fork)

Fixes the same issue as jundot/omlx#110.
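The unpack change described above might look like the following. This is a minimal sketch: the tuple field names other than `_prompt_checkpoints` are illustrative, not taken from mlx-lm's actual source.

```python
def unpack_batch_entry(entry):
    """Unpack one BatchGenerator.insert() entry.

    mlx-lm 0.31.0 appended a prompt_checkpoints field, growing the
    tuple from 6 to 7 elements; unpacking into 6 names raises
    'ValueError: too many values to unpack (expected 6)'.
    Field names besides _prompt_checkpoints are illustrative.
    """
    (uid, prompt, cache, sampler, logits_processors,
     max_tokens, _prompt_checkpoints) = entry  # 7 values, was 6
    return uid, prompt, max_tokens

# A stand-in 7-element entry such as mlx-lm 0.31.x would produce:
entry = ("req-1", [1, 2, 3], None, None, [], 256, None)
uid, prompt, max_tokens = unpack_batch_entry(entry)
```

The checkpoint field is received but unused here (hence the leading underscore), which is the smallest change that restores compatibility without altering scheduler behavior.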
This is the same crash fixed in #169, which also includes hybrid cache handling for MoE models (Qwen3.5-122B/35B). If #169 lands first this becomes redundant; if this lands first, #169 just needs a minor rebase. No conflict either way, just flagging the overlap.
Thanks for flagging, @Thump604! I see #169 is already merged. If it covers the scheduler.py `_chunked_next` unpack as well, I'm happy to close this as redundant. My fix was specifically for scheduler.py line ~388 (the batch prompt tuple) and the `_make_cache` signature; I want to confirm #169 addresses both before closing.
Hey @hkstrongside: #169 is closed. It was split into:
Your Options:
Either way works; just want to make sure the
We ran into this same issue and independently arrived at the same fix. One small suggestion: we used `prompt_cache = _make_cache(self.model, padding, self.max_kv_size)`, which respects user-configured KV cache limits on the chunked prefill path instead of silently defaulting to None. Credit: @dougborg for catching this.
Thanks for breaking that down. Good to know the
Good catch @dougborg, that's a better approach.
waybarrios left a comment
Both fixes are needed and neither is covered elsewhere. PR #194 (the unpack fix) was closed without merge, so the `_prompt_checkpoints` unpack is still missing on main. The `_make_cache` signature fix is also not covered by any other PR.
Good call from @dougborg on using `self.max_kv_size` instead of `None`; that respects user-configured KV cache limits on the chunked prefill path.
Merging this first, then #227 bumps the version floor to match.
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160), platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227), base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109), and cleanup commits.

Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…heduler-compat fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)
Summary
mlx-lm 0.31.0 added `prompt_checkpoints` support, changing the `BatchGenerator.insert()` tuple from 6 elements to 7. This causes `ValueError: too many values to unpack (expected 6)` in `_chunked_next` when processing any request.
Changes
- `scheduler.py` ~line 395: unpack 7 values instead of 6 (add `_prompt_checkpoints`)
- `scheduler.py` ~line 406: pass `max_kv_size=None` to `_make_cache()` (signature changed in mlx-lm 0.31.0 to require 3 positional args)
Error Before Fix
Tested On
Related