
fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)#183

Merged
waybarrios merged 2 commits into waybarrios:main from hkstrongside:fix/mlx-lm-031-scheduler-compat
Mar 31, 2026
Conversation

@hkstrongside
Contributor

Summary

mlx-lm 0.31.0 added prompt_checkpoints support, changing the BatchGenerator.insert() tuple from 6 elements to 7. This causes ValueError: too many values to unpack (expected 6) in _chunked_next when processing any request.

Changes

  • scheduler.py ~line 395: unpack 7 values instead of 6 (add _prompt_checkpoints)
  • scheduler.py ~line 406: pass max_kv_size=None to _make_cache() (signature changed in mlx-lm 0.31.0 to require 3 positional args)
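The two changes above can be sketched as follows. Only the added _prompt_checkpoints field is stated in the PR; the other tuple field names in this sketch are illustrative assumptions, not the actual mlx-lm layout:

```python
# Version-tolerant unpack for the BatchGenerator entry in _chunked_next.
# Field names other than _prompt_checkpoints are placeholders.
def unpack_batch_entry(entry):
    if len(entry) == 7:
        # mlx-lm >= 0.31.0 appends prompt_checkpoints as a 7th element
        (uid, tokens, sampler, processors,
         max_tokens, cache, _prompt_checkpoints) = entry
    elif len(entry) == 6:
        # mlx-lm < 0.31.0: original 6-element tuple
        (uid, tokens, sampler, processors, max_tokens, cache) = entry
    else:
        raise ValueError(f"unexpected tuple length: {len(entry)}")
    return uid, tokens, sampler, processors, max_tokens, cache
```

Unpacking this way keeps the scheduler working on both sides of the mlx-lm 0.31.0 boundary instead of hard-failing with "too many values to unpack".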

Error Before Fix

File "vllm_mlx/scheduler.py", line 388, in _chunked_next
    (
ValueError: too many values to unpack (expected 6)

Tested On

  • Mac Mini M4 Pro 64GB, macOS 26.3
  • mlx-lm 0.31.0
  • mlx 0.31.1 / mlx-metal 0.31.1
  • transformers 5.3.0
  • Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit
  • Running as an OpenClaw backend with the blockops1 proxy

Related

This fix was also tested against vllm-mlx 0.2.5 (this fork). Fixes the same issue as jundot/omlx#110.
@Thump604
Collaborator

This is the same crash fixed in #169, which also includes hybrid cache handling for MoE models (Qwen3.5-122B/35B) in mllm_batch_generator.py, prefix cache support for hybrid architectures, and test coverage (3 test files).

If #169 lands first this becomes redundant; if this lands first #169 just needs a minor rebase. No conflict either way — just flagging the overlap.

@hkstrongside
Contributor Author

Thanks for flagging @Thump604! I see #169 is already merged. If it covers the scheduler.py _chunked_next unpack as well, I'm happy to close this as redundant. My fix was specifically for scheduler.py line ~388 (the batch prompt tuple) and the _make_cache signature, so I want to confirm #169 addresses both before closing.

@Thump604
Collaborator

Hey @hkstrongside, #169 is closed. It was split into:

Your _make_cache signature fix is NOT covered by either. mlx-lm 0.31.0 changed _make_cache to require 3 args (model, left_padding, max_kv_size) — upstream scheduler.py still passes 2. So your PR has additional value.

Options:

  1. Merge #183 (fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)) as-is, which covers both fixes
  2. I can add the _make_cache fix to #194 (fix: unpack prompt_checkpoints in _chunked_next for mlx-lm >= 0.31.0)

Either way works — just want to make sure the _make_cache fix doesn't get lost. @waybarrios thoughts?

@dougborg

We ran into this same issue and independently arrived at the same fix. One small suggestion: we used self.max_kv_size instead of None for the _make_cache call:

prompt_cache = _make_cache(self.model, padding, self.max_kv_size)

None works since it's the default, but self.max_kv_size is set from the constructor arg and respects any user-configured KV cache limit. Passing None silently ignores that setting on the chunked prefill path.
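A sketch of the difference (the class and attribute names here are assumptions based on the discussion, not the actual vllm-mlx code):

```python
def _make_cache(model, left_padding, max_kv_size):
    # stand-in for mlx-lm's cache constructor
    return {"max_kv_size": max_kv_size}

class Scheduler:
    def __init__(self, model, max_kv_size=None):
        self.model = model
        self.max_kv_size = max_kv_size  # user-configured KV cache limit

    def build_prompt_cache(self, padding):
        # Hardcoding None here would silently drop a user-set limit on
        # the chunked prefill path; forwarding self.max_kv_size keeps it.
        return _make_cache(self.model, padding, self.max_kv_size)
```

With max_kv_size=4096 passed to the constructor, the cache call receives 4096 rather than None, so the chunked prefill path honors the same limit as the rest of the scheduler.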

Respects user-configured KV cache limits on the chunked prefill path
instead of silently defaulting to None.

Credit: @dougborg for catching this.
@hkstrongside
Contributor Author

Thanks for breaking that down. Good to know the _make_cache fix isn't covered elsewhere. Happy to keep this open for merge. If it's easier to fold into #194 that works too, whatever's cleanest for you guys.

@hkstrongside
Contributor Author

Good catch @dougborg, that's a better approach. self.max_kv_size respects the user config instead of silently defaulting. Just pushed the update. Thanks for flagging it.

@waybarrios
Owner

Both fixes are needed and neither is covered elsewhere. PR #194 (the unpack fix) was closed without merge, so the _prompt_checkpoints unpack is still missing on main. The _make_cache signature fix is also not covered by any other PR.

Good call from @dougborg on using self.max_kv_size instead of None, that respects user configured KV cache limits on the chunked prefill path.

Merging this first, then #227 bumps the version floor to match.

@waybarrios waybarrios merged commit 4d8c21b into waybarrios:main Mar 31, 2026
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Apr 1, 2026
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160),
platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227),
base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109),
and cleanup commits.

Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sysit pushed a commit to sysit/vllm-mlx that referenced this pull request Apr 1, 2026

fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)