fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple) #183
Conversation
mlx-lm 0.31.0 added `prompt_checkpoints` support, changing the `BatchGenerator.insert()` tuple from 6 elements to 7. This causes `ValueError: too many values to unpack (expected 6)` in `_chunked_next` when processing any request.

Changes:
- scheduler.py line ~395: unpack 7 values (add `_prompt_checkpoints`)
- scheduler.py line ~406: pass `max_kv_size=None` to `_make_cache()` (signature changed in mlx-lm 0.31.0 to require 3 args)

Tested on Mac Mini M4 Pro 64GB with:
- mlx-lm 0.31.0
- mlx 0.31.1
- Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit
- vllm-mlx 0.2.5 (this fork)

Fixes the same issue as jundot/omlx#110.
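The unpack change described above might look like the following. This is a minimal sketch: the tuple field names other than `_prompt_checkpoints` are illustrative, not taken from mlx-lm's actual source.

```python
def unpack_batch_entry(entry):
    """Unpack one BatchGenerator.insert() entry.

    mlx-lm 0.31.0 appended a prompt_checkpoints field, growing the
    tuple from 6 to 7 elements; unpacking into 6 names raises
    'ValueError: too many values to unpack (expected 6)'.
    Field names besides _prompt_checkpoints are illustrative.
    """
    (uid, prompt, cache, sampler, logits_processors,
     max_tokens, _prompt_checkpoints) = entry  # 7 values, was 6
    return uid, prompt, max_tokens

# A stand-in 7-element entry such as mlx-lm 0.31.x would produce:
entry = ("req-1", [1, 2, 3], None, None, [], 256, None)
uid, prompt, max_tokens = unpack_batch_entry(entry)
```

The checkpoint field is received but unused here (hence the leading underscore), which is the smallest change that restores compatibility without altering scheduler behavior.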
This is the same crash fixed in #169, which also includes hybrid cache handling for MoE models (Qwen3.5-122B/35B). If #169 lands first this becomes redundant; if this lands first, #169 just needs a minor rebase. No conflict either way, just flagging the overlap.
Thanks for flagging, @Thump604! I see #169 is already merged. If it covers the scheduler.py `_chunked_next` unpack as well, I'm happy to close this as redundant. My fix was specifically for scheduler.py line ~388 (the batch prompt tuple) and the `_make_cache` signature; I want to confirm #169 addresses both before closing.
Hey @hkstrongside: #169 is closed. It was split into:
Your Options:
Either way works; just want to make sure the
We ran into this same issue and independently arrived at the same fix. One small suggestion: we used `prompt_cache = _make_cache(self.model, padding, self.max_kv_size)`, which respects user-configured KV cache limits on the chunked prefill path instead of silently defaulting to None. Credit: @dougborg for catching this.
Thanks for breaking that down. Good to know the
Good catch @dougborg, that's a better approach.
waybarrios left a comment
Both fixes are needed and neither is covered elsewhere. PR #194 (the unpack fix) was closed without merge, so the `_prompt_checkpoints` unpack is still missing on main. The `_make_cache` signature fix is also not covered by any other PR.
Good call from @dougborg on using `self.max_kv_size` instead of `None`; that respects user-configured KV cache limits on the chunked prefill path.
Merging this first, then #227 bumps the version floor to match.
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160), platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227), base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109), and cleanup commits.

Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…heduler-compat fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)
Summary
mlx-lm 0.31.0 added `prompt_checkpoints` support, changing the `BatchGenerator.insert()` tuple from 6 elements to 7. This causes `ValueError: too many values to unpack (expected 6)` in `_chunked_next` when processing any request.
Changes
- `scheduler.py` ~line 395: unpack 7 values instead of 6 (add `_prompt_checkpoints`)
- `scheduler.py` ~line 406: pass `max_kv_size=None` to `_make_cache()` (signature changed in mlx-lm 0.31.0 to require 3 positional args)
Error Before Fix
Tested On
Related