
fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching#227

Merged
waybarrios merged 1 commit into waybarrios:main from computor-org:fix/bump-mlx-lm-for-hybrid-batching
Mar 31, 2026

Conversation

@krystophny
Contributor

Summary

Bump the mlx-lm minimum version from >=0.30.5 to >=0.31.0.

Why

ArraysCache gained native batching support (extract, merge, filter, prepare) in mlx-lm 0.31.0. Older versions crash with ArraysCache.__init__() missing 1 required positional argument: 'size' when continuous batching encounters hybrid models like Qwen3.5 that mix KVCache and ArraysCache layers.
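Rather than relying on the version number alone, the presence of the native batching API can be probed directly on the cache class. This is an illustrative sketch, not vllm-mlx code: `has_native_batching` is a hypothetical helper, and it simply checks attribute presence on whatever class is passed in.

```python
# The four methods that ArraysCache gained natively in mlx-lm 0.31.0.
BATCHING_METHODS = ("extract", "merge", "filter", "prepare")

def has_native_batching(cache_cls) -> bool:
    """Return True if the cache class exposes the full native batching API.

    On mlx-lm >= 0.31.0, ArraysCache has all four methods; older releases
    are missing them and crash under continuous batching with hybrid models.
    """
    return all(hasattr(cache_cls, m) for m in BATCHING_METHODS)
```

Used against the real class, this would be `has_native_batching(ArraysCache)` after `from mlx_lm.models.cache import ArraysCache`, mirroring the hasattr checks in the Verification section below.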

The ensure_mamba_support() monkey-patch is already disabled, correctly, because these methods are now native; the only missing piece was the version floor in pyproject.toml.
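A runtime guard could enforce the same floor with a clearer error than the ArraysCache crash. The sketch below is illustrative only (the helper names are not part of vllm-mlx); it uses stdlib importlib.metadata and a minimal parser for the leading numeric components of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum mlx-lm release with native ArraysCache batching support.
MIN_MLX_LM = (0, 31, 0)

def parse_version(text: str) -> tuple:
    """Parse the leading numeric dot-separated components of a version string.

    Non-numeric suffixes (e.g. "rc1") are ignored; "0.31.0" -> (0, 31, 0).
    """
    parts = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def mlx_lm_meets_floor() -> bool:
    """True when the installed mlx-lm is at least MIN_MLX_LM."""
    try:
        installed = parse_version(version("mlx-lm"))
    except PackageNotFoundError:
        return False
    return installed >= MIN_MLX_LM
```

In practice the pyproject.toml floor (`mlx-lm>=0.31.0`) makes pip enforce this at install time, so a check like this would only matter for environments that bypass the declared dependencies.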

Reproduction

# With mlx-lm < 0.31.0:
vllm-mlx serve mlx-community/Qwen3.5-35B-A3B-6bit --continuous-batching True
# ERROR: ArraysCache.__init__() missing 1 required positional argument: 'size'

Verification

python -c "from mlx_lm.models.cache import ArraysCache; print(hasattr(ArraysCache, 'extract'), hasattr(ArraysCache, 'merge'))"
# True True (on 0.31.0+)

Files

  • pyproject.toml

Fixes computor-org#11
Related: #160, #159

Owner

@waybarrios waybarrios left a comment


Now that #183 landed with the scheduler fixes for mlx-lm 0.31.x, this version bump is the matching piece. Bumping the floor to 0.31.0 prevents users from installing older versions that are incompatible with the current codebase (ArraysCache native batching, _make_cache 3-arg signature, prompt_checkpoints tuple).

@waybarrios waybarrios merged commit 7b0fc7f into waybarrios:main Mar 31, 2026
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Apr 1, 2026
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160),
platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227),
base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109),
and cleanup commits.

Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sysit pushed a commit to sysit/vllm-mlx that referenced this pull request Apr 1, 2026
…or-hybrid-batching

fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching


Development

Successfully merging this pull request may close these issues.

[Bug] Engine loop error: ArraysCache.__init__() missing 1 required positional argument: 'size' when enabling --continuous-batching or running bench
