
fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching#227

Merged
waybarrios merged 1 commit into waybarrios:main from computor-org:fix/bump-mlx-lm-for-hybrid-batching
Mar 31, 2026

Conversation

@krystophny
Contributor

Summary

Bump the mlx-lm minimum version from >=0.30.5 to >=0.31.0.

Why

ArraysCache gained native batching support (extract, merge, filter, prepare) in mlx-lm 0.31.0. Older versions crash with ArraysCache.__init__() missing 1 required positional argument: 'size' when continuous batching encounters hybrid models like Qwen3.5 that mix KVCache and ArraysCache layers.
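Rather than relying on the version number alone, the presence of the native batching API can be probed directly on the cache class. This is an illustrative sketch, not vllm-mlx code: `has_native_batching` is a hypothetical helper, and it simply checks attribute presence on whatever class is passed in.

```python
# The four methods that ArraysCache gained natively in mlx-lm 0.31.0.
BATCHING_METHODS = ("extract", "merge", "filter", "prepare")

def has_native_batching(cache_cls) -> bool:
    """Return True if the cache class exposes the full native batching API.

    On mlx-lm >= 0.31.0, ArraysCache has all four methods; older releases
    are missing them and crash under continuous batching with hybrid models.
    """
    return all(hasattr(cache_cls, m) for m in BATCHING_METHODS)
```

Used against the real class, this would be `has_native_batching(ArraysCache)` after `from mlx_lm.models.cache import ArraysCache`, mirroring the hasattr checks in the Verification section below.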

The ensure_mamba_support() monkey-patch is already disabled, correctly, because these methods are now native; the only missing piece was the version floor in pyproject.toml.
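A runtime guard could enforce the same floor with a clearer error than the ArraysCache crash. The sketch below is illustrative only (the helper names are not part of vllm-mlx); it uses stdlib importlib.metadata and a minimal parser for the leading numeric components of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum mlx-lm release with native ArraysCache batching support.
MIN_MLX_LM = (0, 31, 0)

def parse_version(text: str) -> tuple:
    """Parse the leading numeric dot-separated components of a version string.

    Non-numeric suffixes (e.g. "rc1") are ignored; "0.31.0" -> (0, 31, 0).
    """
    parts = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def mlx_lm_meets_floor() -> bool:
    """True when the installed mlx-lm is at least MIN_MLX_LM."""
    try:
        installed = parse_version(version("mlx-lm"))
    except PackageNotFoundError:
        return False
    return installed >= MIN_MLX_LM
```

In practice the pyproject.toml floor (`mlx-lm>=0.31.0`) makes pip enforce this at install time, so a check like this would only matter for environments that bypass the declared dependencies.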

Reproduction

# With mlx-lm < 0.31.0:
vllm-mlx serve mlx-community/Qwen3.5-35B-A3B-6bit --continuous-batching True
# ERROR: ArraysCache.__init__() missing 1 required positional argument: 'size'

Verification

python -c "from mlx_lm.models.cache import ArraysCache; print(hasattr(ArraysCache, 'extract'), hasattr(ArraysCache, 'merge'))"
# True True (on 0.31.0+)

Files

  • pyproject.toml

Fixes computor-org#11
Related: #160, #159

Owner

@waybarrios waybarrios left a comment


Now that #183 landed with the scheduler fixes for mlx-lm 0.31.x, this version bump is the matching piece. Bumping the floor to 0.31.0 prevents users from installing older versions that are incompatible with the current codebase (ArraysCache native batching, _make_cache 3-arg signature, prompt_checkpoints tuple).

@waybarrios waybarrios merged commit 7b0fc7f into waybarrios:main Mar 31, 2026
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Apr 1, 2026
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160),
platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227),
base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109),
and cleanup commits.

Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sysit pushed a commit to sysit/vllm-mlx that referenced this pull request Apr 1, 2026
…or-hybrid-batching

fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching


Development

Successfully merging this pull request may close these issues.

[Bug] Engine loop error: ArraysCache.__init__() missing 1 required positional argument: 'size' when enabling --continuous-batching or running bench
