
Fix BatchMambaCache missing size arg for mlx-lm >= 0.30.6#89

Merged
waybarrios merged 1 commit into main from fix/batch-mamba-cache-arrays-size
Feb 15, 2026

Conversation

@waybarrios
Owner

mlx-lm 0.30.6 removed `MambaCache`. The fallback to `ArraysCache` works, but `ArraysCache` requires a positional `size` argument that was never passed, so every inference request failed with:

```
ArraysCache.__init__() missing 1 required positional argument: 'size'
```

Fixed by passing `size` conditionally in `__init__` and `extract` when `HAS_MAMBA_CACHE` is `False`.
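For illustration, the failure mode can be reproduced with a stand-in class whose constructor mirrors the `ArraysCache` signature (the stub class name and body are assumptions; only the required positional `size` parameter matters here):

```python
# Stand-in mimicking ArraysCache's required positional `size` argument
# (assumption: only the signature is relevant, not the real implementation).
class ArraysCacheStub:
    def __init__(self, size, left_padding=None):
        self.size = size
        self.left_padding = left_padding

# The pre-fix code path constructed the cache without `size`,
# which raises TypeError exactly as quoted above.
try:
    ArraysCacheStub()
except TypeError as exc:
    assert "size" in str(exc)
```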

Changes in `vllm_mlx/utils/mamba_cache.py`:

`__init__` now accepts `size=2` and passes it to `ArraysCache`:

```python
def __init__(self, left_padding=None, size=2):
    if HAS_MAMBA_CACHE:
        super().__init__(left_padding=left_padding)
    else:
        super().__init__(size=size, left_padding=left_padding)
```

`extract` reads `size` from the existing cache:

```python
size = len(self.cache)
if HAS_MAMBA_CACHE:
    cache = MambaCache()
else:
    cache = MambaCache(size=size)
```

`merge` and `_patched_make_cache` need no changes, since the `size=2` default covers them.
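The conditional dispatch described above can be sketched self-containedly with stub base classes (both stubs are assumptions standing in for the real mlx-lm caches; only the constructor signatures are modeled):

```python
# Self-contained sketch of the conditional-size pattern; the stub classes
# are assumptions that only mimic the relevant constructor signatures.
class MambaCacheStub:
    def __init__(self, left_padding=None):
        self.left_padding = left_padding


class ArraysCacheStub:
    def __init__(self, size, left_padding=None):  # `size` is required
        self.size = size
        self.left_padding = left_padding


def make_cache(has_mamba_cache, left_padding=None, size=2):
    # Mirrors the fix: pass `size` only on the ArraysCache fallback path,
    # so the old-API path keeps its original call signature.
    if has_mamba_cache:
        return MambaCacheStub(left_padding=left_padding)
    return ArraysCacheStub(size=size, left_padding=left_padding)
```

With this shape, callers that never think about `size` get the `size=2` default on the fallback path, which is why `merge` and `_patched_make_cache` can stay unchanged.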

Closes #87

@waybarrios waybarrios self-assigned this Feb 15, 2026
@waybarrios waybarrios added the bug Something isn't working label Feb 15, 2026
@waybarrios waybarrios merged commit 9daa89d into main Feb 15, 2026
7 checks passed
sooth pushed a commit to sooth/vllm-mlx that referenced this pull request Feb 27, 2026
Merge 17 upstream commits including:
- KV cache quantization for prefix cache memory reduction (waybarrios#62)
- Streaming tool call parsing via ToolParser integration (waybarrios#46)
- MTP speculative decoding for Qwen3-Next (waybarrios#82)
- GPT-OSS reasoning parser and Harmony format parsers
- mlx-lm >= 0.30.5 requirement, transformers >= 5.0.0
- BatchMambaCache fix for mlx-lm >= 0.30.6 (waybarrios#89)
- MLLM continuous batching fixes (waybarrios#76)
- Force MLLM mode option (waybarrios#81)
- Various bug fixes

Conflict resolution:
- server.py: Replaced local tool_call_buffering with upstream's
  ToolParser-based streaming (more robust)
- cli.py: Deduplicated --mllm, --default-temperature, --default-top-p
  args (upstream already added them), kept local --embedding-model
- mamba_cache.py: Took upstream's conditional HAS_MAMBA_CACHE approach
- pyproject.toml: Took upstream's version and dependency changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

BatchMambaCache incompatible with mlx-lm >= 0.30.6 (MambaCache removed)
