
Update Mamba cache to support ArraysCache #94

Closed
solarpunkin wants to merge 3 commits into vllm-project:main from solarpunkin:fix/mamba-cache-refactor

Conversation

@solarpunkin

Addresses changes introduced in ml-explore/mlx-lm#842, which deprecated MambaCache in favor of ArraysCache in the inference loop. Removed the BatchMambaCache class, as ArraysCache now supports merge() and extract(). Older mlx-lm versions still support MambaCache.
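A minimal sketch of the intended compatibility path (the helper name and idx are illustrative, and extract(idx) is assumed to take a batch index per the mlx-lm change above; the fallback mirrors the old BatchMambaCache slicing):

from mlx_lm.models.cache import ArraysCache

def extract_sequence_cache(cache: ArraysCache, idx: int) -> ArraysCache:
    # Newer mlx-lm: ArraysCache provides extract() directly.
    if hasattr(cache, "extract"):
        return cache.extract(idx)
    # Older mlx-lm: slice the per-layer state arrays by hand,
    # preserving what BatchMambaCache used to do.
    new_cache = type(cache)(len(cache.cache))
    new_cache.cache = [c[idx : idx + 1] for c in cache.cache]
    return new_cache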

Tests

  • Unit Tests: All prefix cache guard tests passed after updating the sampling_params mock to include the missing attributes (logprobs, seed, logits_bias, logits_processors).
  • Linter/Type Check: ruff and mypy checks passed.

Signed-off-by: gaurav <gaurav290802@gmail.com>
@ericcurtin
Collaborator

This seems reasonably fine. But we should move to a model where we strictly define which mlx versions we use on the vllm-metal side, so we are deliberate about which version of mlx we depend on at any point in the commit history.
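For example, a sketch of explicit pins in pyproject.toml (versions below are just what a working local environment had installed, not verified minimums):

[project]
dependencies = [
    # Illustrative pins only; the actual minimums still need to be decided.
    "mlx>=0.30.6",
    "mlx-lm>=0.29.1",
    "vllm>=0.14.1",
]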

@solarpunkin
Author

solarpunkin commented Feb 9, 2026

The deps used here:

accelerate==1.12.0
fastapi==0.128.5
mlx==0.30.6
mlx-lm==0.29.1
mlx-vlm==0.3.9
mypy==1.19.1
numpy==2.2.6
psutil==7.2.2
pydantic==2.12.5
pytest==9.0.2
pytest-asyncio==1.3.0
ruff==0.15.0
safetensors==0.7.0
transformers==4.57.6
uvicorn==0.40.0
vllm==0.14.1

were installed with pip install -e ".[all]".
What minimum versions should be the target here, given that ArraysCache is already out of sync with the minimal required version in pyproject.toml?

@solarpunkin solarpunkin force-pushed the fix/mamba-cache-refactor branch 2 times, most recently from 55109b4 to c552b48 on February 10, 2026 at 16:15
Signed-off-by: gaurav <gaurav290802@gmail.com>
@solarpunkin solarpunkin force-pushed the fix/mamba-cache-refactor branch from c552b48 to d46730b on February 10, 2026 at 16:16
Collaborator

@LxYuan0420 LxYuan0420 left a comment


Please fix the version mismatch issue (or bump the min vLLM + CI).

Also, the hybrid cache change is broken for current mlx-lm: ArraysCache has no merge/extract, so _merge_kv_caches will fail at runtime.

if self.cache[i] is not None:
cache.cache[i] = self.cache[i][idx : idx + 1]
return cache
AnyCache: TypeAlias = KVCache | ArraysCache | Any
Collaborator


Doesn't this make the alias effectively lose type safety? Since Any is compatible with every type, the union accepts any value, so the checker can no longer catch a wrong cache type.
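For contrast, a hypothetical narrower alias (assuming KVCache and ArraysCache cover the real cases) keeps the check meaningful:

from typing import TypeAlias
from mlx_lm.models.cache import ArraysCache, KVCache

# Without `| Any`, the checker can actually reject wrong types.
AnyCache: TypeAlias = KVCache | ArraysCache

def touch_cache(cache: AnyCache) -> None: ...

touch_cache("not a cache")  # mypy flags this; with `| Any` it passes silently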

"""Check if a cache is a Mamba-style cache (ArraysCache or MambaCache)."""
return isinstance(cache, (MambaCache, ArraysCache))
"""Check if a cache is a Mamba-style cache (has .cache attribute)."""
return hasattr(cache, "cache") and not isinstance(cache, (KVCache, BatchKVCache))
Collaborator


Prefer isinstance(cache, ArraysCache), since MambaCache is a subclass of ArraysCache in mlx-lm; this is clearer and avoids accidental matches.

╰─➤  python
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mlx_lm.models.cache import ArraysCache, MambaCache
>>> print('issubclass:', issubclass(MambaCache, ArraysCache))
issubclass: True
>>> print('isinstance:', isinstance(MambaCache(), ArraysCache))
isinstance: True
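So the helper could just be (sketch; older mlx-lm where MambaCache is a standalone class would still need a fallback):

from mlx_lm.models.cache import ArraysCache

def is_mamba_cache(cache) -> bool:
    """Check if a cache is a Mamba-style cache."""
    # MambaCache subclasses ArraysCache in current mlx-lm,
    # so a single isinstance covers both.
    return isinstance(cache, ArraysCache)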

Comment thread vllm_metal/v1/worker.py
from vllm.logger import init_logger
from vllm.lora.request import LoRARequest
from vllm.model_executor import set_random_seed
from vllm.utils.torch_utils import set_random_seed
Collaborator


Importing set_random_seed from vllm.utils.torch_utils breaks on vLLM 0.13.0 (our declared minimum); either keep compatibility with the version we declared or bump the minimum to v0.14.0.
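One possible shim if we want to keep 0.13.0 working (untested sketch, using the two import paths quoted above):

try:
    # vLLM >= 0.14: helper lives in vllm.utils.torch_utils
    from vllm.utils.torch_utils import set_random_seed
except ImportError:
    # vLLM 0.13.x: original location
    from vllm.model_executor import set_random_seed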

Comment thread tests/test_platform.py
Comment on lines +8 to +9
from vllm.v1.attention.backends.registry import AttentionBackendEnum
from vllm.v1.attention.selector import AttentionSelectorConfig
Collaborator


The vllm.v1.attention.* imports in vllm_metal/platform.py and tests/test_platform.py do not exist in vLLM 0.13.0; they only exist in 0.14.0?
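If we keep the 0.13.0 minimum, the test module could skip itself instead (sketch using pytest.importorskip):

import pytest

# Skip this module entirely on vLLM builds without the v1 attention API.
pytest.importorskip("vllm.v1.attention.backends.registry")
pytest.importorskip("vllm.v1.attention.selector")

from vllm.v1.attention.backends.registry import AttentionBackendEnum
from vllm.v1.attention.selector import AttentionSelectorConfig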

Comment on lines +391 to +395
elif hasattr(cache, "cache") and not isinstance(cache, BatchKVCache):
# Fallback for older ArraysCache/MambaCache versions where .extract is missing
new_cache = type(cache)(len(cache.cache))
new_cache.cache = [c[idx : idx + 1] for c in cache.cache]
extracted.append(new_cache)
Collaborator


Hmm... can you confirm whether cache can include None entries on this path (hybrid/Mamba/ArraysCache)? The fallback list comprehension slices every entry, which would crash on None. If None is possible, we should preserve those entries in the fallback (or add a test showing it can't happen).
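If None is possible, a None-preserving version of the fallback could be (sketch):

new_cache.cache = [
    None if c is None else c[idx : idx + 1]
    for c in cache.cache
]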

@solarpunkin
Author

I see this getting addressed in #110, so it would be better to close this one.

[PS: I took a lil break from coding due to fatigue]

@solarpunkin solarpunkin deleted the fix/mamba-cache-refactor branch February 21, 2026 17:52