Update Mamba cache to support ArraysCache #94
solarpunkin wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: gaurav <gaurav290802@gmail.com>
This seems reasonably fine. But we should move to a model where we strictly define which mlx versions we use on the vllm-metal side, so we are deliberate about which version of mlx is in use at any point in the commit history.
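As one way of enforcing that, a minimal sketch of a fail-fast check at import time (the guard and the expected version string are illustrative, not part of this PR):

```python
# Hypothetical guard: refuse to run against an mlx-lm other than the one
# this commit was developed with (version string is illustrative only).
from importlib.metadata import version

EXPECTED_MLX_LM = "0.29.1"

installed = version("mlx-lm")
if installed != EXPECTED_MLX_LM:
    raise RuntimeError(
        f"vllm-metal is pinned against mlx-lm=={EXPECTED_MLX_LM}, found {installed}"
    )
```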
The deps used here:

```
accelerate==1.12.0
fastapi==0.128.5
mlx==0.30.6
mlx-lm==0.29.1
mlx-vlm==0.3.9
mypy==1.19.1
numpy==2.2.6
psutil==7.2.2
pydantic==2.12.5
pytest==9.0.2
pytest-asyncio==1.3.0
ruff==0.15.0
safetensors==0.7.0
transformers==4.57.6
uvicorn==0.40.0
vllm==0.14.1
```
Force-pushed 55109b4 to c552b48
Signed-off-by: gaurav <gaurav290802@gmail.com>
Force-pushed c552b48 to d46730b
LxYuan0420 left a comment
Please fix the version mismatch issue (or bump the minimum vLLM version + CI).
Also, the hybrid cache change is broken for current mlx-lm: `ArraysCache` has no `merge`/`extract`, so `_merge_kv_caches` will fail at runtime.
```python
if self.cache[i] is not None:
    cache.cache[i] = self.cache[i][idx : idx + 1]
return cache
```
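If `ArraysCache` on the pinned mlx-lm really lacks `merge`/`extract`, one option is to probe for them explicitly so the mismatch surfaces at import time rather than mid-request; a minimal sketch (method names taken from this PR's description, not verified across mlx-lm releases):

```python
from mlx_lm.models.cache import ArraysCache

# True only on mlx-lm versions where ArraysCache gained merge()/extract().
ARRAYS_CACHE_HAS_BATCH_OPS = hasattr(ArraysCache, "merge") and hasattr(
    ArraysCache, "extract"
)
```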
```python
AnyCache: TypeAlias = KVCache | ArraysCache | Any
```
Doesn't including `Any` make this alias effectively lose type safety? With `Any` in the union it matches everything.
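One possible tightening, sketched under the assumption that only these cache classes actually flow through this code path:

```python
from typing import TypeAlias

from mlx_lm.models.cache import ArraysCache, KVCache

# Hypothetical alternative: enumerate the concrete cache types instead of
# adding Any, so mypy keeps checking call sites.
AnyCache: TypeAlias = KVCache | ArraysCache
```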
| """Check if a cache is a Mamba-style cache (ArraysCache or MambaCache).""" | ||
| return isinstance(cache, (MambaCache, ArraysCache)) | ||
| """Check if a cache is a Mamba-style cache (has .cache attribute).""" | ||
| return hasattr(cache, "cache") and not isinstance(cache, (KVCache, BatchKVCache)) |
Prefer `isinstance(cache, ArraysCache)`; since `MambaCache` is a subclass of `ArraysCache` in mlx-lm, this is clearer and avoids accidental matches:
```
╰─➤ python
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mlx_lm.models.cache import ArraysCache, MambaCache
>>> print('issubclass:', issubclass(MambaCache, ArraysCache))
issubclass: True
>>> print('isinstance:', isinstance(MambaCache(), ArraysCache))
isinstance: True
```

```diff
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
-from vllm.model_executor import set_random_seed
+from vllm.utils.torch_utils import set_random_seed
```
Importing `set_random_seed` from `vllm.utils.torch_utils` breaks on vLLM 0.13.0 (our declared minimum); either keep compatibility with the version we declared or bump the minimum to v0.14.0.
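If compatibility is the route taken, a minimal shim might look like this (assumes only the import location moved between these releases; both paths are taken from this PR's diff):

```python
# Compatibility shim sketch: prefer the new location, fall back to the old one.
try:
    from vllm.utils.torch_utils import set_random_seed  # vLLM >= 0.14.0
except ImportError:
    from vllm.model_executor import set_random_seed  # vLLM 0.13.x
```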
```python
from vllm.v1.attention.backends.registry import AttentionBackendEnum
from vllm.v1.attention.selector import AttentionSelectorConfig
```
The `vllm.v1.attention.*` imports in `vllm_metal/platform.py` and `tests/test_platform.py` do not exist in vLLM 0.13.0; they only exist in 0.14.0, right?
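A try/except shim gets awkward for whole modules, so an explicit version gate may read better here; a sketch assuming the 0.14.0 threshold from this comment (the constant name is made up):

```python
from importlib.metadata import version

from packaging.version import Version

# Gate v1-attention code paths on the installed vLLM version instead of
# sprinkling try/except around every import (threshold per this review comment).
VLLM_HAS_V1_ATTENTION = Version(version("vllm")) >= Version("0.14.0")
```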
```python
elif hasattr(cache, "cache") and not isinstance(cache, BatchKVCache):
    # Fallback for older ArraysCache/MambaCache versions where .extract is missing
    new_cache = type(cache)(len(cache.cache))
    new_cache.cache = [c[idx : idx + 1] for c in cache.cache]
    extracted.append(new_cache)
```
Hmm... can you confirm whether `cache` can include `None` entries in this path (hybrid/Mamba/ArraysCache)? The fallback list comprehension slices every entry, which would crash on `None`. If `None` is possible, we should preserve those entries in the fallback (or add a test showing it can't happen).
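If `None` entries can occur, a `None`-preserving variant of the fallback might look like this (a sketch only; the function name is hypothetical and mirrors the PR's fallback rather than any mlx-lm API):

```python
def extract_fallback(cache, idx: int):
    """Slice sequence `idx` out of a batched Mamba-style cache, keeping
    None entries (e.g. layers whose state hasn't been materialized yet)."""
    new_cache = type(cache)(len(cache.cache))
    new_cache.cache = [
        c[idx : idx + 1] if c is not None else None for c in cache.cache
    ]
    return new_cache
```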
I see this getting addressed in #110, so it would be better to close this one. [PS: I took a lil break from coding due to fatigue]
Addresses changes introduced in ml-explore/mlx-lm#842, which deprecated `MambaCache` in favor of `ArraysCache` in the inference loop. Removed the `BatchMambaCache` class, as `ArraysCache` now supports `merge()` and `extract()`. Older mlx-lm versions still support `MambaCache`.

Tests

`ruff` and `mypy` checks passed.