Update Mamba cache to support ArraysCache #94
solarpunkin wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: gaurav <gaurav290802@gmail.com>
This seems reasonably fine. But we should move to a model where we strictly define which mlx versions we use on the vllm-metal side, so we are deliberate about which version of mlx is in use at any point in the commit history.
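As one way of enforcing that, a minimal sketch of a fail-fast check at import time (the guard and the expected version string are illustrative, not part of this PR):

```python
# Hypothetical guard: refuse to run against an mlx-lm other than the one
# this commit was developed with (version string is illustrative only).
from importlib.metadata import version

EXPECTED_MLX_LM = "0.29.1"

installed = version("mlx-lm")
if installed != EXPECTED_MLX_LM:
    raise RuntimeError(
        f"vllm-metal is pinned against mlx-lm=={EXPECTED_MLX_LM}, found {installed}"
    )
```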
The deps used here:

```
accelerate==1.12.0
fastapi==0.128.5
mlx==0.30.6
mlx-lm==0.29.1
mlx-vlm==0.3.9
mypy==1.19.1
numpy==2.2.6
psutil==7.2.2
pydantic==2.12.5
pytest==9.0.2
pytest-asyncio==1.3.0
ruff==0.15.0
safetensors==0.7.0
transformers==4.57.6
uvicorn==0.40.0
vllm==0.14.1
```
Force-pushed 55109b4 to c552b48
Signed-off-by: gaurav <gaurav290802@gmail.com>
Force-pushed c552b48 to d46730b
LxYuan0420 left a comment
Please fix the version mismatch issue (or bump the minimum vLLM version + CI).
Also, the hybrid cache change is broken for current mlx-lm: `ArraysCache` has no `merge`/`extract`, so `_merge_kv_caches` will fail at runtime.
```python
if self.cache[i] is not None:
    cache.cache[i] = self.cache[i][idx : idx + 1]
return cache
```
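If `ArraysCache` on the pinned mlx-lm really lacks `merge`/`extract`, one option is to probe for them explicitly so the mismatch surfaces at import time rather than mid-request; a minimal sketch (method names taken from this PR's description, not verified across mlx-lm releases):

```python
from mlx_lm.models.cache import ArraysCache

# True only on mlx-lm versions where ArraysCache gained merge()/extract().
ARRAYS_CACHE_HAS_BATCH_OPS = hasattr(ArraysCache, "merge") and hasattr(
    ArraysCache, "extract"
)
```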
```python
AnyCache: TypeAlias = KVCache | ArraysCache | Any
```
Doesn't including `Any` make this alias effectively lose type safety? With `Any` in the union it matches everything.
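One possible tightening, sketched under the assumption that only these cache classes actually flow through this code path:

```python
from typing import TypeAlias

from mlx_lm.models.cache import ArraysCache, KVCache

# Hypothetical alternative: enumerate the concrete cache types instead of
# adding Any, so mypy keeps checking call sites.
AnyCache: TypeAlias = KVCache | ArraysCache
```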
| """Check if a cache is a Mamba-style cache (ArraysCache or MambaCache).""" | ||
| return isinstance(cache, (MambaCache, ArraysCache)) | ||
| """Check if a cache is a Mamba-style cache (has .cache attribute).""" | ||
| return hasattr(cache, "cache") and not isinstance(cache, (KVCache, BatchKVCache)) |
Prefer `isinstance(cache, ArraysCache)`; since `MambaCache` is a subclass of `ArraysCache` in mlx-lm, this is clearer and avoids accidental matches:
```
╰─➤ python
Python 3.12.7 (main, Oct 16 2024, 07:12:08) [Clang 18.1.8] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mlx_lm.models.cache import ArraysCache, MambaCache
>>> print('issubclass:', issubclass(MambaCache, ArraysCache))
issubclass: True
>>> print('isinstance:', isinstance(MambaCache(), ArraysCache))
isinstance: True
```

```diff
 from vllm.logger import init_logger
 from vllm.lora.request import LoRARequest
-from vllm.model_executor import set_random_seed
+from vllm.utils.torch_utils import set_random_seed
```
Importing `set_random_seed` from `vllm.utils.torch_utils` breaks on vLLM 0.13.0 (our declared minimum); either keep compatibility with the version we declared or bump the minimum to v0.14.0.
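If compatibility is the route taken, a minimal shim might look like this (assumes only the import location moved between these releases; both paths are taken from this PR's diff):

```python
# Compatibility shim sketch: prefer the new location, fall back to the old one.
try:
    from vllm.utils.torch_utils import set_random_seed  # vLLM >= 0.14.0
except ImportError:
    from vllm.model_executor import set_random_seed  # vLLM 0.13.x
```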
```python
from vllm.v1.attention.backends.registry import AttentionBackendEnum
from vllm.v1.attention.selector import AttentionSelectorConfig
```
The `vllm.v1.attention.*` imports in `vllm_metal/platform.py` and `tests/test_platform.py` do not exist in vLLM 0.13.0; they only exist in 0.14.0, right?
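A try/except shim gets awkward for whole modules, so an explicit version gate may read better here; a sketch assuming the 0.14.0 threshold from this comment (the constant name is made up):

```python
from importlib.metadata import version

from packaging.version import Version

# Gate v1-attention code paths on the installed vLLM version instead of
# sprinkling try/except around every import (threshold per this review comment).
VLLM_HAS_V1_ATTENTION = Version(version("vllm")) >= Version("0.14.0")
```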
```python
elif hasattr(cache, "cache") and not isinstance(cache, BatchKVCache):
    # Fallback for older ArraysCache/MambaCache versions where .extract is missing
    new_cache = type(cache)(len(cache.cache))
    new_cache.cache = [c[idx : idx + 1] for c in cache.cache]
    extracted.append(new_cache)
```
Hmm... can you confirm whether `cache` can include `None` entries in this path (hybrid/Mamba/ArraysCache)? The fallback list comprehension slices every entry, which would crash on `None`. If `None` is possible, we should preserve those entries in the fallback (or add a test showing it can't happen).
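If `None` entries can occur, a `None`-preserving variant of the fallback might look like this (a sketch only; the function name is hypothetical and mirrors the PR's fallback rather than any mlx-lm API):

```python
def extract_fallback(cache, idx: int):
    """Slice sequence `idx` out of a batched Mamba-style cache, keeping
    None entries (e.g. layers whose state hasn't been materialized yet)."""
    new_cache = type(cache)(len(cache.cache))
    new_cache.cache = [
        c[idx : idx + 1] if c is not None else None for c in cache.cache
    ]
    return new_cache
```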
I see this getting addressed in #110, so it would be better to close this one. [PS: I took a lil break from coding due to fatigue]
Addresses changes introduced in ml-explore/mlx-lm#842, which deprecated `MambaCache` in favor of `ArraysCache` in the inference loop. Removed the `BatchMambaCache` class, as `ArraysCache` now supports `merge()` and `extract()`. Older mlx-lm versions still support `MambaCache`.

Tests

`ruff` and `mypy` checks passed.