[WIP][Model Runner V2] Enable mamba align for spec decode by TheEpicDolphin · Pull Request #41279 · vllm-project/vllm

TheEpicDolphin · 2026-04-29T19:36:27Z

No description provided.

mergify · 2026-04-29T19:37:44Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @TheEpicDolphin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request introduces support for hybrid Mamba-Attention models in the V1 engine, specifically enabling speculative decoding in 'align' mode through a new MambaHybridModelState and dual-indexing logic in Mamba kernels. It also updates KV cache management and refactors multimodal registry error handling to be more robust. Review feedback suggests replacing redundant assertions in the new model state implementation with explicit error handling or type guards.

gemini-code-assist · 2026-04-29T19:39:20Z

+            assert req_states is not None
+            assert scheduled_spec_decode_tokens is not None


The assertions assert req_states is not None and assert scheduled_spec_decode_tokens is not None are redundant because these arguments are already typed as RequestState | None and dict[str, list[int]] | None respectively, and the logic below relies on them being present. If they are None, the code will fail with an AttributeError anyway. It is better to handle the None case explicitly or use a type guard.
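A minimal sketch of the "handle the None case explicitly" option the comment above suggests. Only the two parameter names and their type annotations come from the diff; the `RequestState` stand-in, the `count_draft_tokens` function, and the error messages are hypothetical and chosen for illustration. Unlike `assert`, the explicit checks still run when Python is started with `-O`, and they also narrow the optional types for a static type checker.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RequestState:
    """Hypothetical stand-in for the real request-state type."""

    num_computed_tokens: dict[str, int] = field(default_factory=dict)


def count_draft_tokens(
    req_states: RequestState | None,
    scheduled_spec_decode_tokens: dict[str, list[int]] | None,
) -> dict[str, int]:
    """Return the number of scheduled draft tokens per request ID."""
    # Explicit checks instead of `assert`: they are not stripped by `-O`
    # and they fail with a descriptive message at the point of misuse.
    if req_states is None:
        raise ValueError("req_states is required in this code path")
    if scheduled_spec_decode_tokens is None:
        raise ValueError("scheduled_spec_decode_tokens is required in this code path")

    # Past this point both arguments are known to be non-None.
    return {
        req_id: len(tokens)
        for req_id, tokens in scheduled_spec_decode_tokens.items()
    }
```

Whether a `ValueError` or a silent early return is the better reaction depends on whether a missing argument is a caller bug or a legitimate "no spec decode this step" case, which the snippet in the diff does not show.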

gemini-code-assist · 2026-04-29T19:39:20Z

+        assert self.last_kv_cache_config is not None
+        assert self.last_block_tables is not None


The assertions assert self.last_kv_cache_config is not None and assert self.last_block_tables is not None are redundant. If these are None, the code will fail with an AttributeError. It is better to handle the None case explicitly or ensure the state is initialized correctly.
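One way to "ensure the state is initialized correctly", sketched under assumptions: the class `ModelStateSketch`, its `update` and `_require_state` methods, and the placeholder attribute types are hypothetical; only the attribute names `last_kv_cache_config` and `last_block_tables` come from the diff. The idea is to funnel every read of the cached state through one helper that raises a descriptive error when the state was never set.

```python
from __future__ import annotations


class ModelStateSketch:
    """Hypothetical class; not the MambaHybridModelState from the PR."""

    def __init__(self) -> None:
        # Both attributes start unset and are filled in by update().
        self.last_kv_cache_config: dict | None = None
        self.last_block_tables: list[list[int]] | None = None

    def update(self, kv_cache_config: dict, block_tables: list[list[int]]) -> None:
        self.last_kv_cache_config = kv_cache_config
        self.last_block_tables = block_tables

    def _require_state(self) -> tuple[dict, list[list[int]]]:
        # Single place that checks initialization; callers receive
        # non-optional values and need no assertions of their own.
        if self.last_kv_cache_config is None or self.last_block_tables is None:
            raise RuntimeError("update() must be called before cached state is used")
        return self.last_kv_cache_config, self.last_block_tables

    def num_block_tables(self) -> int:
        _, block_tables = self._require_state()
        return len(block_tables)
```

Centralizing the check keeps the failure mode uniform: a missing `update()` call surfaces as one `RuntimeError` with a clear message rather than as an `AttributeError` or `TypeError` deep inside whichever method happened to touch the state first.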

izhuhaoran and others added 17 commits April 6, 2026 22:07

support qwen35 / mamba hybrid model for model runner v2

5aeb521

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

use MambaHybridModelState to refactor code

bf7b885

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

fix get_kv_cache_shape and dispatch attn build args for diff backend

976dab9

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

fix mrv2 qwen35 mtp multimodal registry

0ee4ea8

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

remove unused future annotations import

e852690

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

Merge branch 'main' into MRV2-support-qwen35

d6515db

Merge branch 'main' into MRV2-support-qwen35

3639001

apply suggestions from @MengqingCao

9fd6e04

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

fix num_accepted_tokens in chunked prefill and fix UniformTypeKVCacheSpecs in _update_hybrid_attention_layout

e1d3df5

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

feat: add is_prefill for MambaHybridModelState

778aa57

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

Merge remote-tracking branch 'upstream/main' into MRV2-support-qwen35

c77c550

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

Merge branch 'main' into MRV2-support-qwen35

719a39d

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

Merge branch 'main' into MRV2-support-qwen35

5460418

refactor: assert piecewise captures have no attention metadata

2713f7d

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

Merge branch 'main' into MRV2-support-qwen35

612c05c

Merge main into MRV2-support-qwen35

aab0f12

Co-authored-by: OpenAI Codex

[Model Runner V2] enable spec decode + align mamba cache mode

895b7a8

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

mergify Bot added the multi-modality (related to multi-modality, #4194), nvidia, and v1 labels Apr 29, 2026

github-project-automation Bot added this to NVIDIA Apr 29, 2026

mergify Bot added the needs-rebase label Apr 29, 2026

gemini-code-assist Bot reviewed Apr 29, 2026

TheEpicDolphin closed this May 12, 2026

github-project-automation Bot moved this to Done in NVIDIA May 12, 2026

TheEpicDolphin deleted the mrv2-spec-decode-enable-mamba-align branch May 12, 2026 22:18

TheEpicDolphin mentioned this pull request May 16, 2026

[Model Runner V2] support mamba hybrid models align prefix cache #42406

Open

njhill added the v2 label May 20, 2026

TheEpicDolphin wants to merge 17 commits into vllm-project:main from TheEpicDolphin:mrv2-spec-decode-enable-mamba-align

3 participants