[WIP][Model Runner V2] Enable mamba align for spec decode#41279

Closed
TheEpicDolphin wants to merge 17 commits into vllm-project:main from TheEpicDolphin:mrv2-spec-decode-enable-mamba-align

Conversation

@TheEpicDolphin

Collaborator

No description provided.

izhuhaoran and others added 17 commits April 6, 2026 22:07
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
…Specs in _update_hybrid_attention_layout

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: OpenAI Codex
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
@mergify mergify Bot added the multi-modality (Related to multi-modality, #4194), nvidia, and v1 labels Apr 29, 2026
@mergify

mergify Bot commented Apr 29, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @TheEpicDolphin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 29, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for hybrid Mamba-Attention models in the V1 engine, specifically enabling speculative decoding in 'align' mode through a new `MambaHybridModelState` and dual-indexing logic in Mamba kernels. It also updates KV cache management and refactors multimodal registry error handling to be more robust. Review feedback suggests replacing redundant assertions in the new model state implementation with explicit error handling or type guards.

Comment on lines +72 to +73
assert req_states is not None
assert scheduled_spec_decode_tokens is not None

Contributor

high

The assertions `assert req_states is not None` and `assert scheduled_spec_decode_tokens is not None` are redundant because these arguments are already typed as `RequestState | None` and `dict[str, list[int]] | None` respectively, and the logic below relies on them being present. If they are `None`, the code will fail with an `AttributeError` anyway. It is better to handle the `None` case explicitly or use a type guard.
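
A minimal sketch of the explicit handling suggested in this comment, not the PR's actual code. Only the two argument names and their types come from the diff. The `RequestState` stand-in, the `count_draft_tokens` function, and the error messages are hypothetical and exist purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RequestState:
    """Hypothetical stand-in for the real RequestState; the field is illustrative."""

    num_computed_tokens: dict[str, int] = field(default_factory=dict)


def count_draft_tokens(
    req_states: RequestState | None,
    scheduled_spec_decode_tokens: dict[str, list[int]] | None,
) -> dict[str, int]:
    """Hypothetical helper: number of scheduled draft tokens per request ID."""
    # Explicit checks still run under `python -O`, which strips assert
    # statements, and they give the caller an actionable message.
    if req_states is None:
        raise ValueError("req_states is required for speculative decoding")
    if scheduled_spec_decode_tokens is None:
        raise ValueError(
            "scheduled_spec_decode_tokens is required for speculative decoding"
        )
    # Past this point, type checkers narrow both arguments to non-None.
    return {
        req_id: len(tokens)
        for req_id, tokens in scheduled_spec_decode_tokens.items()
    }
```

Whether `ValueError` is the right exception here depends on how the caller is expected to recover, which the diff excerpt does not show.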

Comment on lines +154 to +155
assert self.last_kv_cache_config is not None
assert self.last_block_tables is not None

Contributor

high

The assertions `assert self.last_kv_cache_config is not None` and `assert self.last_block_tables is not None` are redundant. If these are `None`, the code will fail with an `AttributeError`. It is better to handle the `None` case explicitly or ensure the state is initialized correctly.
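
One way to "ensure the state is initialized correctly" is to route reads through an accessor that fails loudly. The sketch below assumes a hypothetical `ModelStateSketch` class with a private `_last_block_tables` field and an `update()` method. Only the attribute name `last_block_tables` comes from the diff, and the element type is a guess.

```python
from __future__ import annotations


class ModelStateSketch:
    """Hypothetical holder for lazily set state; not the PR's actual class."""

    def __init__(self) -> None:
        # Unset until the first update() call.
        self._last_block_tables: list[list[int]] | None = None

    def update(self, block_tables: list[list[int]]) -> None:
        """Record the most recent block tables."""
        self._last_block_tables = block_tables

    @property
    def last_block_tables(self) -> list[list[int]]:
        # Centralize the None check so every call site sees a
        # non-optional type and a clear error on misuse.
        if self._last_block_tables is None:
            raise RuntimeError("last_block_tables read before update() was called")
        return self._last_block_tables
```

This keeps the check in one place instead of repeating an assertion at each use, at the cost of an extra property per lazily initialized field.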

@github-project-automation github-project-automation Bot moved this to Done in NVIDIA May 12, 2026
@TheEpicDolphin TheEpicDolphin deleted the mrv2-spec-decode-enable-mamba-align branch May 12, 2026 22:18
@njhill njhill added the v2 label May 20, 2026
Projects

Status: Done

3 participants