[WIP][Model Runner V2] Enable mamba align for spec decode#41279
TheEpicDolphin wants to merge 17 commits into
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
…Specs in _update_hybrid_attention_layout Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: OpenAI Codex
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request introduces support for hybrid Mamba-Attention models in the V1 engine, specifically enabling speculative decoding in 'align' mode through a new MambaHybridModelState and dual-indexing logic in Mamba kernels. It also updates KV cache management and refactors multimodal registry error handling to be more robust. Review feedback suggests replacing redundant assertions in the new model state implementation with explicit error handling or type guards.
    assert req_states is not None
    assert scheduled_spec_decode_tokens is not None
The assertions "assert req_states is not None" and "assert scheduled_spec_decode_tokens is not None" are redundant. These arguments are typed as RequestState | None and dict[str, list[int]] | None respectively, and the logic below relies on them being present, so a None value would fail with an AttributeError anyway. It is better to handle the None case explicitly or to use a type guard.
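As a rough illustration of the "handle the None case explicitly" suggestion, the sketch below replaces the two assertions with explicit checks. It is not the PR's code: the RequestState stand-in, its num_computed_tokens field, and the count_draft_tokens function are hypothetical names invented for this example, and only the two argument names and their types come from the review comment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RequestState:
    # Hypothetical stand-in for the real class; it only carries what this sketch needs.
    num_computed_tokens: dict[str, int] = field(default_factory=dict)


def count_draft_tokens(
    req_states: RequestState | None,
    scheduled_spec_decode_tokens: dict[str, list[int]] | None,
) -> dict[str, int]:
    """Return the number of scheduled draft tokens for each known request."""
    # Explicit checks raise a descriptive error and still run under `python -O`,
    # which strips assert statements.
    if req_states is None:
        raise ValueError("req_states is required for this code path")
    if scheduled_spec_decode_tokens is None:
        raise ValueError("scheduled_spec_decode_tokens is required for this code path")

    # After the checks above, type checkers narrow both arguments to non-None.
    return {
        req_id: len(tokens)
        for req_id, tokens in scheduled_spec_decode_tokens.items()
        if req_id in req_states.num_computed_tokens
    }
```

Unlike an assert, this form fails with a message that names the missing argument rather than a later AttributeError. Whether that is preferable to an assert used purely for type narrowing is a judgment call for the PR author.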
    assert self.last_kv_cache_config is not None
    assert self.last_block_tables is not None
No description provided.