
Move stale mamba state clearing into mixer forward methods#4

Closed
MatthewBonanni wants to merge 1 commit into Josephasafg:mtp_edge_case from MatthewBonanni:mtp_edge_case_mod

Conversation

@MatthewBonanni

@MatthewBonanni MatthewBonanni commented Mar 10, 2026

Alternative approach. This leaves gpu_model_runner.py untouched and reduces the overall LoC.

Instead of clearing stale state in a separate pass from
gpu_model_runner (via clear_stale_mamba_states), zero the conv/ssm
state for new decode requests inline in each mixer's forward(),
right before the decode kernels read it. This mirrors how
has_initial_states_p is already handled for prefills and removes
Mamba-specific logic from the model runner.
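The pattern described above can be sketched roughly as follows. This is an illustrative outline only, not vLLM's actual code: the function name, tensor shapes, and parameter names (`state_indices`, `is_new_request`) are hypothetical stand-ins for whatever metadata the mixer's `forward()` already has access to.

```python
import torch


def clear_stale_decode_states(
    conv_state: torch.Tensor,      # cached conv state, [num_slots, conv_dim, kernel_size]
    ssm_state: torch.Tensor,       # cached SSM state, [num_slots, heads, head_dim, state_dim]
    state_indices: torch.Tensor,   # cache-slot index for each decode request
    is_new_request: torch.Tensor,  # bool mask: True for requests new this step
) -> None:
    """Zero the cached state for new decode requests in place,
    right before the decode kernels read it."""
    stale_slots = state_indices[is_new_request]
    if stale_slots.numel() > 0:
        # Slots reused from finished requests may hold stale state;
        # zero them so the decode kernels see a clean initial state.
        conv_state[stale_slots] = 0
        ssm_state[stale_slots] = 0
```

Calling this inline at the top of each mixer's `forward()` keeps the cleanup next to the code that consumes the state, which is why it removes the need for a separate `clear_stale_mamba_states` pass in the model runner.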

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

