Move stale mamba state clearing into mixer forward methods #4
MatthewBonanni wants to merge 1 commit into Josephasafg:mtp_edge_case
Instead of clearing stale state in a separate pass from gpu_model_runner (via clear_stale_mamba_states), zero the conv/ssm state for new decode requests inline in each mixer's forward(), right before the decode kernels read it.

This mirrors how has_initial_states_p is already handled for prefills and removes Mamba-specific logic from the model runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
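For readers unfamiliar with the pattern, the inline approach described above can be sketched roughly as follows. This is a pure-Python toy, not vLLM code: `ToyMixer`, its list-based state caches, the `is_new_decode` flag, and the toy update rule are all hypothetical stand-ins chosen only so that stale state would visibly change the output if it were not cleared.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMixer:
    """Hypothetical stand-in for a Mamba mixer layer (not vLLM's class)."""

    num_slots: int
    state_dim: int
    conv_state: list = field(init=False)
    ssm_state: list = field(init=False)

    def __post_init__(self):
        # One state row per cache slot; real code would use GPU tensors.
        self.conv_state = [[0.0] * self.state_dim for _ in range(self.num_slots)]
        self.ssm_state = [[0.0] * self.state_dim for _ in range(self.num_slots)]

    def forward(self, inputs, state_indices, is_new_decode):
        # Assumption: is_new_decode[i] is True when request i starts decoding
        # in a slot that may still hold another request's leftover state.
        # Zero those slots inline, right before the decode step reads them.
        for slot, is_new in zip(state_indices, is_new_decode):
            if is_new:
                self.conv_state[slot] = [0.0] * self.state_dim
                self.ssm_state[slot] = [0.0] * self.state_dim

        outputs = []
        for x, slot in zip(inputs, state_indices):
            # Toy decode step: slide the conv window, accumulate the ssm state.
            self.conv_state[slot] = self.conv_state[slot][1:] + [x]
            self.ssm_state[slot] = [s + x for s in self.ssm_state[slot]]
            outputs.append(sum(self.conv_state[slot]) + sum(self.ssm_state[slot]))
        return outputs


mixer = ToyMixer(num_slots=2, state_dim=2)
mixer.conv_state[0] = [9.0, 9.0]  # stale values left by a finished request
mixer.ssm_state[0] = [5.0, 5.0]
demo_out = mixer.forward([1.0], state_indices=[0], is_new_decode=[True])
```

Because the clearing happens inside `forward()`, the caller only has to pass a per-request flag, and no separate clearing pass over the cache is needed in this sketch.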
Alternative approach. This leaves gpu_model_runner.py untouched and reduces overall LoC.