[Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode#33706

Merged
vllm-bot merged 10 commits into vllm-project:main from peakcrosser7:ups/fix_block_aligned
Feb 13, 2026

Conversation

@peakcrosser7
Contributor

@peakcrosser7 peakcrosser7 commented Feb 3, 2026

Purpose

Currently, block-aligned splitting is only applied during the prefill phase of new requests (num_output_tokens == 0); it does not account for resumed requests. As a result, resumed requests are scheduled without block alignment and their Mamba states are stored at non-aligned positions, so incorrect states are retrieved on subsequent cache hits.
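To illustrate the fix, here is a minimal sketch of the scheduling guard. All names and the block size are illustrative, not vLLM's actual identifiers: the idea is to key alignment on whether the request is still in its prefill phase, rather than on whether it has produced any output tokens.

```python
BLOCK_SIZE = 16  # illustrative mamba block size

def should_block_align(num_computed_tokens: int,
                       num_prompt_tokens: int,
                       num_output_tokens: int) -> bool:
    """Decide whether this request's newly scheduled tokens must be
    block-aligned (hypothetical helper, names are assumptions).

    Old (buggy) rule: align only brand-new prefills:
        return num_output_tokens == 0
    New rule: align any request still in its prefill phase, which also
    covers resumed/preempted requests that already have output tokens.
    """
    return num_computed_tokens < num_prompt_tokens
```

Under the old rule, a request preempted mid-prefill and later resumed would have num_output_tokens == 0 still hold only for some paths; a resumed request with outputs would skip alignment entirely. The new predicate covers both cases uniformly.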

Changes in this PR:

  • Ensure that block-aligned splitting is performed during the prefill phase for both new and resumed requests.
  • Optimize the logic to split prompt tokens and output tokens separately on block-aligned boundaries for resumed requests. This ensures that the Mamba states at the end of the prompt are cached, improving the cache hit rate during request replays. (Redundant handling was also removed for code simplicity.)
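The splitting described above can be sketched as follows. This is an assumed, simplified model (it is not the PR's actual `_mamba_compute_cache_pos` helper, and all names are hypothetical): each scheduled chunk stops on a block boundary where possible, and a resumed request's chunk never crosses the prompt end, so the state at the end of the prompt is materialized and cacheable.

```python
def block_aligned_chunk(num_computed_tokens: int,
                        num_new_tokens: int,
                        block_size: int) -> int:
    """Return how many of the new tokens to schedule this step so the
    request stops on a block boundary (sketch; names are assumptions)."""
    total = num_computed_tokens + num_new_tokens
    aligned_total = (total // block_size) * block_size
    if aligned_total <= num_computed_tokens:
        # No full block boundary is reachable within this chunk;
        # schedule everything (the final, possibly partial, chunk).
        return num_new_tokens
    return aligned_total - num_computed_tokens

def next_chunk(num_computed_tokens: int, num_prompt_tokens: int,
               num_new_tokens: int, block_size: int) -> int:
    """Split prompt and output tokens separately on block boundaries."""
    if num_computed_tokens < num_prompt_tokens:
        # Prefill phase: never cross the prompt end in one chunk, so
        # the Mamba state at the end of the prompt is stored and can
        # be reused on a replay/cache hit.
        capped = min(num_new_tokens,
                     num_prompt_tokens - num_computed_tokens)
        return block_aligned_chunk(num_computed_tokens, capped,
                                   block_size)
    # Decode phase: output tokens are scheduled as-is.
    return num_new_tokens
```

For example, with a 50-token prompt and block size 16, the first chunk stops at 48 (block-aligned), the second schedules the remaining 2 prompt tokens up to the prompt end, and only then do output tokens follow.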

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an issue with block-aligned splitting in Mamba's cache align mode, which previously did not handle resumed requests correctly. The changes ensure that block-aligned splitting is performed for both new and resumed requests during the prefill phase. Additionally, the logic is optimized to handle prompt and output tokens separately for resumed requests, which should improve cache hit rates. The code has been refactored to introduce a helper function _mamba_compute_cache_pos for better clarity. The changes appear correct and well-implemented to solve the described problem.

This reverts commit c29650b.

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Collaborator

@heheda12345 heheda12345 left a comment


LGTM!

@heheda12345 heheda12345 enabled auto-merge (squash) February 12, 2026 19:30
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 12, 2026
@vllm-bot vllm-bot merged commit bf37812 into vllm-project:main Feb 13, 2026
42 of 45 checks passed
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
…gn mode (vllm-project#33706)

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
…gn mode (vllm-project#33706)

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
…gn mode (vllm-project#33706)

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants