[Hybrid] Fix and optimize block-aligned splitting in mamba cache align mode by peakcrosser7 · Pull Request #33706 · vllm-project/vllm

peakcrosser7 · 2026-02-03T16:00:43Z

Purpose

Currently, block-aligned splitting is only executed during the prefill phase of new requests (num_output_tokens == 0). It does not account for resumed requests. This leads to resumed requests being scheduled without block alignment, causing Mamba states to be stored in a non-aligned fashion. Consequently, incorrect states are retrieved during subsequent cache hits.

Changes in this PR:

Ensure that block-aligned splitting is performed during the prefill phase for both new and resumed requests.
Optimize the logic to split prompt tokens and output tokens separately on block-aligned boundaries for resumed requests, so that the Mamba state at the end of the prompt is cached and the cache hit rate improves when requests are replayed. (This optimization was later removed for code simplicity.)
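
To make the first change concrete, here is a minimal sketch of block-aligned chunk trimming. It is not the code in vllm/v1/core/sched/scheduler.py: the function `schedule_chunk`, its parameters, and the `align_resumed` flag are hypothetical names used only for illustration, and the handling of chunks smaller than one block is a simplifying assumption.

```python
def schedule_chunk(
    num_computed_tokens: int,
    num_prompt_tokens: int,
    num_output_tokens: int,
    token_budget: int,
    block_size: int,
    align_resumed: bool = True,
) -> int:
    """Return how many tokens to schedule for one request in this step.

    Hypothetical sketch: every name here is an assumption, not vLLM's API.
    align_resumed=False mimics the behavior described under "Purpose",
    where only new requests (num_output_tokens == 0) are aligned.
    """
    # A resumed request must recompute its prompt and its earlier outputs.
    num_tokens = num_prompt_tokens + num_output_tokens
    remaining = num_tokens - num_computed_tokens
    num_new_tokens = min(token_budget, remaining)

    is_new_request = num_output_tokens == 0
    if not is_new_request and not align_resumed:
        # Pre-fix behavior: resumed requests skip alignment entirely.
        return num_new_tokens

    if num_new_tokens == remaining:
        # Final chunk: it reaches the end of the known tokens, so there is
        # no later chunk boundary to align.
        return num_new_tokens

    # Round the end of the chunk down to the nearest block boundary.
    chunk_end = num_computed_tokens + num_new_tokens
    aligned_end = (chunk_end // block_size) * block_size
    if aligned_end <= num_computed_tokens:
        # Assumption: a chunk smaller than one block is left unchanged
        # instead of scheduling zero tokens.
        return num_new_tokens
    return aligned_end - num_computed_tokens


# Resumed request: 50 prompt tokens, 30 output tokens, nothing computed yet.
print(schedule_chunk(0, 50, 30, token_budget=40, block_size=16, align_resumed=False))
print(schedule_chunk(0, 50, 30, token_budget=40, block_size=16))
```

With a block size of 16 and a budget of 40 tokens, the unaligned variant ends the first chunk at token 40, in the middle of a block, while the aligned variant ends it at token 32, so the state saved at the end of the chunk corresponds to a whole number of blocks.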

Test Plan

Test Result

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

gemini-code-assist

Code Review

This pull request addresses an issue with block-aligned splitting in Mamba's cache align mode, which previously did not handle resumed requests correctly. The changes ensure that block-aligned splitting is performed for both new and resumed requests during the prefill phase. Additionally, the logic is optimized to handle prompt and output tokens separately for resumed requests, which should improve cache hit rates. The code has been refactored to introduce a helper function _mamba_compute_cache_pos for better clarity. The changes appear correct and well-implemented to solve the described problem.

vllm/v1/core/sched/scheduler.py

heheda12345

LGTM!

…gn mode (vllm-project#33706) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com>

peakcrosser7 added 7 commits February 1, 2026 16:40

block-aligned for resumed requests

4ff8cf6

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

block-aligned for output tokens of resumed requests

38b7a06

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

fix comments

2dfd6bc

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

optimize block-aligned split for resumed requests

c29650b

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

fix the comment

fd7f2b5

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

fix block-aligned logic

3e2d6cc

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

Merge branch 'main' into ups/fix_block_aligned

b13f419

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

peakcrosser7 requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners February 3, 2026 16:00

mergify bot added the v1 label Feb 3, 2026

gemini-code-assist bot reviewed Feb 3, 2026

peakcrosser7 mentioned this pull request Feb 3, 2026

[V1][Hybrid] Enable spec decode and optimize block-aligned split in mamba cache align mode #33024 (Closed, 5 tasks)

heheda12345 reviewed Feb 4, 2026

vllm/v1/core/sched/scheduler.py (outdated, resolved)

peakcrosser7 added 3 commits February 5, 2026 12:49

Merge branch 'main' into ups/fix_block_aligned

805b161

Revert "optimize block-aligned split for resumed requests"

408a531

This reverts commit c29650b. Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

add comment

b8ddae4

Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

peakcrosser7 requested a review from heheda12345 February 6, 2026 08:16

heheda12345 approved these changes Feb 12, 2026

heheda12345 enabled auto-merge (squash) February 12, 2026 19:30

github-actions bot added the ready label Feb 12, 2026

vllm-bot merged commit bf37812 into vllm-project:main Feb 13, 2026
42 of 45 checks passed

llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026

[Hybrid] Fix and optimize block-aligned splitting in mamba cache ali…

702c434

…gn mode (vllm-project#33706) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026

[Hybrid] Fix and optimize block-aligned splitting in mamba cache ali…

616526a

…gn mode (vllm-project#33706) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>

vllm-bot merged 10 commits into vllm-project:main from peakcrosser7:ups/fix_block_aligned

3 participants
