[Bugfix] - Fix Mamba prefix caching corruption with chunked prefill #34587
Josephasafg wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: Josephasafg <ajgard7@gmail.com>
Code Review
This pull request addresses a critical bug causing prefix caching corruption in Mamba models when chunked prefill is enabled. The root cause is that state blocks were being cached before they were completely filled, leading to incorrect state being loaded from the cache.

The fix introduces a mechanism to track the write position for each block within a request. This is achieved by adding `block_write_positions` to `MambaManager` and two new helper methods, `_update_block_write_positions` and `_get_num_cacheable_blocks`. The `cache_blocks` method is updated to use this tracking information to only cache blocks that are verified to be complete. The logic appears sound and correctly resolves the described caching issue by preventing incomplete blocks from being added to the prefix cache.
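A minimal sketch of the mechanism described above. The names `block_write_positions`, `_update_block_write_positions`, and `_get_num_cacheable_blocks` come from the PR description; everything else (the class shape, signatures, and data layout) is an assumption for illustration, not the actual vLLM implementation:

```python
# Hypothetical sketch of per-request block write-position tracking.
# Only the method/attribute names are taken from the PR text.
class MambaManagerSketch:
    def __init__(self, block_size: int):
        self.block_size = block_size
        # request_id -> {block_idx: write position after the last kernel write}
        self.block_write_positions: dict[str, dict[int, int]] = {}

    def _update_block_write_positions(self, req_id: str, block_idx: int,
                                      write_position: int) -> None:
        self.block_write_positions.setdefault(req_id, {})[block_idx] = write_position

    def _get_num_cacheable_blocks(self, req_id: str, num_blocks: int) -> int:
        # A block is cacheable only if the kernel wrote its state exactly
        # at the block boundary: write_position == (block_idx + 1) * block_size.
        positions = self.block_write_positions.get(req_id, {})
        num_cacheable = 0
        for block_idx in range(num_blocks):
            if positions.get(block_idx) == (block_idx + 1) * self.block_size:
                num_cacheable += 1
            else:
                break  # prefix caching needs a contiguous complete prefix
        return num_cacheable
```

With `block_size=2048`, a kernel write landing at position 1800 leaves block 0 uncacheable, while a later write landing exactly at 2048 makes it cacheable.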
@heheda12345 @tdoublep I think I managed to implement a similar chunk-alignment solution for mamba1 that handles alignment from the kernel perspective, which will make this PR obsolete. I'll update here in a bit.
I'm closing this PR in favor of #34798
Purpose
When using Mamba models with `mamba_cache_mode="all"` and chunked prefill enabled, Mamba state blocks can be cached before they contain complete state (e.g. if `mamba_block_size=2048` and chunked prefill splits the sequence into 1800- and 1100-token chunks), leading to incorrect output when subsequent requests hit the prefix cache.

Root Cause
The SSM kernel (`selective_scan_fwd`) writes state at chunk boundaries that don't necessarily align with block boundaries. When the scheduler splits a long prefill into multiple chunks, the cache manager assumes that a 3026-token sequence has `3026 // block_size` full blocks and so caches block 0. The kernel, however, writes state based on:

- `block_idx_first_scheduled + chunk_idx`
- `block_idx_last_scheduled`

This means earlier blocks may never be "completed" if subsequent scheduler calls skip over them.
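The misalignment can be made concrete with a small arithmetic sketch. The 1226-token second chunk is an assumption chosen so the total matches the 3026-token sequence above; the block size is the 2048 from the example:

```python
# Hypothetical illustration of the root cause: the scheduler's chunk
# boundaries do not line up with the mamba block boundary.
block_size = 2048          # mamba_block_size from the example above
chunks = [1800, 1226]      # assumed chunk split of a 3026-token prefill

write_positions = []       # positions at which the kernel writes state
write_position = 0
for chunk in chunks:
    write_position += chunk
    write_positions.append(write_position)

# Naive check: tokens // block_size says one "full" block exists after
# the second chunk, so block 0 gets cached...
naive_full_blocks = write_position // block_size

# ...but the kernel only wrote state at positions 1800 and 3026, never
# exactly at 2048, so the cached "block 0" state does not correspond to
# the state after token 2048.
block_0_boundary = 1 * block_size
block_0_state_written = block_0_boundary in write_positions
```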
Solution

Track exactly when each block was written by the kernel via a `block_write_positions` dict. Only cache blocks where `write_position == (block_idx + 1) * block_size` exactly.

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.