
[Bugfix] - Fix Mamba prefix caching corruption with chunked prefill#34587

Closed
Josephasafg wants to merge 2 commits into vllm-project:main from Josephasafg:fix_mamba_apc_block_cache_clean

Conversation


@Josephasafg (Contributor) commented Feb 15, 2026

Purpose

When using Mamba models with mamba_cache_mode="all" and chunked prefill enabled, Mamba state blocks can be cached before they contain complete state (e.g. if mamba_block_size=2048 and chunked prefill splits the sequence into 1800- and 1100-token chunks), leading to incorrect output when subsequent requests hit the prefix cache.

Root Cause

The SSM kernel (selective_scan_fwd) writes state at chunk boundaries that don't necessarily align with block boundaries. When the scheduler splits a long prefill into multiple chunks:

For example, with a 3026-token sequence split into two chunks:

  1. Chunk 1 (e.g., 1866 tokens): Kernel writes SSM state to block 0, covering positions [0, 1866)
  2. Chunk 2 (e.g., 1160 tokens, total 3026): Kernel writes to block 1 only - block 0 is untouched
  3. cache_blocks() sees 3026 // block_size full blocks -> caches block 0
  4. Block 0's state covers only the first 1866 tokens, but its hash represents the full block (mamba_block_size = 2048 tokens) → corruption on a cache hit
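
The steps above can be sketched as a small simulation. This is an illustrative reconstruction, not vLLM's actual code: `naive_num_full_blocks` stands in for the counting that `cache_blocks()` performs, and `block_write_positions` records where the kernel last wrote state for each block, as described above.

```python
# Hypothetical sketch of the miscount described above; names are illustrative.
BLOCK_SIZE = 2048  # mamba_block_size in the example

def naive_num_full_blocks(num_computed_tokens: int) -> int:
    # cache_blocks() assumes every block below this count holds complete state.
    return num_computed_tokens // BLOCK_SIZE

# Chunked prefill of a 3026-token sequence in two scheduler chunks:
chunks = [1866, 1160]
block_write_positions = {}  # block_idx -> position the kernel last wrote state for it
total = 0
for chunk in chunks:
    total += chunk
    # The kernel writes state for the block containing the chunk's last token.
    block_idx = (total - 1) // BLOCK_SIZE
    block_write_positions[block_idx] = total

print(naive_num_full_blocks(total))  # 1 -> block 0 is treated as cacheable
print(block_write_positions)         # {0: 1866, 1: 3026} -> block 0 covers only 1866 tokens
```

Block 0's recorded write position (1866) falls short of its boundary (2048), yet the naive count still reports one full block, which is exactly the mismatch that corrupts the prefix cache.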

The kernel chooses which block to write state to as follows:

  • Intermediate chunks: block_idx_first_scheduled + chunk_idx
  • Last chunk: block_idx_last_scheduled

This means earlier blocks may never be "completed" if subsequent scheduler calls skip over them.

Solution

Track exactly when each block was written by the kernel via a block_write_positions dict. Only cache blocks where write_position == (block_idx + 1) * block_size exactly.
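
A minimal sketch of this tracking scheme, under the assumption that updates arrive per scheduler chunk; the class and method names here are illustrative stand-ins for the PR's `block_write_positions` dict and `_get_num_cacheable_blocks` helper, not vLLM's actual API.

```python
# Illustrative sketch of the fix: cache a block only when its recorded write
# position lands exactly on the block boundary.
BLOCK_SIZE = 2048

class BlockWriteTracker:
    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        # block_idx -> number of tokens covered when the kernel last wrote it
        self.block_write_positions: dict[int, int] = {}

    def update(self, num_computed_tokens: int) -> None:
        # The kernel writes state for the block containing the chunk's last token.
        block_idx = (num_computed_tokens - 1) // self.block_size
        self.block_write_positions[block_idx] = num_computed_tokens

    def num_cacheable_blocks(self) -> int:
        # Cache a prefix of blocks only while each one satisfies
        # write_position == (block_idx + 1) * block_size exactly.
        n = 0
        for block_idx in sorted(self.block_write_positions):
            if self.block_write_positions[block_idx] == (block_idx + 1) * self.block_size:
                n = block_idx + 1
            else:
                break
        return n

tracker = BlockWriteTracker()
tracker.update(1866)  # chunk 1
tracker.update(3026)  # chunk 2
print(tracker.num_cacheable_blocks())  # 0 -> block 0 is incomplete, so nothing is cached
```

With the misaligned 1866/1160 split, no block passes the exact-boundary check, so the corrupt block 0 never enters the prefix cache; a 2048/978 split would cache block 0 as before.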

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Josephasafg <ajgard7@gmail.com>
@mergify mergify bot added v1 bug Something isn't working labels Feb 15, 2026

@gemini-code-assist bot left a comment


Code Review

This pull request addresses a critical bug causing prefix caching corruption in Mamba models when chunked prefill is enabled. The root cause is that state blocks were being cached before they were completely filled, leading to incorrect state being loaded from the cache. The fix introduces a mechanism to track the write position for each block within a request. This is achieved by adding block_write_positions to MambaManager and two new helper methods, _update_block_write_positions and _get_num_cacheable_blocks. The cache_blocks method is updated to use this tracking information to only cache blocks that are verified to be complete. The logic appears sound and correctly resolves the described caching issue by preventing incomplete blocks from being added to the prefix cache.

@heheda12345 (Collaborator)

@tdoublep how did you handle mamba2? I thought the "all" mode means the kernel will write states at positions 2048 and 3026 in the second chunk, like #24683

@Josephasafg (Contributor, Author) commented Feb 18, 2026

@heheda12345 @tdoublep I think I managed to implement a similar chunk-alignment solution for mamba1, handling alignment from the kernel side, which will make this PR obsolete. I'll update here shortly.

@Josephasafg (Contributor, Author)
Contributor Author

I'm closing this PR in favor of #34798.
