
fix: reset DeltaNet recurrent state on prompt cache trim #14

Merged
raullenchai merged 3 commits into main from fix/deltanet-cache-reset on Mar 15, 2026


@raullenchai (Owner)

Summary

  • Fix stale DeltaNet recurrent state corrupting multi-turn conversations on Qwen3.5 models
  • Qwen3.5 uses a hybrid architecture: 75% Gated DeltaNet layers (non-trimmable ArraysCache) + 25% full attention layers (trimmable KVCache)
  • Previous code skipped non-trimmable caches during trim, leaving stale state from prior tokens
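
To illustrate why skipping the non-trimmable caches is unsafe, here is a toy sketch. It does not use the real Gated DeltaNet update rule: `run_recurrent` is a hypothetical decayed running sum standing in for any fixed-size recurrent state, and `run_kv` is a plain list standing in for a per-token KV cache. The point is only that a per-token cache can be truncated back to a shared prefix, while a recurrent state has already folded the dropped tokens in.

```python
def run_recurrent(tokens, state=0.0, decay=0.5):
    """Toy recurrent layer: every token is folded into one fixed-size state."""
    for t in tokens:
        state = decay * state + t
    return state


def run_kv(tokens, kv=None):
    """Toy attention cache: one entry per token, so it can be truncated."""
    kv = list(kv or [])
    kv.extend(tokens)
    return kv


old_prompt = [1, 2, 3]
new_prompt = [1, 2, 9]  # shares a 2-token prefix with old_prompt

# Per-token cache: trim back to the shared prefix, then process the new token.
kv = run_kv(old_prompt)
kv = kv[:2]
kv = run_kv([9], kv)

# Recurrent state: there is nothing to trim, so reusing it keeps token 3 inside.
stale = run_recurrent(old_prompt)
reused = run_recurrent([9], state=stale)

# Correct result: recompute the whole prompt from an empty state.
fresh = run_recurrent(new_prompt)

print(kv == new_prompt)   # the trimmed KV cache matches the new prompt
print(reused == fresh)    # the reused recurrent state does not
```

Under these assumptions the only safe options for the recurrent layers are to continue from a state that covers exactly the reused prefix, or to start over.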

Changes

  • Add _reset_all_caches() method to handle both KVCache (trim to zero) and ArraysCache (set entries to None)
  • Detect non-trimmable caches in _prepare_cache_for_prompt() and force full prompt recomputation when trimming is needed
  • Add tests/test_deltanet_cache.py with 8 unit tests covering: no overlap, partial overlap, exact repeat, pure KV regression, reset verification, growing conversation scenarios
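
The first two changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `FakeKVCache` and `FakeArraysCache` are hypothetical stand-ins for the mlx-lm cache types, and `reset_all_caches` and `prepare_cache_for_prompt` use simplified, assumed signatures.

```python
class FakeKVCache:
    """Hypothetical stand-in for a trimmable attention cache."""

    def __init__(self, offset=0):
        self.offset = offset  # number of cached tokens

    def is_trimmable(self):
        return True

    def trim(self, n):
        n = min(self.offset, n)
        self.offset -= n
        return n


class FakeArraysCache:
    """Hypothetical stand-in for a non-trimmable recurrent-state cache."""

    def __init__(self, size):
        self.cache = [None] * size

    def is_trimmable(self):
        return False


def reset_all_caches(caches):
    """Trim trimmable caches to zero; set every other cache's entries to None."""
    for c in caches:
        if c.is_trimmable():
            c.trim(c.offset)
        else:
            c.cache = [None] * len(c.cache)


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prepare_cache_for_prompt(caches, cached_tokens, prompt_tokens):
    """Return how many leading prompt tokens can be served from the cache."""
    prefix = common_prefix_len(cached_tokens, prompt_tokens)
    # Always leave at least one prompt token to run through the model.
    reusable = max(min(prefix, len(prompt_tokens) - 1), 0)
    trim_amount = len(cached_tokens) - reusable
    if trim_amount <= 0:
        return reusable  # prompt only extends the cached tokens
    if all(c.is_trimmable() for c in caches):
        for c in caches:
            c.trim(trim_amount)
        return reusable
    # A non-trimmable cache is present: drop everything and recompute.
    reset_all_caches(caches)
    return 0
```

In this sketch a growing conversation (the new prompt extends the cached tokens) keeps all state, while a partial overlap or exact repeat on a hybrid model falls back to full recomputation.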

Test plan

  • All 8 new DeltaNet cache tests pass
  • 965 existing tests pass (no regressions)
  • Manual test: multi-turn conversation with Qwen3.5 model

🤖 Generated with Claude Code

Your Name and others added 3 commits March 15, 2026 08:52
Qwen3.5 uses a hybrid architecture: 75% Gated DeltaNet layers
(ArraysCache, non-trimmable) + 25% full attention layers (KVCache,
trimmable). The prompt cache logic previously skipped non-trimmable
caches during trim, leaving stale recurrent state from prior requests.
This could corrupt multi-turn conversations where the DeltaNet hidden
state contained information from tokens no longer in the prompt.

- Add _reset_all_caches() to handle both KVCache and ArraysCache
- Detect non-trimmable caches and force full recomputation when needed
- Add 8 unit tests covering all cache reset scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ache reset

P1-a: Pure non-trimmable models (e.g. Mamba/DeltaNet with only ArraysCache
layers) never triggered a cache reset, so an exact-repeat prompt reused dirty
recurrent state. These models are now detected via can_trim_prompt_cache() and
the cache is recreated from scratch.

P1-b: CacheList wrapping non-trimmable sub-caches (e.g. CacheList(ArraysCache,
KVCache)) was skipped entirely because CacheList.is_trimmable() returns
all(...). Same fix: recreate the cache when it is not trimmable.

Also fixed: CacheList has no .offset property, so the trim amount calculation
now falls back to .size() on wrapper types.
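
A rough sketch of the wrapper behaviour described in P1-b and the `.size()` fallback. The classes are hypothetical stand-ins, not the mlx-lm types; in particular, what `size()` returns for a real CacheList is an assumption here, and `cached_length` and `needs_recreate` are illustrative helper names.

```python
class FakeKVCache:
    """Hypothetical trimmable cache exposing .offset."""

    def __init__(self, offset=0):
        self.offset = offset

    def is_trimmable(self):
        return True


class FakeArraysCache:
    """Hypothetical non-trimmable recurrent-state cache."""

    def is_trimmable(self):
        return False


class FakeCacheList:
    """Hypothetical wrapper: no .offset, trimmable only if every child is."""

    def __init__(self, *caches):
        self.caches = caches

    def is_trimmable(self):
        return all(c.is_trimmable() for c in self.caches)

    def size(self):
        # Assumed semantics for this sketch: largest child offset.
        return max((getattr(c, "offset", 0) for c in self.caches), default=0)


def cached_length(cache):
    """Prefer .offset; fall back to .size() for wrapper types without it."""
    offset = getattr(cache, "offset", None)
    return offset if offset is not None else cache.size()


def needs_recreate(caches):
    """True when any layer cache (or wrapper) cannot be trimmed."""
    return not all(c.is_trimmable() for c in caches)


wrapper = FakeCacheList(FakeArraysCache(), FakeKVCache(7))
print(wrapper.is_trimmable())                     # one child is non-trimmable
print(cached_length(wrapper))                     # taken from size(), not .offset
print(needs_recreate([FakeKVCache(1), wrapper]))
```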

P2: Added 7 regression tests using real upstream cache types (ArraysCache,
KVCache, CacheList) instead of only MockCacheEntry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Moves top-level `from mlx_lm.models.cache import ...` behind
pytest.importorskip so mock-only tests still collect without mlx-lm.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@raullenchai raullenchai merged commit dedec3a into main Mar 15, 2026
5 of 6 checks passed
@raullenchai raullenchai deleted the fix/deltanet-cache-reset branch March 15, 2026 16:19
