[codex] Fix recurrent cache resize before checkpoint restore #50

Draft
Nowayz wants to merge 1 commit into spiritbuun:master from Nowayz:codex/fix-recurrent-checkpoint-restore

Nowayz commented May 9, 2026

Summary

Recurrent backup cells are expanded for speculative decoding, but the server does not shrink them back before a later prompt-cache or checkpoint restore. As a result, the restore runs against a different recurrent-cache topology than the one that was in place when the checkpoint was created.

Fixes #49.
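
The mismatch described above can be pictured with a small standalone sketch. Everything in it (`RecurrentCache`, `Checkpoint`, `expand_for_draft`, `shrink_to_base`) is a hypothetical stand-in and not the server's actual types. The explicit cell-count check in `restore_checkpoint` is also an assumption made for illustration; this description does not say that the real restore path performs such a check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a recurrent cache: one state value per cell.
struct RecurrentCache {
    uint32_t base_cells;       // cells used by normal (non-speculative) decoding
    std::vector<float> state;  // illustrative per-cell state

    explicit RecurrentCache(uint32_t n) : base_cells(n), state(n, 0.0f) {}

    // Speculative decoding adds backup cells on top of the base layout.
    void expand_for_draft(uint32_t n_backup) {
        state.resize(base_cells + n_backup, 0.0f);
    }

    // Return to the non-speculative layout.
    void shrink_to_base() {
        assert(state.size() >= base_cells);
        state.resize(base_cells);
    }

    uint32_t n_cells() const { return static_cast<uint32_t>(state.size()); }
};

// Hypothetical checkpoint: remembers the cell count it was taken with.
struct Checkpoint {
    uint32_t n_cells;
    std::vector<float> state;
};

Checkpoint save_checkpoint(const RecurrentCache & cache) {
    return Checkpoint{cache.n_cells(), cache.state};
}

// In this sketch, restore refuses a cache whose layout differs from the checkpoint's.
bool restore_checkpoint(RecurrentCache & cache, const Checkpoint & cp) {
    if (cache.n_cells() != cp.n_cells) {
        return false; // topology mismatch
    }
    cache.state = cp.state;
    return true;
}
```

In this model, a checkpoint saved with 4 cells cannot be restored after `expand_for_draft(2)` until `shrink_to_base()` has run, which is the ordering this change is meant to enforce.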

Code Changes

  • Add a server prefill shrink step before prompt-cache save/load so recurrent memory returns to the non-speculative topology once draft backup cells are no longer active.
  • Remove speculative backup sequences before shrinking recurrent memory.
  • Skip shrinking while a slot is processing or still owns a draft backup.
  • Sanitize recurrent cell metadata after resize by dropping invalid sequence ids, clearing stale source rows, rebuilding tail pointers, and clamping cached range metadata.
  • Guard recurrent source-row lookup so invalid metadata falls back to the zero state instead of reaching backend GET_ROWS with an invalid row.
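
As a rough illustration of the changes listed above, the following sketch shows the shrink gating, part of the metadata sanitizing, and the guarded source-row lookup. All names (`Cell`, `Slot`, `can_shrink`, `sanitize`, `source_row_or_zero`, `zero_row`) are hypothetical and the data layout is assumed, not taken from the diff. Tail-pointer rebuilding and range clamping are left out to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical per-cell metadata; field names are illustrative only.
struct Cell {
    std::set<int32_t> seq_ids; // sequences that reference this cell
    int32_t src = -1;          // source row to copy state from, -1 = none
};

// Hypothetical server-side slot state.
struct Slot {
    bool is_processing = false;
    bool has_draft_backup = false;
};

// Shrinking is skipped while any slot is busy or still owns a draft backup.
bool can_shrink(const std::vector<Slot> & slots) {
    return std::none_of(slots.begin(), slots.end(), [](const Slot & s) {
        return s.is_processing || s.has_draft_backup;
    });
}

// After a resize, drop metadata that no longer refers to anything valid.
void sanitize(std::vector<Cell> & cells, int32_t n_seq_max) {
    assert(n_seq_max >= 0);
    const int32_t n_cells = static_cast<int32_t>(cells.size());
    for (Cell & c : cells) {
        // Drop sequence ids outside [0, n_seq_max).
        for (auto it = c.seq_ids.begin(); it != c.seq_ids.end();) {
            it = (*it < 0 || *it >= n_seq_max) ? c.seq_ids.erase(it) : std::next(it);
        }
        // Clear source rows that point outside the resized cache.
        if (c.src < 0 || c.src >= n_cells) {
            c.src = -1;
        }
    }
}

// Guarded lookup: invalid metadata maps to a caller-chosen zero-state row
// instead of being passed on as a row index.
int32_t source_row_or_zero(const std::vector<Cell> & cells, int32_t i, int32_t zero_row) {
    const int32_t n_cells = static_cast<int32_t>(cells.size());
    if (i < 0 || i >= n_cells) {
        return zero_row;
    }
    const int32_t src = cells[i].src;
    return (src >= 0 && src < n_cells) ? src : zero_row;
}
```

The lookup repeats the bounds check even after `sanitize` has run, so a stale index never reaches the row-gather step if some other path leaves metadata inconsistent.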

Validation

  • Built llama-server in a Windows CUDA Release build.
  • Checked the branch diff to confirm that only the files for the recurrent-cache fix are included.

Development

Successfully merging this pull request may close these issues.

Context checkpoint restore can use expanded recurrent cache topology after speculative decode