[codex] Fix recurrent cache resize before checkpoint restore by Nowayz · Pull Request #50 · spiritbuun/buun-llama-cpp

Nowayz · 2026-05-09T19:17:17Z

Summary

Recurrent backup cells are expanded for speculative decoding, but the server does not shrink them back before a later prompt-cache or checkpoint restore. As a result, checkpoint restore runs against a different recurrent-cache topology than the one in effect when the checkpoint was made.

Fixes #49.

Code Changes

Add a server prefill shrink step before prompt-cache save/load so recurrent memory returns to the non-speculative topology once draft backup cells are no longer active.
Remove speculative backup sequences before shrinking recurrent memory.
Skip shrinking while a slot is processing or still owns a draft backup.
Sanitize recurrent cell metadata after resize by dropping invalid sequence ids, clearing stale source rows, rebuilding tail pointers, and clamping cached range metadata.
Guard recurrent source-row lookup so invalid metadata falls back to the zero state instead of reaching backend GET_ROWS with an invalid row.
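
The shrink gate, the post-resize sanitization, and the guarded source-row lookup listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Cell`, `RecurrentCache`, `Slot`, `can_shrink`, `shrink_and_sanitize`, and `source_row_or_zero` are hypothetical names, and the real recurrent memory structures carry more state than shown here. Removal of the speculative backup sequences is assumed to have happened before `shrink_and_sanitize` is called.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the actual llama.cpp types.
struct Cell {
    std::set<int32_t> seq_ids; // sequences referencing this cell
    int32_t src = -1;          // row to read state from; -1 means "no source"
};

struct RecurrentCache {
    std::vector<Cell>    cells;
    std::vector<int32_t> tail_of_seq; // per-sequence tail cell index, -1 if none
    int32_t head = 0;                 // first cell of the cached range
    int32_t used = 0;                 // number of occupied cells
};

struct Slot {
    bool is_processing    = false;
    bool has_draft_backup = false;
};

// Shrinking is skipped while the slot is busy or still owns a draft backup.
bool can_shrink(const Slot & slot) {
    return !slot.is_processing && !slot.has_draft_backup;
}

// Resize to n_cells, then repair metadata that may now point outside the cache.
void shrink_and_sanitize(RecurrentCache & cache, int32_t n_cells, int32_t n_seq_max) {
    assert(n_cells >= 0 && n_seq_max >= 0);
    cache.cells.resize(static_cast<std::size_t>(n_cells));
    cache.tail_of_seq.assign(static_cast<std::size_t>(n_seq_max), -1);
    cache.used = 0;
    for (int32_t i = 0; i < n_cells; ++i) {
        Cell & cell = cache.cells[i];
        // Drop sequence ids outside [0, n_seq_max).
        for (auto it = cell.seq_ids.begin(); it != cell.seq_ids.end();) {
            if (*it < 0 || *it >= n_seq_max) {
                it = cell.seq_ids.erase(it);
            } else {
                ++it;
            }
        }
        // Clear source rows that no longer exist after the resize.
        if (cell.src < 0 || cell.src >= n_cells) {
            cell.src = -1;
        }
        // Rebuild tail pointers from the surviving sequence ids.
        for (int32_t s : cell.seq_ids) {
            cache.tail_of_seq[s] = i;
        }
        if (!cell.seq_ids.empty()) {
            ++cache.used;
        }
    }
    // Clamp the cached range start into the resized cache.
    cache.head = n_cells > 0
        ? std::max<int32_t>(0, std::min<int32_t>(cache.head, n_cells - 1))
        : 0;
}

// Returns the row to read, or -1 so the caller falls back to the zero state
// instead of handing an invalid row to a row-gather operation.
int32_t source_row_or_zero(const RecurrentCache & cache, int32_t i) {
    const int32_t n = static_cast<int32_t>(cache.cells.size());
    if (i < 0 || i >= n) {
        return -1;
    }
    const int32_t src = cache.cells[i].src;
    return (src >= 0 && src < n) ? src : -1;
}
```

In this sketch a caller would check `can_shrink(slot)` before prompt-cache save/load, call `shrink_and_sanitize` only when it returns true, and treat a `-1` from `source_row_or_zero` as "use the zero state".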

Validation

Built llama-server in a Windows CUDA Release build.
Checked the branch diff to confirm only the recurrent-cache fix files are included.

server: shrink recurrent cache before checkpoint restore

0df9784

The github-actions bot added the examples and server labels on May 9, 2026.

Nowayz wants to merge 1 commit into spiritbuun:master from Nowayz:codex/fix-recurrent-checkpoint-restore