
server: enhance FIFO prompt cache eviction with second-chance algorithm #23666

Open
nonml wants to merge 2 commits into ggml-org:master from nonml:clock-eviction

@nonml nonml commented May 25, 2026

  • Pure FIFO eviction always removes the oldest cache entry, regardless of how often it is reused.
  • As a result, a small side-session request immediately evicts a large session's cached KV state, forcing full reprocessing and making it impractical for users on small machines to switch back and forth between sessions.
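
A minimal C++ sketch of the pure FIFO behavior described above, for illustration only. `FifoCache`, `insert()` and `hit()` are hypothetical names, not the llama.cpp server's prompt-cache API; entries are reduced to string ids, and a simple entry-count limit is assumed in place of the server's real limit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical pure-FIFO prompt cache: entries are plain ids, hits are ignored.
class FifoCache {
public:
    explicit FifoCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0 && "capacity must be positive");
    }

    // Under FIFO a hit is only a lookup; it never changes the eviction order.
    bool hit(const std::string & id) const {
        for (const auto & e : entries_) {
            if (e == id) {
                return true;
            }
        }
        return false;
    }

    // Insert a new entry; if the limit is reached, evict the oldest entry first.
    // Returns the evicted id, or an empty string if nothing was evicted.
    std::string insert(const std::string & id) {
        std::string evicted;
        if (entries_.size() >= capacity_) {
            evicted = entries_.front();
            entries_.pop_front();
        }
        entries_.push_back(id);
        return evicted;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> entries_;
};
```

With a limit of two entries, caching session "A", then "B", and reusing "A" several times still leaves "A" at the front, so the next side request "C" evicts "A" and its cached state would have to be reprocessed.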

What the fix does:

  • Assign score = 1 to new cache entries.
  • Only on a cache hit (best match), increment the score by 1, capped at 4 (4 appears to be the sweet spot).
  • When the checkpoint limit is reached, instead of always evicting the oldest entry:
    • If the front entry's score is <= 1, evict it.
    • Otherwise, decrement its score by 1, rotate it to the back (second chance), and repeat.
  • If all entries have the maximum score, the loop is capped at states.size() * 5 iterations and then falls back to evicting the front entry, which prevents an infinite loop.
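
The steps above can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual code: `CacheEntry`, `SecondChanceCache`, `kMaxScore`, `insert()`, `hit()` and `evict_one()` are hypothetical names, entries are reduced to string ids, and a simple entry-count limit stands in for the real checkpoint limit. The linear search in `hit()` only keeps the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical cache entry: `id` stands in for a cached prompt / KV state.
struct CacheEntry {
    std::string id;
    int score;
};

class SecondChanceCache {
public:
    explicit SecondChanceCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0 && "capacity must be positive");
    }

    // Record a cache hit (best match): score += 1, capped at kMaxScore.
    bool hit(const std::string & id) {
        for (auto & e : entries_) {
            if (e.id == id) {
                if (e.score < kMaxScore) {
                    e.score += 1;
                }
                return true;
            }
        }
        return false;
    }

    // Insert a new entry with score 1, evicting one entry first if the limit is reached.
    // Returns the evicted id, or an empty string if nothing was evicted.
    std::string insert(const std::string & id) {
        std::string evicted;
        if (entries_.size() >= capacity_) {
            evicted = evict_one();
        }
        entries_.push_back({id, 1});
        return evicted;
    }

private:
    static constexpr int kMaxScore = 4;

    std::string evict_one() {
        assert(!entries_.empty());
        // Safety cap mirroring the states.size() * 5 bound described above.
        const std::size_t max_iter = entries_.size() * 5;
        for (std::size_t i = 0; i < max_iter; ++i) {
            CacheEntry front = entries_.front();
            entries_.pop_front();
            if (front.score <= 1) {
                return front.id;           // low score: evict
            }
            front.score -= 1;              // decay
            entries_.push_back(front);     // second chance: rotate to the back
        }
        // Fallback: evict whatever is at the front now.
        std::string id = entries_.front().id;
        entries_.pop_front();
        return id;
    }

    std::size_t capacity_;
    std::deque<CacheEntry> entries_;
};
```

In the same scenario as pure FIFO (limit of two, "A" reused three times, then side request "C"), "A" is decayed and rotated instead of evicted, and the rarely used "B" is evicted. Because the score is capped at 4 and drops by 1 per rotation, every entry becomes evictable after at most 3 rotations, so in this sketch the scan ends within roughly 3 * N + 1 steps and the 5x cap acts only as a safety net.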

With this algorithm, I can switch back and forth between sessions without their cached KV state being evicted and fully reprocessed each time.

Additional information

Related issues: #23030, #20510

Requirements

Switches to a second-chance policy with basic hit tracking and decay, so that one-off requests no longer evict heavily used system prompts. The simple score-decay approach tracks hits without adding O(N) scan overhead.
@nonml nonml requested a review from a team as a code owner May 25, 2026 13:18
@jacekpoplawski (Contributor)

I initially tried this in #22826.
