Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the query cache a proper LRU with its own memory budget #7371

Open
teh-cmc opened this issue Sep 8, 2024 · 1 comment
Open

Make the query cache a proper LRU with its own memory budget #7371

teh-cmc opened this issue Sep 8, 2024 · 1 comment
Assignees
Labels
🚀 performance Optimization, memory use, etc project-many-entities-perf 🔍 re_query affects re_query itself

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Sep 8, 2024

(Definitely not the first time this comes up, but apparently this is first issue for it.)

#7370 introduces a low-effort, low-risk safety net to prevent a degenerate situation (namely: the memory limit was exceeded even though there is nothing left to be GC'd) from getting out of hand.
I believe we need that safety net no matter what: the ability to log non-GC-able static data makes it impossible to fully prevent these kinds of degenerate cases.

Still, it is quite bad that the query cache can be hogging all the available memory budget in the first place, leading to almost all the valuable temporal user data getting GC'd in favor of old time cursors that are probably irrelevant at that point.
The query cache should have its own memory budget (probably a configurable %age of the overall memory budget), and should recycle existing cache entries when that budget is reached, like any other LRU would.

We should also investigate the possibility of not caching static data in the query-time latest-at cache: queries that return static data are much cheaper to run, and polluting the query-time cache just to point to the static tables is very wasteful.

@teh-cmc teh-cmc added 🔍 re_query affects re_query itself 🚀 performance Optimization, memory use, etc labels Sep 8, 2024
@teh-cmc teh-cmc self-assigned this Nov 26, 2024
@teh-cmc
Copy link
Member Author

teh-cmc commented Nov 26, 2024

Note: cache eviction based on garbage collection events is a relic from the past, it needs to go away once we introduce LRUs.

Removing cache entries because the underlying data was GC'd is counter-productive in two ways:

  • Lots of compute spent reacting to removed Chunks even though it doesn't matter -- that's what the LRU is for, when we need space, it'll give us space.
  • Pre-emptively removing cached data that might still have value sucks, especially if there is nothing yet to replace it with -- again, just let the LRU do its job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🚀 performance Optimization, memory use, etc project-many-entities-perf 🔍 re_query affects re_query itself
Projects
None yet
Development

No branches or pull requests

1 participant