Conversation
Signed-off-by: Andrew Mason <andrew@planetscale.com>
Force-pushed from 2c8b071 to 1db42c0.
Force-pushed from eab3ae3 to e92f40c.
I think I've fixed the race-y tests (which kinda blew up the diff a bit more 😅), so this is ready for review! cc @vmg @doeg @notfelineit
Force-pushed from 0f87cc0 to 1904198.
I'll be 👁️ this today. Lots of code!
vmg left a comment:
Sorry for the delay on this! I took a look this morning. I don't fully understand the double-cache usage, so I left a suggestion. Also some changes to racy stuff with contexts. Cheers!
go/vt/vtadmin/cache/cache.go (Outdated)

```go
//
type Cache[Key Keyer, Value any] struct {
	cache     *cache.Cache
	fillcache *cache.Cache
```
I don't fully understand the behavior of fillcache. It seems to be keeping track of the last time we enqueued a fill, so it doesn't happen too often, but if that's the only purpose, it doesn't need to be a cache. It can simply be a map[Key]time.Time.
(just make sure to delete from the map after each read)
yep, you have the intended purpose exactly right! i was using the cache because it's threadsafe, if i take your suggestion i'll need to manually manage a mutex for it (which is fine, i think in hindsight i was just being a little lazy and "well, i've already got caches on the brain")
(just make sure to `delete` from the map after each read)
I actually don't think so, but double check my thinking:
Imagine your backfill queue was [k1, k1, k1] and your wait interval was 10 minutes.
On the first loop, you fill k1, and record time.Now().
On the second loop, you try to fill k1 again, but it's been less than 10 minutes, so you drop the request.
If you delete the key from fillcache, the third loop would then re-fill k1 even though it hasn't been 10 minutes.
So I think the map has to grow forever. The only reason to delete would be if, when you read, the fill time is outside the wait period, but then you're going to re-fill the cache, which will just put the key right back in the map. The only "space" you would save is if your re-fills are failing (i.e., `fillFunc` returns `err != nil`), which should (ideally) happen rarely.
Huh, wouldn't that also apply to the cache though?
yeah in most practical cases, probably (but we also don't explicitly delete from the cache currently, we just expire things)
well, okay, we never explicitly delete things, but if you specify a non-zero CleanupInterval, then go-cache runs a background thread that deletes things that are expired, so infrequently-fetched schemas will get purged over time
vmg left a comment:
Looks good! You can reduce the locking in Get though.
```go
// Record the time we last cached this key, to check against
c.lastFill[key] = time.Now().UTC()
// Then cache the actual value.
return c.cache.Add(key, val, d)
```
I would keep c.cache.Add outside of the locking for c.m. It adds unnecessary contention.
```go
key := req.k.Key()

c.m.Lock()
if t, ok := c.lastFill[key]; ok {
```
I see you've decided not to clean up lastFill, which probably makes sense for most growth patterns. I would comment this behavior explicitly.
Description
This PR introduces a caching mechanism for vtadmin, with our first use-case being schemas.
It does not use the `go/cache` package, primarily because I wanted a mechanism to enqueue backfills as a first-class concern, which I'll talk about a bit later. Heeeeeeeeeeere we go:
Cache Design
The cache design is pretty much what you would expect in generics: we store a mapping of keys (of type `K`) to values (of type `V`, which can be `any`-thing).
You might expect the cache to be defined as `Cache[string, V any]`, but that requires every callsite to know how to turn something into a key.
By pushing this into an interface (`Keyer`) that keys have to implement, there can only be one place (the `Key()` method) where cache keys are created, which reduces duplication of code and therefore the chance that two places accidentally use different key schemes for the same cache.
Further, I chose not to leverage the existing `Stringer` interface, in case we ever need to turn a struct into a string for reasons other than "unique cache key".
Anyway.
Our generic cache is built on top of go-cache and provides only a subset of the functionality.
We can expand this set if we end up needing more features, but for now `Add()` and `Get()` are sufficient.
The Backfill Queue
Or: "The Reason We Aren't Using `vitess.io/vitess/go/cache`".
The schema endpoints in vtadmin have a bunch of different options which control the size and shape of the data returned to the caller.
We do cross-shard size aggregations, can request subsets of tables in a keyspace (including fetching just a single table's `TableDefinition`), include or exclude views, and so on.
We could build a cache that incorporates all of these options into the key, and cache each permutation of possible request/response pairs for every keyspace in a cluster, but:
1. We would be storing a ton of duplicate data.
2. We would get really poor cache reuse.
   a. Imagine requesting `/schemas/ks1/` and then `/schemas/ks1?tables=t1,t2`: we would have to make two full round-trips from `vtadmin-api` to the cluster, first to fetch all the table definitions and then again to get `t1` and `t2`, even though we literally just fetched them.

So what we do instead is cache only the largest response payload (in other words, the "give me all of everything" request equivalent).
Then, when pulling data out of the cache, we look at the particular request's filtering options to whittle down the full payload into the subset that the caller actually wants.
However, this means that if the first request (i.e. "nothing cached") is not for the full payload, we won't be able to fill the cache before returning.
So instead, we make one blocking round-trip to the cluster to get the payload for that request, and then in the background instruct the cache to go get the full payload, via `EnqueueBackfill()`.
Then, future requests will have the fully-cached payload, which they can extract subsets of depending on their specific options.
Cache Busting
Finally, we need a way to force a cache eviction/bypass/refresh/whatever, so we aren't stuck serving stale data waiting for responses to get evicted organically.
I wanted to solve this globally so that we would not need to update every single request proto message when we got around to adding caching for it.
So!
`vtadmin-api` now takes a global flag called `--cache-refresh-key` (despite using it as the title of this section, I dislike the term "cache busting" and this was the best alternative I could come up with; suggestions welcome!).
This gets incorporated as an HTTP header called `X-<the-flag-value>` and a gRPC metadata key, each of which is used to propagate "please bypass and refresh the cache for this request" from the caller down to a particular cluster method.
ShouldRefreshFromIncomingContext(gRPC metadata case) andShouldRefreshFromRequest(HTTP case), as well as in our HTTP-to-gRPC adapter layer, which propagates from an HTTP header to gRPC metadata.Outstanding TODOs
The main thing missing from schema caching in particular is `FindSchemas`.
Because we need to know not only the schema definition(s) but also which cluster(s) those definitions live in, we cannot cache solely at the cluster level.
We'll need to add a second cache at the API level to track which clusters have which schemas, and then look to just those clusters for cached schemas.
This PR is already a pretty significant chunk of work, so I am punting on that for now.
Related Issue(s)
Checklist
Deployment Notes