This is an issue that can happen with rebuilds. Let's say:

- we have a broken build where not all files were built, so the archive index is incomplete,
- but because there were requests to the crate, we have an active connection pool for this sqlite database,
- now we do a rebuild.
After the rebuild, the archive index on S3 & locally is correct again, but we still have the active connection to the old broken sqlite index.
Currently we keep the connection for an hour after the last request, or until it is evicted because 500 (the current max) other crates are being accessed.
I think this was the reason why, in at least two cases (bevy yesterday, and #2268), I didn't see the fixed docs on docs.rs after the CDN purge was done.
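For reference, the expiry rules above boil down to something like this (a rough sketch with illustrative names and structure, not the actual docs.rs code):

```rust
use std::time::{Duration, Instant};

// Illustrative constants matching the behaviour described above.
const MAX_CACHED_CRATES: usize = 500;                        // current max
const IDLE_TIMEOUT: Duration = Duration::from_secs(60 * 60); // one hour after the last request

struct CachedIndexPool {
    last_used: Instant,
    // ... plus the sqlite connection pool for this crate's archive index
}

/// A cached pool goes away either because it has been idle for an hour,
/// or earlier, because the cache is full and it is the least recently used entry.
fn should_evict(pool: &CachedIndexPool, cached_crates: usize, is_lru: bool) -> bool {
    pool.last_used.elapsed() > IDLE_TIMEOUT || (cached_crates > MAX_CACHED_CRATES && is_lru)
}
```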
potential solutions:
I'm not sure about this. With everything in one process, we could just invalidate the archive path from the sqlite connection pool. With two builders, or in the future with builders on separate servers & more web servers with their own caches, this becomes tricky.
I could imagine:
- not having a connection pool at all (with a performance impact; I remember single-digit milliseconds, multiplied by the number of accesses),
- just keeping the connection for the duration of the request (low single-digit ms impact per request; I think we only ever need one database),
- having a max lifetime for connections of some value around the CDN invalidation time, so ~5 minutes (see the sketch below).
I also wonder if I can find a simpler solution that doesn't involve managing 500 sqlite connection pools.
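For the second and third option, a minimal sketch of what that could look like with rusqlite and an r2d2-style pool (the pool crate and the wiring here are assumptions for illustration, not the actual docs.rs code):

```rust
use std::time::Duration;

use r2d2_sqlite::SqliteConnectionManager;
use rusqlite::Connection;

// Option 2: no pool at all, just open a connection for the duration of the
// request and drop it afterwards.
fn open_index_for_request(path: &str) -> rusqlite::Result<Connection> {
    Connection::open(path)
}

// Option 3: keep the pool, but cap connection lifetime at roughly the CDN
// invalidation time (~5 minutes), so a connection to a stale index can't
// outlive a rebuild for long.
fn build_index_pool(path: &str) -> Result<r2d2::Pool<SqliteConnectionManager>, r2d2::Error> {
    let manager = SqliteConnectionManager::file(path);
    r2d2::Pool::builder()
        .max_lifetime(Some(Duration::from_secs(5 * 60)))
        .build(manager)
}
```

The max-lifetime variant keeps the hot path pooled but bounds how long a stale index connection can survive a rebuild; the per-request variant removes the pool bookkeeping entirely at a small per-request cost.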
Multiple servers make it more complicated than just the pool; we also need to know when to pull down a new index from S3. One thought: we could include the build id in the index cache path, so if we see a new build id we know we need to grab a new index.
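Roughly the idea (the helper name, the `build_id` parameter and the cache layout here are hypothetical, not the actual docs.rs paths):

```rust
use std::path::PathBuf;

// Key the local index cache path by build id, so a rebuild (new build id)
// automatically points at a fresh index file and a fresh connection pool
// instead of reusing the stale one.
fn local_index_path(cache_root: &str, name: &str, version: &str, build_id: u64) -> PathBuf {
    PathBuf::from(cache_root)
        .join(name)
        .join(version)
        .join(build_id.to_string())
        .join("index.sqlite")
}
```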
true, multiple web servers would lead to this issue. I agree that including the build-id is the best approach for that, and it could solve the connection pool issue too.
I just re-did some measurements with rusqlite, running a simple query:

- new connection every time: 80µs
- re-using the connection: 8µs

So, looking only at these numbers, it makes a ~10x difference.
Though, seeing our normal response times, I wonder if we could still drop the whole connection pool thing and replace it with something simpler that is per-request.
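For reference, a rough way to reproduce that measurement (the index path, table name and query are made up; absolute numbers will differ per machine and index size):

```rust
use std::time::Instant;

use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let path = "index.sqlite"; // hypothetical archive index file

    // new connection every time
    let start = Instant::now();
    for _ in 0..1000 {
        let conn = Connection::open(path)?;
        let _count: i64 = conn.query_row("SELECT count(*) FROM files", [], |row| row.get(0))?;
    }
    println!("new connection per query: {:?}/iter", start.elapsed() / 1000);

    // re-using one connection
    let conn = Connection::open(path)?;
    let start = Instant::now();
    for _ in 0..1000 {
        let _count: i64 = conn.query_row("SELECT count(*) FROM files", [], |row| row.get(0))?;
    }
    println!("re-used connection: {:?}/iter", start.elapsed() / 1000);

    Ok(())
}
```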