-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refresh cache before writing contents to bundle #9123
Conversation
7f74f36
to
206c6c3
Compare
// This only occurs if the reading thread has a transaction that was created before the writing thread committed, | ||
// and the transaction is still live when the reading thread attempts to get the written value. | ||
// See https://github.com/parcel-bundler/parcel/issues/9121 | ||
options.cache.refresh(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a write of the sourcemap down lower (starting ~line 174 in your changed code) - while there are a few more async calls in between here and there, and so it's more likely that LMDB will have reset the transaction by then, should we move this refresh outside of this condition so that even if we're dealing with a large blob we refresh the cache anyway to avoid (potential) failures when (maybe) getting the sourcemap from the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While thinking through this a bit more, i realized that WriteBundleRequest
will always run in the main thread (it doesn't ever create a handle with the worker farm like the PackageRequest
does). I think this means that we actually can do something closer to the 'correct' approach by moving the reset to WriteBundlesRequest
right after we've received the result from the PackageRequest
.
I pushed an update that does this. The overview is now (starting at Parcel => run(buildRequest) => run(writeBundlesRequest)
):
- [main thread] for each bundle:
- if
useMainThread
isfalse
- [worker thread]
run(packageRequest)
- [main thread]
cache.refresh()
- [worker thread]
- else
- [main thread]
run(packageRequest)
- [main thread]
- [main thread]
run(writeBundleRequest)
- if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes sense.. I dd consider this but I had confused myself about what was running where.
In terms of testing this - while we have an artificial reproduction with the same shape (https://github.com/marcins/lmdb-testbed/blob/testable-version/index.js), I don't really think there's much benefit to adding that to Parcel since all that does is show that adding a call to The ideal reproduction would be using I guess TL;DR - I don't think there's an effective way to test this fix in Parcel - we can demonstrate that it fixes out internal (complex) repro case with visual regression tests, and I think that should be enough? |
b50035b
to
b2da4c9
Compare
..only if the packaging request didn't run in the main thread.
b2da4c9
to
7f06109
Compare
Would it make sense to call the method something along the lines of |
Naming is hard! We originally thought about calling it That's effectively what's happening - ensuring the current cache instance is "synced" with the persistent store. I don't think "flush" is an accurate representation, though the api is undocumented the assumption is any "put" is committed before resolving.
|
Maybe another way of thinking about it that might help to stimulate the right name to come to mind is to write what the API docs would look like for this method in the interface. I would expect it's something like:
Unfortunately, I think it's difficult to get away from the fact that this is a bit of a leaky abstraction of how LMDB works, rather than a design decision for the cache API. |
Oh, the reader has to call this, not the writer. Now I see why it's not "flushing" And do add that comment to the definition in the I think another perspective here is that when reading, lmdb behaves like a two layer cache (the top layer being determined by the read transaction). And the new method brings in changes from the bottom layer into the top layer. |
I added @marcins suggested documentation verbatim 😉 If i understand correctly, we've decided to stick with |
Fixes #9121
Background
As described in #9121 and comments, a race condition exists for our LMDB cache backend when used in a multi threaded way. The condition is:
The fix is to manually reset the read transaction in Thread A so that the next read sees a fresh snapshot of the db.
Note that this is not a bug in LMDB, and there is a documented API for dealing with this scenario.
Implementation
Because requests deal with a
Cache
abstraction rather than dealing directly with the cache backend, this PR opts to add aCache.refresh()
method to the interface. It is a no-op for all of the existing backends except forLMDBCache
, which simply callsresetReadTxn()
on the backing store.The
cache.refresh()
method is currently called right before running aWriteBundleRequest
(and only when the preceedingPackageRequest
was not run in the main thread), which is the only place we've seen this type of failure so far, but there may be other places where we should add it.🚨 Test instructions
Still trying to work out if we can add a test for this, as it is not easy to replicate in the wild.