-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add explicit flush to Durable Object storage interface #87
Add explicit flush to Durable Object storage interface #87
Conversation
@a-robinson @jqmmes, please take a look! |
LGTM! |
Yep! I've used our internal Cloudflare test runner to verify the following:
|
0d5e735
to
c0ed164
Compare
c0ed164
to
a70be87
Compare
src/workerd/io/actor-cache.c++
Outdated
|
||
if (flushesEnqueued > 0) { | ||
// Some previous run had done operations that needed to be flushed and it is still ongoing. This | ||
// implies that `allowConcurrency` was set to true for those operations since the input lock did |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really true, is it? Puts having been buffered for flushing does not take the input lock. allowConcurrency
only affects writes if backpressure kicks in, and even there the effect is pretty minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, it's actually more about outstanding input gate-required operations being sequenced with the writes. I'll rewrite the comment to explicitly call out that the input gate is the important part.
FWIW some testing of this in actor-cache-test.c++ using its capnp mock framework would have covered the basic logic a bit more simply. And might still be worth adding to keep up the great test coverage of actor cache? But I get that there's pretty good internal coverage and that you also wanted to test behavior in a real isolate from a worker's perspective. |
Hey, sorry to bikeshed, but: This serves a very similar purpose to |
I would definitely prefer that any new functionality in |
Yeah, I'd been a little hesitant to go with |
Glad this was brought up! I agree that |
Yeah, my primary focus was testing the DurableObjectState interface since the actor cache change was so simple. The good news is that I think it's a pretty simple unit test to write because of that, happy to do so! |
a70be87
to
33e6054
Compare
@@ -208,8 +208,8 @@ void ActorCache::evictOrOomIfNeeded(Lock& lock) { | |||
|
|||
// We want to break the OutputGate. We can't quite just do `gate.lockWhile(exception)` because | |||
// that returns a promise which we'd then have to put somewhere so that we don't immediately | |||
// cancel it. Instead, we can ensure that a flush has been scheduled. `flush()`, when called, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is all this renaming still useful if the public method is now called sync()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, only ever so mildly. We had some functions named flushImpl*()
and some named flush*()
and it would be nice to unify them. I can understand if you'd like me to drop that commit now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, it's fine.
src/workerd/io/actor-cache.c++
Outdated
} | ||
|
||
if (flushScheduled) { | ||
// Our current run has done operations that need to be flushed, we need to wait for that flush |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I don't think this comment is strictly accurate either -- it isn't necessarily our current run that set flushScheduled
to true. Some other request could have done that, and the field just hasn't been set back to false yet because there was some previous flush that was still in progress when the other request scheduled an additional flush. In that case the previous flush would be keeping lastFlush
from resolving and running the callback that would set flushScheduled
to false for the new flush.
.
Given how hard it is to explain, I'm not sure this level of detail in the comments to distinguish between flushScheduled
and flushEnqueued
is all that useful when someone can look up above at how they're used. I'm not even totally sure how to explain it at this point, but I think my best attempt would be to rename flushesEnqueued
to flushInProgress
or something like that?
But you can keep these comments around if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I get what you're saying. Yeah, I'll trim the comment down to the bare minimum to explain the difference between the two conditions.
@@ -4856,5 +4857,210 @@ KJ_TEST("ActorCache alarm delete when flush fails") { | |||
} | |||
} | |||
|
|||
KJ_TEST("ActorCache can wait for flush") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this doesn't cover the case where one operations schedules a flush, then the flush starts, and then another operation schedules a second flush while the first flush is still in progress. It'd be worth confirming that:
- an
onNoPendingFlush()
from before the second operation resolves when the first flush is resolved - an
onNoPendingFlush()
from after the second operation schedules a flush but before the first flush completes does not resolve until after the second flush completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me as an additional thing to test, thanks! The set of combinations for this is starting to feel a bit nebulous, so I'll just use a put as the second flushable.
We're adding a method to the JSG interface called `flush()`, let's make it clear that these functions are not the same thing.
This commit provides a way for a worker to wait for its writes (including those performed with allowUnconfirmed) to be synchronized with storage. Note that if there is no flush currently pending, the JSG method will provide a resolved promise and the current task will continue through its microtask queue instead of yielding to other tasks.
33e6054
to
7fa31ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test coverage!
@@ -2781,6 +2812,9 @@ kj::Promise<void> ActorCache::flushImplDeleteAll(uint retryCount) { | |||
|
|||
// Now we must flush any writes that happened after the deleteAll(). (If there are none, this | |||
// will complete quickly.) | |||
// TODO(soon) This will use the write options for the deleteAll() even if the options for future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, what makes this flushImplDeleteAll()
case different from any other case where allowUnconfirmed
is set on one write, but not the next, and then they're flushed together? I thought that was all handled in ensureFlushScheduled()
and I'm not seeing what would make this case different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big difference is that this flush recursion is in the same promise chain as the original output gate wrapper. Imagine you have two sets of operations in different js tasks, normally they get separate flush promises with their own output gate locks. With this follow on flushImpl()
, we still have two separate promises/flushes but the we handle the work for the second one in the first one. The second flush noops instead. I think this also happens when we have to retry failures?
I think it mostly matters when the deleteAll isn't allowUnconfirmed but the following operations are. In that case, we'll hold the output lock longer than the user requested. The reverse doesn't really matter because we're still holding the output lock as long as promised (assuming the user actually understood what was going on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose, yeah. To clarify for my own sake -- it's already the case that if you do a put
or normal delete
without allowUnconfirmed
followed by another one with allowUnconfirmed
before the flush has started that the output gate will apply to both. The difference is just that those operations are flushed atomically with each other whereas a deleteAll happens strictly before any follow-up operations.
So I think you're right that this is pessimistic but it seems like a pretty minor case given how infrequently used deleteAll
is and that it requires mixing in allowUnconfirmed
on only some operations.
This PR provides a way for a worker to wait for its writes (including those performed with allowUnconfirmed) to be flushed to storage. Note that if there is no flush currently pending, the JSG method will provide a resolved promise and the current task will continue through its microtask queue instead of yielding to other tasks.