Promise microtask scheduling can create starvation #25
That code snippet is quite illustrative, but doesn’t seem to represent a real-world use case where it would be “easy” to get starvation. Could you elaborate on specifically how this occurs in practice?
CC @MayaLekova @bmeurer - this came in after the discussion and is worth looking into.
cc @hashseed
I'll come back later with some more code, but I just wanted to point out that the code above only used a `setTimeout` to illustrate a mix of asynchronous and synchronous behavior... ostensibly that specific code behaves that way because microtasks always cut in line in front of waiting ticks (like a timer). That's one kind of starvation.

But there's another flavor of starvation where it's just two different promise loops (no timers) and you still don't see a balanced a / b / a / b .. tradeoff between resolutions the way one might expect from fairer round-robin scheduling. That idea seems contrived in JS code alone, but it's entirely common and practical when you're in the realm of CSP. For one, you usually make a "process" stay alive "forever" by running its body as a `while (true)` loop.

The thing is, design-wise, as you're building each CSP process in your program, you're designing them largely to run "independently", and only to collaborate from time to time when a message needs to be passed. That "independently" part only works if you can assume that each process gets a fair/equitable chance for its messages to be sent/received by the rest of the system. In other words, whether you're conscious of it or not, you're expecting a sort of round-robin effect on the resolutions bouncing around the different processes if they happen to stack up at the same moment. The system is much weaker if there's any chance that one set of channel messaging between two processes can end up starving out all the other independent processes. You really don't want that to happen, and you kind of rely on the system not to let it happen.

One of the things about blocking messaging semantics (like that of CSP) is that you get automatic back-pressure. So it's common to design a producer of some message to just pump out its messages as quickly as possible (think of a stock ticker), but it's throttled to only generate exactly as many messages as its consumer(s) take.
So, the producer might have its message ready immediately on each turn of its loop. The point I'm trying to make is, you intentionally design a piece of your system that could push out an "infinite" and "immediate" stream of messages (aka, immediately resolved promises), and you're OK with that because you assume it won't cripple the whole system: you assume it isn't exhausting all the scheduling and starving out other processes' messages from being exchanged. At least, that's what you hope.

Here's a strawman of such (susceptible) code:

```js
var coords = channel();
var userClicks = channel();

async function generateCoordinates() {
  while (true) {
    let x = Math.trunc(Math.random() * 1000);
    let y = Math.trunc(Math.random() * 1000);
    await CSP.put( coords, { x, y } );
  }
}

async function generateShape() {
  while (true) {
    await CSP.take( userClicks );
    let { x, y } = await CSP.take( coords );
    drawCircle( x, y );
  }
}
```

Here, the coordinate-producing loop can keep resolving its messages back to back on the microtask queue, so the click-consuming process may never get a fair turn. This is dangerous. And so CSP relies on a system that makes sure that such schedule starvation cannot occur.

To this point, I maintain an async-flow-control library called asynquence, which has CSP support in it. I implemented CSP on generators initially, because I was able to make sure that each promise for each message goes into a round-robin queue, and that each "tick" internally (not JS ticks, but just units of work in the runner) takes one process off the queue, processes its message, then sticks it at the end of the line, so that it can't hijack the scheduling. This was a lot of work to get right, and it's very delicate/intricate code, so I have a whole bunch of crazy edge test cases around it to keep from later messing something up. Anyway, fast forward a few years, and with `async..await` you don't get any of that control over scheduling.
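A round-robin runner of the kind described can be sketched roughly like this. To be clear, this is a hypothetical illustration, not asynquence's actual implementation or API: each "process" is a generator, the runner advances exactly one process per turn, and re-queues it at the back so no single message loop can hijack scheduling:

```javascript
// Hypothetical round-robin runner (names invented for illustration).
// Each "process" is a generator; one next() call is one unit of work.
function roundRobin(...makeProcs) {
  const procs = makeProcs.map((make) => make());
  return new Promise((resolve) => {
    (function turn() {
      if (procs.length === 0) return resolve();
      const proc = procs.shift();   // take one process off the front
      const { done } = proc.next(); // let it do exactly one unit of work
      if (!done) procs.push(proc);  // then send it to the back of the line
      setTimeout(turn, 0);          // and yield to the event loop between turns
    })();
  });
}
```

Because each turn advances one process and then re-queues it, two "infinite" loops interleave fairly instead of one starving the other, and the `setTimeout(..)` between turns keeps timers and I/O alive (at the cost of timer-clamping latency per step).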
cc @littledan @domenic Maybe this might be of interest to you as well.
Would it be feasible to yield to the event loop occasionally within the implementation of the CSP library?
@littledan What that sort of approach boils down to is the primary complaint I've made here, which is that promises weren't specified with some sort of built-in behavior to prevent this starvation, so user-land has to bend over backwards with hacks.
I'm not sure what to do about it at this point. Now, promises give you a sort of "atomicity" guarantee that they won't yield to the outer event loop until they are all done. I believe atomicity was an explicit design goal for the web, having to do with avoiding multiple renders of intermediate states.
We obviously can't change the poor design decision now. The purpose of this "issue" is to request an extension hook (from node) of some sort, that would allow, for example, for code to be able to configure a trigger for a bailout of the microtask queue to prevent starvation. For example, this hook could fire when one microtask recurses to scheduling another microtask, and the hook could interrupt that default behavior and instead force a yield to the event loop, or could forcibly interleave other microtasks, or something like that.
I'm sure it was on purpose, but that doesn't make it not a bad decision on the whole. Consider how the "design" of setTimeout is not for there to be any clamping, but browsers had to step in and create clamping to prevent run-away timer recursion. Or think about the arbitrary (external to the spec) limits put on call-stack depth to prevent running out of memory. Both are cases where the environment steps in and tempers otherwise unintended "design" side effects. I see this as a similar issue, one which I'm hoping that Node's exploration of a hook might end up informing future environment behavior, such as automatically restricting microtask recursion depth to some limit.
Would the ability to monkey-patch `then` help here?
It was a very intentional design decision to have `await` not go through a monkey-patchable `.then()` on native promises.
Can you elaborate on what you mean by that?
If `await` went through an overridable `.then()`, it wouldn't be safe to use when running untrusted code.
Thanks. Just to clarify - by "safe" you mean "would act in a way that developers find less surprising" and not "object capability theory safe", right?
I think I mean both - I mean that one can confidently use it even when running untrusted code.
I'm not asking for the ability to completely hijack/override these mechanisms as much as I am asking for the ability to defer scheduling of a microtask until the next tick so that another promise's scheduling isn't starved. So for example, a hook could be fired before scheduling a microtask, and it could choose to defer that microtask to the next tick. I think this is a much narrower ask and is far less unsafe than overriding `then`.
I can't comment on the safety goal. I agree that starvation is a problem in the example you presented, but it seems to me the predictability and "atomicity" of the microtask scheduling might outbid this problem. I would love to hear/read some further elaboration on the motivation for the current scheduling mechanism.
I would hope that would only be the case if the hook is registered, and otherwise being un-present by default wouldn't have any effect on performance (except for maybe an if-statement). And if it was registered, it would be intentionally so, and for the greater purpose of ensuring balanced scheduling, even at the slight cost of the hook. I would imagine that this sort of thing would be in experimental/POC code designed to explore whether JS's promise scheduling algorithms should be adjusted. Such a hook would provide us the ability to explore the different use cases around scheduling/starvation, and compare/contrast the tradeoffs, rather than guessing at "outbids" from only the current perspective.
Every single "non-standard" hook being contemplated in this repository is in that same category. You can totally shoot yourself in the foot if you do bad things. Since this is Node-only and not proposed for the greater ecosystem, we're drastically reducing the surface area for potential mistakes. Also, these hooks would really only be of interest to someone in the research mindset described above. I'd wager 99.9999% of JS devs have never even contemplated scheduling starvation.

However, if you zoom out to the broader world of concurrency management in programming language design, starvation is a very important first-class topic. There's tons of writing (academic and practical) on the subject. I think it's to JS's detriment, and Node's, especially in light of efforts to move Node and JS toward multi-threading, that we would ignore such an important topic because it hasn't impacted much of the ecosystem yet. What I'm asking for is the ability to pursue research along those lines. I think, broadly speaking, that's in line with the spirit of this repository's asks.
If that was a concern with TC39 (and knowing @erights cares about this, that's likely) - then I don't think we should allow code to opt other code into it. We could provide a MicrotaskQueue hook scoped to only the code that opts in.
That would violate user expectations regarding scheduling microticks vs. event loop yields. We even give a warning about this in the docs: recursive `process.nextTick()` calls can similarly starve I/O by keeping the event loop from proceeding.
We should provide a similar warning when we write a guide on promises, perhaps. (I've already been planning to work on one.) It would also be useful to emit the same sort of warning when someone actually does this (resolves over a threshold of promises before events run). One can always subclass their promises and override `then`.
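A user-land version of that threshold warning could be sketched like this. The class name, counter, and threshold are all invented for illustration; this is not a real Node API:

```javascript
// Sketch: a Promise subclass that warns when "too many" reactions get
// scheduled before the event loop has had a chance to run a macrotask.
// THRESHOLD and all names here are illustrative.
const THRESHOLD = 1000;
let scheduled = 0;
let resetArmed = false;

class CountingPromise extends Promise {
  then(onFulfilled, onRejected) {
    scheduled++;
    if (scheduled > THRESHOLD) {
      console.warn(`over ${THRESHOLD} promise reactions scheduled before the event loop ran`);
    }
    if (!resetArmed) {
      resetArmed = true;
      // If this macrotask runs, the event loop got a turn, so reset the count.
      setTimeout(() => { scheduled = 0; resetArmed = false; }, 0);
    }
    return super.then(onFulfilled, onRejected);
  }
}
```

One caveat, consistent with the discussion above: `await` on a plain native `Promise` does not go through a patched or subclassed `then`, so a counter like this only observes explicit `.then(..)` chains (and awaits of the subclass instances themselves).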
This could be done if we broaden the scope of this issue, if we agree that it would solve the starvation problem.
I think this thread shows that we can't rely on monkey-patching `then` for this.
@getify Agreed that current async/await semantics don't allow that.
Hey all, just having to dig into this myself now. I have a question that I think is related. The use case that just prompted me to investigate this was trying to turn a synchronous, blocking loop into one that doesn't block the event loop:

```js
async function() {
  const ret = []
  for (const item of items) {
    ret.push(JSON.parse(item))
  }
  return ret
}
// Note that although it's an `async function`, the code is all synchronous.
```

This one is clearly blocking for the entire loop. So I tried:

```js
async function() {
  const ret = []
  for (const item of items) {
    await Promise.resolve()
    ret.push(JSON.parse(item))
  }
  return ret
}
```

As far as I understand, this makes the loop run asynchronously. But because of the starvation problem of the promise microtask queue being fully drained before moving on to the next phase of the event loop, I believe this doesn't actually unblock the event loop? Instead you'd have to do something like:

```js
async function() {
  const ret = []
  for (const item of items) {
    await new Promise(resolve => setImmediate(resolve))
    ret.push(JSON.parse(item))
  }
  return ret
}
```

Which would then unblock the event loop? Is that correct?

The reason I'm asking here instead of on StackOverflow is, if I understood all that correctly... It makes sense to me how promises have been designed to prioritize "atomicity", like @littledan explained. I think what could be improved is, like @benjamingr mentioned, making the documentation reference this more clearly. Right now this is really not clear in Node's documentation, and even less so elsewhere in articles and answers.

For what it's worth... I started the research journey by re-reading Don't Block the Event Loop carefully. But it didn't answer the question of whether `await`-ing promises inside a loop actually unblocks the event loop. And then from there I turned to StackOverflow, but there is lots of misinformation there:
Lots of the answers and comments mention that promises by themselves ensure that the event loop is not blocked, which sounds like it's not true. Finally, after feeling like the StackOverflow questions weren't quite explaining it enough for me, I read a whole bunch of different things without conclusive results. I even found this article by @rauschma that says...
...of which the first phrase makes sense, but the second seems incorrect? Until I found this series, which explained that recursive promise resolution can indeed starve the event loop. Hope that all makes sense. Thanks!
Here's an even simpler way of illustrating the problem:

```js
async function main() {
  while (true) {
    await Promise.resolve();
  }
}

main();
setTimeout(function(){ console.log("nope"); }, 0);
```

The "nope" never prints, because the microtask loop never lets the event loop turn.
@getify that's definitely a better way to prove out the behavior! For my example above, I think my issue is around how to write "partitioning" logic. For infinite loops it's easy to notice the issue (although still frustrating, I admit) because it hangs. But for finite loops I think many people right now are incorrectly assuming that they are not blocking the event loop. Which leads to request timeouts and other things that are hard to debug, especially when lots of the common sources for answers will give them false confidence, and they'll incorrectly dive into other potential blockers.
Just taking a read through this, I have a few points I'd like to bring up which should help focus the conversation.

As long as you're awaiting immediately-resolved promises in a loop, the microtask queue never drains. What is actually happening here isn't even a problem with promises or microtasks; it's a problem with how embedders interact with running the microtask queue. At a high level, a browser and/or node event loop looks kind of like this:

```js
while (true) {
  runMicrotasks();
  timers.forEach((timer) => {
    if (timer.isReady()) { timer.cb(); }
  });
  IOqueue.forEach((io) => {
    if (io.isLoaded()) { io.cb(); }
  });
}
```

As long as the microtask queue never empties, the embedder will never be able to run the timers and other various platform-specific things. So I think a better question at this point is: what can node/browsers do to make this better? One possible solution could be that embedders insert their own microtask items which would handle timers and whatnot, instead of having to run the queue to completion before doing so.

(As a side note, there's also timer starvation, where recursive timers will also block the event loop, for the same reasons as above, so I'm not convinced this is entirely something that the promise team by itself needs to solve.)
Isn't that true for `promise.then` as well? But for me, since I don't know enough to have an opinion on what the right call is, it's more of a documentation issue, since it's really hard to find out what happens with chained promises as far as event-loop blocking goes.
@ianstormtaylor yes, it's also true for `promise.then`, but it's more difficult to model these infinite loops with it.
To be clear, starvation is not only a concern with "infinite loops" -- those are just a convenient illustration of the problem -- but even a blocking finite loop that just happens to prevent some other resource from being able to process, where you would normally expect them to interleave operations with each other, such as was shown in my OP.
I'm not sure that people expect timers to be interleaved with microtasks. They haven't ever been, in browsers or node. They're even referred to as macrotasks instead, because they're conceptually different.
It's not specifically a question about "timers", though that's a convenient illustration mechanism. I think the expectation people have is that "asynchronous" (like how promises resolve, and any other means of asynchrony, like timers or postMessage or ajax or anything else) always means "non-blocking", which most people seem to use as meaning "every async thing gets an equal chance to schedule its next step on the event loop, so everything gets processed eventually in due turn." Often that's the case, but it's not necessarily true of promises (or other microtask queueing mechanisms) because of the way the microtask queue is eagerly exhausted before yielding to any other part of the system. It's like microtasks get to cut in line and stay at the front of the line, while the rest of the event loop queue just waits indefinitely.

What I would have imagined (and have alluded to at various points earlier in this thread) is that to prevent the microtask queue from starving the system, there could be a bailout limit, similar to the recursion-stack RangeError exception, or to how timers have clamping to prevent run-away timer recursion. In both those cases, the environment steps in and overrides something to prevent undesirable behavior. For example, the microtask queue could spin for a maximum of N items, and if it exceeds that, the environment spins a tick of the event loop before proceeding with the microtask queue again. This still ensures predictable ordering on the microtask queue, because it's still a queue, and any newly scheduled microtasks on that subsequent spin of the event loop would just get in line behind any that were left unprocessed. It's basically a sanity check so that the system never spins unboundedly.

This suggestion may not turn out to be a good solution (it may not even fix my CSP example), but it's the sort of thing that I wish we were exploring rather than just ignoring the problem entirely.
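The bailout-limit idea can be approximated in user-land today. Here is a minimal sketch (the helper name and the budget are invented for illustration): count consecutive microtask-only continuations, and after a budget of N, route one continuation through a macrotask so timers and I/O get a turn:

```javascript
// Invented helper: stays on the microtask queue for up to BUDGET awaits,
// then forces one trip through the macrotask queue (timer phase).
const BUDGET = 100;
let used = 0;

function fairTick() {
  if (++used < BUDGET) {
    return Promise.resolve();                      // cheap microtask hop
  }
  used = 0;
  return new Promise((res) => setTimeout(res, 0)); // yield to the event loop
}

// Usage: in a hot async loop, `await fairTick()` instead of
// `await Promise.resolve()` so the loop can't starve the rest of the system.
async function busyLoop(iterations) {
  for (let i = 0; i < iterations; i++) {
    await fairTick();
  }
}
```

The tradeoff is exactly the one discussed in this thread: every N-th step pays timer-clamping latency, but pending timers and I/O can no longer be starved indefinitely by the loop.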
cc @nodejs/v8 @nodejs/chakracore @nodejs/timers on the above
Closing this issue because the repository is being archived: nodejs/admin#283 |
Not sure this is entirely relevant to bring up here, but I figured I'd throw it out for consideration nonetheless.
One major issue I have with promises is that the microtask queue is specified in a naive enough way that, at least from current observation of browsers, it's far too easy to create a starvation situation when we talk about scheduling between multiple promises.
The use-case where this bit me was in trying to use `async..await` functions to implement the ideas of CSP (communicating sequential processes), the model for concurrency used by default in Go and ClojureScript, notably.

CSP has traditionally been modeled in JS through the use of generators (using the sync-async pattern of yielding promises, resuming on resolution, etc). When `async..await` came out, several CSP libraries moved to it from generators. Unfortunately, this exposes CSP code to this dangerous starvation problem of promise microtasks.

The idea behind CSP is that you have two or more "processes" (think: functions) that are running concurrently; they are mostly independent, but they "block" whenever trying to send a synchronous message to another process, or receive one from another. The blocking is done locally via a `yield` or `await` on a promise that's used to represent the completed message. Whenever both reader and writer of a message are complete, the promises on both ends resolve, and the processes are unblocked.

The problem here is that it's incredibly easy (either for malicious reasons or just by mistake) for one of these processes to get into an "infinite asynchronous" loop where it simply keeps resolving a new promise, which adds a new microtask to the queue, and this ends up starving out any other promises that are waiting to fire a microtask to resolve.
Here's a code snippet that illustrates the starvation issue:
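The original snippet isn't shown above; the following is a reconstruction consistent with the behavior described afterward, where `a()` waits on a timer while `b()` loops on immediately-resolved promises:

```javascript
// Reconstruction (not the original snippet): a() waits on a macrotask,
// b() loops on immediately-resolved promises, so all of b()'s output
// prints before "a: 2" because the microtask queue is drained first.
const out = [];
function log(msg) { out.push(msg); console.log(msg); }

async function a() {
  log("a: 1");
  await new Promise((res) => setTimeout(res, 0)); // macrotask: waits a tick
  log("a: 2");
}

async function b() {
  log("b: 1");
  for (let i = 2; i <= 200; i++) {
    await Promise.resolve(); // microtask: cuts in line ahead of the timer
    log("b: " + i);
  }
}

a();
b();
```

Even though `a()`'s timer is ready almost immediately, every one of `b()`'s 200 microtask continuations runs before the timer callback gets a turn.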
If you run that in say Chrome, you'll probably see something like:
As you can see in that sort of output, at first `a()` and `b()` trade nicely, but as soon as `b()` gets into an immediate microtask loop, it starves out `a()` entirely, preventing it from ever getting to print its `a: 2` message, until eventually `b()` just quits after `200`.

Starvation is a particularly "deadly" problem for CSP, as several of its semantics make that situation much more likely than such contrived code as my example here. A CSP library needs to guard carefully against this.
If you use `async..await`, you can't really, since you have no control over the inter-promise resolution scheduling. If you use a generator-runner for the CSP, then you have the ability to mitigate the chances of starvation using a simple round-robin (or other more sophisticated) scheduling algorithm, making sure that you only give each "process" one chance to resolve each turn, and spreading the scheduling around so starvation doesn't occur.

I believe this betrays a fundamental mis-design in promises, in that they didn't specify that the microtask queue has to prevent starvation of the sort I described. They should have ensured a minimum of round-robin scheduling, or even random scheduling, or weighted scheduling, or something... but not simply a naive queue.
I bring this issue up because I saw other threads talking about use-cases for hooking into the promise resolution machinery, and it reminded me of this problem. It's perhaps related in that if there was some "hook" provided, better starvation-free scheduling could be ensured if, for example, you know you'll be doing starvation-likely things like CSP.