-
Notifications
You must be signed in to change notification settings - Fork 46
Conversation
ac05481
to
bf31f31
Compare
I've gone ahead and polished up the implementation quite a bit... the spec had a few gaps caught between refactoring. Some implementation notes to describe the approach further:
The above is the simplest approach for this logic flattening to get better deterministic guarantees around sync completions in the semantics that I could find after experimenting with various different approaches. Look forward to review in due course, and happy to answer any questions further. |
Thanks for this detailed refactoring. Seems like there are a lot of good improvements in this version. I really appreciate how the volume of spec text keeps decreasing. I like the name changes as well. I am still wondering if it's possible to eliminate the [[Async]] internal slot, and to let the "evaluating-async" state just be "evaluated", as in the original #51, but it will take me some more time to think all this through. |
I managed to add a further refactoring here to cut down on the number of fields and also to ensure evaluation errors always propagate synchronously through all subgraphs, effectively reducing the "memory footprint" of the algorithm. I haven't gone into much detail yet on how better guarantees are provided around cycles with this PR, and with the current approach, cycle handling is pretty much the main complexity. So I wanted to just provide an example to hopefully clarify some of this logic. Consider the following graph (excuse the terrible image): Where A is the top-level module being imported, and the numbers next to each of the modules are the DFSIndex and DFSAncestorIndex. In this example, all modules except for E are in the same strongly connected cycle unit (DFSAncestorIndex 0). E is a dependency of this cycle. Assume every module is async for the purposes of this example since this example is about async cycles not the sync behaviour interactions which are the simpler ones :P The execution order of this graph is D,B,E,C,A, where B, C and A will only execute after their depenencies have completed. Synchronous BehaviourNow usually if this were a sync graph and there was an execution error on any of the modules, say on D, we would have "stack" containing A,B,D, all three of which would immediately transition to an error state. C and E wouldn't actually be in the stack, and would thus remain in an "instantiating" state (since they haven't been visited yet), where instead they would throw if you tried to import them later on, since their dependencies have cached errors. Asynchronous BehaviourSo what is the analog in the async cycle case? Instead of relying on "stack", we populate AsyncParentModules for any async module in the graph that isn't executed immediately in InnerModuleEvaluation. So B points to A. D points to both B and C. A has no AsyncParentModules as it is the cycle root for this top-level evaluation. So if there is an error when asynchronously executing D, AsyncEvaluationRejected will propagate the error up through the AsyncParentModules, which is unique to this cycle. So we call AsyncEvaluationRejected on the modules in this order: D, B, A, C, immediately transitioning all those modules together as a strongly connected component to the failure state. This is very similar to the stack approach although slightly different in that we also transitioned C (although note that an error on B would not have transitioned C or D). Top-level cycle dependenciesWhere cycles get tricky is when we throw other top-level evaluation jobs into the mix. If we had another module F dynamically imported after A, which imported say D, while the cycle modules are still asynchronously evaluating, what we don't want is for F to start executing on completion of D, even though the whole cycle has not finished yet. We want to ensure that F only executes after the entire cycle has finished executing. The way we do this is by treating the first execution of the cycle as taking "ownership" of the cycle (and effectively owning the DFSIndex and DFSAncestorIndex values - these are never be changed by any future importers). Then when importing from a cycle that has already been started (so not "evaluating" or "instantiated"), we instead look to the root of the cycle for checking for evaluation errors and adding evaluation listeners. So we try import D, which is currently in an "evaluating-async" state. We then walk up the AsyncParentModules of D until the DFSIndex matches the DFSAncestorIndex representing the root of the cycle at A using the GetCycleRoot abstract method. We then add F to the list of AsyncParentModules list of A that should get completion handlers applied on resolution or rejection of the cycle. If D completes successfully, but then E gets evaluated causing an error, that error is propagated up to A, and then propagates up to F, all in a well-defined way avoiding any partial states of the graph. Same-graph cycle dependenciesIf instead of being deferred, F was another module in the same top-level evaluation job, say M were the top-level evaluation, where M imported both A and F, and F still imports D from within the cycle, then we have two cases - either F is imported first, in which case it takes ownership of the cycle and the cycle is no longer owned by A (so the graph gets reorganized above), or F is imported second, in which case the cycle is already in the "evaluating-async" state when F is traversed. In this case we treat F exactly the same as in the previous section - having it depend on the root of the cycle's completion. ie calling GetCycleRoot(D) to get to A, and then adding F to A's AsyncParentModules list for notification on resolution. So this kinda closes the loop on proving strongly connected components always transition together. Hopefully that explains how these fields handle the complex edge cases quite elegantly. I've included lots of assertions around cycles in the algorithm to make these invariants clear. |
Here's an interesting open question about cycles and error handling. Let's consider the a graph where A depends on B and C, B depends on A, C depends on D, and D depends on A. So the execution order is B, D, C, A, where B and D execute immediately, C executes after D completes, and A executes after both B and C have completed. Now if there was an asynchronous rejection for B, while D was still in progress, should the completion of D's execution still go ahead and execute C, or should D's execution completion check and see that the cycle has errored and stop execution entirely? I get the feeling we should probably do the latter? The spec in this PR doesn't currently include that but we can add that check in roughly one or two lines. |
I really like the simplifications in this patch version. Great job thinking through all the cases in https://github.com/tc39/proposal-top-level-await/pull/74/files#issuecomment-491547656 . After thinking it through a bit, I agree with your judgements there (especially the cycle root technique, which is tricky). @zenparsing previously asked about reentrancy for this algorithm, and I think you've done a good job there. There are a couple issues with the way this is written up which I think could use some edits before landing:
I'll be trying to fix these up. |
At a high level, I do like the idea of this change. It's taken me some time to understand why we wouldn't want to use Promises for the reactions, but I do see the logic in the ordering that you start with here. This kind of change seems completely consistent with the possibility of host environments inserting more pauses/yielding to the event loop/microtask checkpoints, which they could do by wrapping Source Text Module Records in something which adds that logic. And some of that logic could also be added from JavaScript by wrapping one module in another which uses top-level await! So I'm getting convinced that doing these reactions synchronously is the most reasonable, least opinionated sort of way to do this, and forms a good basis for opinionated constructs on top. |
There's an editorial choice we could make in organizing this specification: Should we build the dependency graph up front (and have all evaluation be based on it), or should we built it on-line, just for the edges that are necessary due to async-ness? This patch goes with the former choice, but I was initially thinking of the latter. I'm not sure which one is conceptually simpler, or easier to understand for various people reading it. |
We do do this in this spec, and that is done by checking the "evaluating-async" module status. When the sync module depends on the async module, its state is put into "evaluating-async". Similarly if any module depends on any evaluating async module, it attaches to the completion in the graph. Note that the "evaluating-async" state is important exactly because this attachment only happens while the module is still executing. Once it is executed there is no attachment to the completion event, and "evaluating-async" moving to "evaluated" gives us this information.
The capability approach seemed the simplest and lowest-diff way to achieve this, but we could certainly look at other approaches. The capability approach was exactly used in the early 2014 draft of module algorithm briefly in ECMA262 by Jason Orendorff that specified the loader, and I remember finding it quite a useful pattern back then, but if you think there's a better way certainly let's do that. Agreed it could be worthwhile to update this approach to ensure it works for other module types by lowering some of the loading information like AsyncParentModules and the other fields down to Cyclic Module Records or even Abstract Module Records to support this. If that seems odd, this information could even be its own "table" which represents the in-progress load state. Happy to discuss this further / work on this as well, but I was just focusing on getting Source Text Modul Records right first!
I wanted to avoid the need to create a promise for every single module, as that just seems like wasted memory overhead to me. Just having promises for top-level jobs to ensure cycle resolution matches is all we need really. We could be populating a promise for every module, it's just wasted work that's all. If a module is in the "evaluating-async" state (including sync parents of async modules, which synchronously transition to this state when Evaluate runs), then the attachment to the completion is handled entirely by AsyncParentModules, which also gives us a well-defined execution transition from one completion to all possible parents.
Instantiate is effectively asynchronous as it happens while modules are fetching through HostResolveImportedModule I believe? As such, one slow module might hold up the tree, whereas another top-level evaluation job that shares a lower dependency could have already executed that dependency.
That would be great!
This patch only builds up AsyncParentModules when there are pending executions, so is actually the second case you describe and not the former case. Edit: although, yes we process the graph state immediately and synchronously that populated this so I see what you mean in it being upfront. The main reasons for this are to ensure well-defined cycle handling, since the CycleRoot is relative and defined by the top-level parent. Without cycles we could certainly not need any ownership model for executions. Finally, did you have any thoughts on ensuring we stop executions of further async modules in a cycle after any error in that cycles as in #74 (comment)? I'll gladly add a further commit to support this. |
Correction: instantiation appears to be a synchronous operation in the spec, under the assumption that another mechanism has already ensured the dependencies are available it seems, so that HostResolveImportedModule must be sync. If this is the case, we could certainly move some logic into instantiate, but please confirm this first? |
I would be interested to see that patch. I can't form an opinion until I understand better the semantics you are thinking of. Thanks for explaining the motivation for phrasing the patch as you did. As I explained above, there are several missing pieces of logic in the algorithms, and some logic should be moved from STMR to CMR to enable other module types, so I will try to write these up in a conservative way without those bigger changes. |
Ok, I've pushed a commit with support for stopping execution if any modules in the cycle have failed, which requires its own specific check to support. I believe the approach here is fully algorithmically correct. I tried to clarify above how ExecuteModule promise handling and sync modules depending on async modules are all well-defined, so if there's any way I can answer further or discuss on chat I'd be happy to do that. The lowering would be great to see though, so thanks for working on this. |
That last change LGTM. I suppose we could simplify that microscopically further because been only ever need to check the cycle root's evaluation error, not our own one, right? |
Some proposed edits at guybedford#3 . I think these are required for correctness. Find details in their commit messages. |
This reverts commit 8415c83. tc39/proposal-top-level-await#74 restores the property of Evaluate() that it always returns a Promise, saving some complexity in HTML.
@littledan great, I've commented on the PR there.
Unfortunately not, consider the example from #74 (comment), where D has completed execution, and B and E are in progress. If there is a failure on B while E is still in progress, it gets reported up the AsyncParentModules to the root, but because E is a leaf, it will still want to move to executing C unless we explicitly check the cycle root A for any error from other branches. |
(So we need to check both E and A's completions because E doesn't participate in the cycle, it is just a dependency of it) |
Landing this and will land guybedford#3 immediately afterwards. |
This reverts commit 8415c83. tc39/proposal-top-level-await#74 restores the property of Evaluate() that it always returns a Promise, saving some complexity in HTML.
This reverts commit 8415c83. tc39/proposal-top-level-await#74 restores the property of Evaluate() that it always returns a Promise, saving some complexity in HTML.
This reverts commit 8415c83. tc39/proposal-top-level-await#74 restores the property of Evaluate() that it always returns a Promise, saving some complexity in HTML.
This revises the parent execution ordering fix for #64 (previously #71), while also ensuring that cycles transition in a well-defined way.
This retains the parent chain execution semantics when an asynchronous dependency is added, without causing any execution ordering changes. For example with this approach, sync modules can provide polyfills that will execute before async dependencies in order. The spec diff has simplified down quite a lot in the process.
Consider for example:
main.js
dep.js
the log when importing main is
"dep", "main", "dep task"
.If we add a dependency to
dep.js
that does not use top-level await, we retain the"dep", "main", "dep task"
ordering.But if we add an async dependency to
dep.js
:dep.js
then the ordering becomes
"async-dep", "dep", "dep task", "main"
.The problem with the above is that the timing of the parent modules relative to eachother is changed by introducing an async dependency.
This PR resolves that so that the execution semantics of the parent ordering and timings remains the same, to provide the output
"async-dep", "dep", "main", "dep-task"
.This is achieved with global completion handlers on module execution that manage the graph state transitions.