-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
registry refactoring #96
Comments
Does this mean that the following code: System.import('x')
.catch(function() {
// retry the import
System.import('x');
}) would not be possible, and would have picked up the "cached" errored entry? It can be useful for the failure to be removed to allow things like this, after the failure signal has fully propagated through the dependency tree. |
@guybedford that is correct. Any successive call to get module |
There's no question that retries can be useful, and this doesn't eliminate the ability to do them. It only prevents them from happening automatically. The problem with automatic retries is that they will result in hiding errors and hitting the network repeatedly by accident. The system shouldn't try to be too clever and guess what the programmer wants. Retries absolutely should be and are expressible, they just shouldn't be done silently. For example, your code above is still achievable via e.g.: System.import('x').catch(() => {
System.resolve('x').then(key => {
System.registry.delete(key);
System.import('x');
});
}); |
Ah sorry, yes, I see the missing functionality: we need to make it possible to retry the subgraph. This probably calls for at least a guarantee that the error provides enough information to delete all the entries for a failed subgraph, but it might also call for an explicit retry mechanism. |
What we have in SystemJS today, and from the previous specification work, is that everything that was listening to the current load record on failure gets failed. But once the failure has been communicated to all the listening modules and imports, those modules are cleared allowing retries by default. This behaviour has turned out to be very useful, and hasn't resulted in unnecessary retries as users will only repeat a |
IINM the previous version of the specification never explicitly deletes entries from a registry, and certainly does commit them to the registry at a certain point. So for example, if a module's top level code throws, your code snippet would not work at all, even in the previous system. The fact is, there is no one-size-fits-all retry mechanism that makes sense as the default. The system needs to accommodate multiple registries, and multiple registries need to be able to share entries. This means that the dependency graph of a module and the contents of the registry may not be the same thing: for example, if you start loading an entry with loader1, and then store that entry in loader2, none of the entry's dependency graph shows up in loader2's registry at all. So when the entry fails, what should be deleted? Should we delete it from any registry it appears in at all? Should we do this transitively for any modules that depend on it, walking back-pointers into all registries and deleting it? It's not hard to construct use cases that want different retry policies. It depends on the relationship between loaders. In one scenario, loader2 might be using loader1's entries as a quiet optimization, but if there are any problems it does not want to modify the state of loader1's registry and just reload locally into its own registry. In another scenario, loader1 and loader2 may be kept closely in sync with one another, and a failure in either of them should result in a retry in both. It's too soon to try to build in a universal retry policy. Instead, we need the API to provide the building blocks needed to implement any retry policy, and then people can design abstractions for the most useful policies. In particular, here are the building blocks we need:
With this, it's possible to walk the dependency graph and find all failed modules and remove them from wherever the loader may have installed them in its registry (or possibly other registries it may control).
Users don't have to if the loader subclass or framework provides a higher-level abstraction built on top of the building blocks this standard provides. Automatic retries are the kind of mechanism that gets web standards into trouble: constraining a general-purpose API with ad hoc, scenario-driven policies. While it might work out to be more convenient for the situations we anticipate, the loss of generality disempowers programmers from being able to achieve the appropriate policies for their purposes. The extensible web philosophy prefers simple core semantics with a complete low-level API on which userland frameworks and libraries can build more and more convenient abstractions. |
Thanks for taking the time to put together a considered response, it really helps to hear the decisions here from reading the PRs! Yes, modules were only set in the registry on instantiate completion, so there was nothing to remove from the loader registry. But the in progress load records that had failed were traced and removed from the list of current in-progress load records of the loader. This way network / loader hook errors were naturally retriable, while code execution errors retained a failed module in the registry, whose error would cache for any module trying to import it. Certainly I'm completely behind the extensible web stance, but extensibility is a powerful tool that should still be carefully provided where additional innovation is truly possible - I'm just trying to convince myself that there are other retry policies that make sense to explore at this level of the architecture and am unable to easily picture any examples (there are certainly other interesting levels at which network retry policies can be implemented - eg normalization or fetch hook based). I wasn't aware of the multiple registry constraint - when sharing modules between registries, would it not be possible to limit this to module records, instead of module loading / status records? (apologies if I'm messing up all the terminology here too, still catching up on the new terms!) |
You have to put entries in the registry as soon as they start the pipeline in order to ensure that concurrent imports that require a common dependency do not clobber each other. |
The whole point of extensibility is that we can't guess or predict all of the use-cases ahead of time. I have enough scars from AppCache (read this if you want to know more) that I think we should always be humble about use-cases we haven't thought of yet. That said, we should also have high-level APIs that cover the common cases, and we should provide them as soon as we've developed and refined them. But the low-level APIs, which let the use-cases evolve over time, and which let us learn about common use-cases from library authors, come first. |
In this case, what we're implementing by default here is failure persistence into the registry, and making retry behaviour more difficult. The default that I'm advocating would be:
With the above, code execution errors persist, while 400 and 500 HTTP errors, and other hook errors retry. My experience has been that this is the most desirable behaviour to allow easy retrying. Consider forgetting to save a module in NodeJS and getting an fs read error, then trying to import again to get the module - this is the behaviour that is implemented today in NodeJS. The only worry from the spec perspective in implementing a default like the above is if there are use cases where it would be desireable for loader hook errors to be persisted into the registry. I'm unable to think of a use case where this would be useful myself. |
@guybedford "metrics" is probably a good use-case, e.g.: If you remove the 404s from the corresponding registry, what does that give you? Iterating over the registry entries is really not that reliable when it comes to identify failures, because those failures might come from another loader, with a different registry. In other words, eliminating things from the registry is really not intrusive, we keep references around internally so you can navigate the dependency graph, independently of the registry. |
A reminder that synchronous loaders like node can not rely on Promises for the registry state lookups. |
…ry-objects fixes issue #96: registry refactor to introduce registry entry objects
Fixed in 1e2cb5e. |
Splitting out the registry from the loader as a separate object with a distinct class was a big step forward for separating concerns. But the registry abstraction is itself still muddled, so now it's time to turn our attention there.
Conceptually, the registry is just a mutable dictionary mapping string keys to modules (in whatever transitory state of loading). The twist is that a module's loading-state is based on promises, and should be managed in an asynchronous way; we don't want to be manipulating promises in ways that dip below the promise abstraction layer (such as fulfilling with final values that are thenable, or observing promises change state mid-task). But there are a number of use cases where we want to be able to synchronously interact with the registry:
<script>/* set up registry */</script> ... <script type="module">import blah from "blah" ...</script>
should allow the second script to depend on the registry modifications.This suggests that we should further separate the concerns of registration and loading status. In particular, the simplest, most general representation of the registry is effectively a map:
The loading status of a registry entry, however, is primarily asynchronous and should be where the methods for querying and updating the loading status live:
The only synchronous pieces are (a) the final module, which makes sense to make available as soon as it resolves -- for loaded modules you want to be able to synchronously get access to them; (b) the current stage, which won't change during the current turn (in particular because
resolve
/reject
/cancel
should implicitly callPromise.resolve
on their argument); and (c) the fact thatresult
may produce undefined if the requested pipeline stage has not been initiated or has been discarded.(The
cancel
method should be deferred until promise cancellation is worked out.)The difference between
load
andresult
is just thatload
initiates the loading process of a given stage (and its predecessors) if they haven't already, whereasresult
only asks for the promise if it happens to already exist and getsundefined
otherwise.We will also need a way of constructing new RegistryEntry objects from scratch, so that you can register a module in any loading state you want. Still need to sketch out an API for that.
We may want to rename RegistryEntry since it's competing with
.entries
for nomenclature. Perhaps ModuleStatus?Note a few important properties:
The text was updated successfully, but these errors were encountered: