Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

registry refactoring #96

Closed
dherman opened this issue Oct 23, 2015 · 14 comments
Closed

registry refactoring #96

dherman opened this issue Oct 23, 2015 · 14 comments

Comments

@dherman
Copy link

dherman commented Oct 23, 2015

Splitting out the registry from the loader as a separate object with a distinct class was a big step forward for separating concerns. But the registry abstraction is itself still muddled, so now it's time to turn our attention there.

Conceptually, the registry is just a mutable dictionary mapping string keys to modules (in whatever transitory state of loading). The twist is that a module's loading-state is based on promises, and should be managed in an asynchronous way; we don't want to be manipulating promises in ways that dip below the promise abstraction layer (such as fulfilling with final values that are thenable, or observing promises change state mid-task). But there are a number of use cases where we want to be able to synchronously interact with the registry:

  • Setting up polyfills in an execution phase that implicitly blocks later ones that depend on the polyfill. For example, installing modules in a polyfill with <script>/* set up registry */</script> ... <script type="module">import blah from "blah" ...</script> should allow the second script to depend on the registry modifications.
  • Iterating over the registry does not want to interleave with other turns, so it makes sense to be able to synchronously ask for the current set of keys.

This suggests that we should further separate the concerns of registration and loading status. In particular, the simplest, most general representation of the registry is effectively a map:

%%RegistryPrototype%%.keys() -> Iterator<string>
%%RegistryPrototype%%.values() -> Iterator<RegistryEntry>
%%RegistryPrototype%%.entries() -> Iterator<[string, RegistryEntry]>
%%RegistryPrototype%%.get(key) -> RegistryEntry
%%RegistryPrototype%%.set(key, entry)
%%RegistryPrototype%%.has(key) -> boolean
%%RegistryPrototype%%.delete(key)

The loading status of a registry entry, however, is primarily asynchronous and should be where the methods for querying and updating the loading status live:

get %%RegistryEntryPrototype%%.stage -> "fetch" or "translate" or "instantiate" or "satisfy" or "link" or "ready"
get %%RegistryEntryPrototype%%.module -> module object or null
%%RegistryEntryPrototype%%.load(stage) -> promise
%%RegistryEntryPrototype%%.result(stage) -> promise or undefined
%%RegistryEntryPrototype%%.resolve(stage, result)
%%RegistryEntryPrototype%%.reject(stage, error)
%%RegistryEntryPrototype%%.cancel(stage)

The only synchronous pieces are (a) the final module, which makes sense to make available as soon as it resolves -- for loaded modules you want to be able to synchronously get access to them; (b) the current stage, which won't change during the current turn (in particular because resolve/reject/cancel should implicitly call Promise.resolve on their argument); and (c) the fact that result may produce undefined if the requested pipeline stage has not been initiated or has been discarded.

(The cancel method should be deferred until promise cancellation is worked out.)

The difference between load and result is just that load initiates the loading process of a given stage (and its predecessors) if they haven't already, whereas result only asks for the promise if it happens to already exist and gets undefined otherwise.

We will also need a way of constructing new RegistryEntry objects from scratch, so that you can register a module in any loading state you want. Still need to sketch out an API for that.

We may want to rename RegistryEntry since it's competing with .entries for nomenclature. Perhaps ModuleStatus?

Note a few important properties:

  • Deleting or updating a registry entry immediately affects it, even if there is a pre-existing entry with a pending load. You can independently decide whether to cancel that pending load by calling the methods of the entry. This is a more general semantics than implicitly cancelling a pre-existing load.
  • A failed module load should not implicitly unregister itself. Again, these are separate concerns, but it also prevents implicit retries in the semantics.
  • Because a registration can be modified, the loading pipeline semantics should look up an entry for a given key just once and then all subsequent operations in the pipeline should close over that entry. It should not look up the same key multiple times, which would be an unnecessary race condition.
@guybedford
Copy link

A failed module load should not implicitly unregister itself. Again, these are separate concerns, but it also prevents implicit retries in the semantics.

Does this mean that the following code:

System.import('x')
.catch(function() {
  // retry the import
  System.import('x');
})

would not be possible, and would have picked up the "cached" errored entry? It can be useful for the failure to be removed to allow things like this, after the failure signal has fully propagated through the dependency tree.

@caridy
Copy link
Contributor

caridy commented Oct 30, 2015

@guybedford that is correct. Any successive call to get module x ready, will fail, unless you take action on it. That is a side effect of the previous PR I think, and I hope we can correct that somehow. The detail here is that it is not necessary about x, but any dependency of it might fail, and you have no idea which one was it so you can remove it from the registry, this is very limiting. Thanks for bringing this up, I will work with @dherman to get this resolved.

@dherman
Copy link
Author

dherman commented Oct 30, 2015

There's no question that retries can be useful, and this doesn't eliminate the ability to do them. It only prevents them from happening automatically. The problem with automatic retries is that they will result in hiding errors and hitting the network repeatedly by accident. The system shouldn't try to be too clever and guess what the programmer wants. Retries absolutely should be and are expressible, they just shouldn't be done silently.

For example, your code above is still achievable via e.g.:

System.import('x').catch(() => {
  System.resolve('x').then(key => {
    System.registry.delete(key);
    System.import('x');
  });
});

@dherman
Copy link
Author

dherman commented Oct 30, 2015

Ah sorry, yes, I see the missing functionality: we need to make it possible to retry the subgraph. This probably calls for at least a guarantee that the error provides enough information to delete all the entries for a failed subgraph, but it might also call for an explicit retry mechanism.

@guybedford
Copy link

What we have in SystemJS today, and from the previous specification work, is that everything that was listening to the current load record on failure gets failed. But once the failure has been communicated to all the listening modules and imports, those modules are cleared allowing retries by default.

This behaviour has turned out to be very useful, and hasn't resulted in unnecessary retries as users will only repeat a System.import call when they truly expect to send the request again. I'm not sure I see what benefit is provided by forcing users to step lower into the API to handle these cases.

@dherman
Copy link
Author

dherman commented Oct 30, 2015

IINM the previous version of the specification never explicitly deletes entries from a registry, and certainly does commit them to the registry at a certain point. So for example, if a module's top level code throws, your code snippet would not work at all, even in the previous system.

The fact is, there is no one-size-fits-all retry mechanism that makes sense as the default.

The system needs to accommodate multiple registries, and multiple registries need to be able to share entries. This means that the dependency graph of a module and the contents of the registry may not be the same thing: for example, if you start loading an entry with loader1, and then store that entry in loader2, none of the entry's dependency graph shows up in loader2's registry at all. So when the entry fails, what should be deleted? Should we delete it from any registry it appears in at all? Should we do this transitively for any modules that depend on it, walking back-pointers into all registries and deleting it?

It's not hard to construct use cases that want different retry policies. It depends on the relationship between loaders. In one scenario, loader2 might be using loader1's entries as a quiet optimization, but if there are any problems it does not want to modify the state of loader1's registry and just reload locally into its own registry. In another scenario, loader1 and loader2 may be kept closely in sync with one another, and a failure in either of them should result in a retry in both.

It's too soon to try to build in a universal retry policy. Instead, we need the API to provide the building blocks needed to implement any retry policy, and then people can design abstractions for the most useful policies.

In particular, here are the building blocks we need:

  • every entry needs a flag indicated whether it failed
  • every entry needs to be able to query information about its immediate dependencies
  • that introspection API should provide all the identifying information about the module that came out of the loading pipeline (the requested module names, like "../foo/bar"; the resolved keys for those names; and the entry objects for those keys), so that no matter what the custom logic of the loader is, it has the ability to reconstruct the places where it may have installed the module in its registry

With this, it's possible to walk the dependency graph and find all failed modules and remove them from wherever the loader may have installed them in its registry (or possibly other registries it may control).

I'm not sure I see what benefit is provided by forcing users to step lower into the API to handle these cases.

Users don't have to if the loader subclass or framework provides a higher-level abstraction built on top of the building blocks this standard provides.

Automatic retries are the kind of mechanism that gets web standards into trouble: constraining a general-purpose API with ad hoc, scenario-driven policies. While it might work out to be more convenient for the situations we anticipate, the loss of generality disempowers programmers from being able to achieve the appropriate policies for their purposes. The extensible web philosophy prefers simple core semantics with a complete low-level API on which userland frameworks and libraries can build more and more convenient abstractions.

@guybedford
Copy link

Thanks for taking the time to put together a considered response, it really helps to hear the decisions here from reading the PRs! Yes, modules were only set in the registry on instantiate completion, so there was nothing to remove from the loader registry. But the in progress load records that had failed were traced and removed from the list of current in-progress load records of the loader.

This way network / loader hook errors were naturally retriable, while code execution errors retained a failed module in the registry, whose error would cache for any module trying to import it.

Certainly I'm completely behind the extensible web stance, but extensibility is a powerful tool that should still be carefully provided where additional innovation is truly possible - I'm just trying to convince myself that there are other retry policies that make sense to explore at this level of the architecture and am unable to easily picture any examples (there are certainly other interesting levels at which network retry policies can be implemented - eg normalization or fetch hook based).

I wasn't aware of the multiple registry constraint - when sharing modules between registries, would it not be possible to limit this to module records, instead of module loading / status records? (apologies if I'm messing up all the terminology here too, still catching up on the new terms!)

@dherman
Copy link
Author

dherman commented Oct 31, 2015

You have to put entries in the registry as soon as they start the pipeline in order to ensure that concurrent imports that require a common dependency do not clobber each other.

@wycats
Copy link

wycats commented Oct 31, 2015

Certainly I'm completely behind the extensible web stance, but extensibility is a powerful tool that should still be carefully provided where additional innovation is truly possible.

The whole point of extensibility is that we can't guess or predict all of the use-cases ahead of time. I have enough scars from AppCache (read this if you want to know more) that I think we should always be humble about use-cases we haven't thought of yet.

That said, we should also have high-level APIs that cover the common cases, and we should provide them as soon as we've developed and refined them. But the low-level APIs, which let the use-cases evolve over time, and which let us learn about common use-cases from library authors, come first.

@guybedford
Copy link

In this case, what we're implementing by default here is failure persistence into the registry, and making retry behaviour more difficult.

The default that I'm advocating would be:

  • On failure, if the current module is not yet instantiated, then remove the failure from the registry
  • Otherwise, if the current module is post-instantiate, leave it in the registry as a failed load.

With the above, code execution errors persist, while 400 and 500 HTTP errors, and other hook errors retry.

My experience has been that this is the most desirable behaviour to allow easy retrying. Consider forgetting to save a module in NodeJS and getting an fs read error, then trying to import again to get the module - this is the behaviour that is implemented today in NodeJS.

The only worry from the spec perspective in implementing a default like the above is if there are use cases where it would be desireable for loader hook errors to be persisted into the registry. I'm unable to think of a use case where this would be useful myself.

@caridy
Copy link
Contributor

caridy commented Nov 2, 2015

@guybedford "metrics" is probably a good use-case, e.g.: System.loader.import('app'), if it fails, I want to know why it has failed.

If you remove the 404s from the corresponding registry, what does that give you? Iterating over the registry entries is really not that reliable when it comes to identify failures, because those failures might come from another loader, with a different registry. In other words, eliminating things from the registry is really not intrusive, we keep references around internally so you can navigate the dependency graph, independently of the registry.

@bmeck
Copy link

bmeck commented Nov 30, 2015

A reminder that synchronous loaders like node can not rely on Promises for the registry state lookups.

@caridy
Copy link
Contributor

caridy commented Nov 30, 2015

@bmeck you can check the details here: #97

all interaction with the registry is sync.

caridy added a commit that referenced this issue Dec 1, 2015
…ry-objects

fixes issue #96: registry refactor to introduce registry entry objects
@dherman
Copy link
Author

dherman commented Dec 1, 2015

Fixed in 1e2cb5e.

@dherman dherman closed this as completed Dec 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants