Why a locate hook is unnecessary #52
One caveat to this - we still allow special modules to be stored in the registry like Also to address the plugin argument - |
This is all predicated on something that I don't understand:
Where was it discussed that this was needed or desirable? Is there a link you can provide? I must have missed this discussion. I'll refrain from commenting on it until I read the arguments. It sounds like a "custom schema" is a module name and Look-up becomes much less useful without module names. How can I lookup I'm going to have to always run For example, in some situations you want to create distinct modules based on who is importing them. With module names I can create a pattern like: import register from 'register'; and this normalizes to
Where Your No doubt the loss of normalize can be worked around but the question is why should it be given there is already a specced solution in place. The assumption of dropping normalize is that modules always map directly to files, but that is not the case. You cannot do something like: import data from 'package.json!json';
import npmData from 'package.json!npm'; These are distinct modules that have different purposes, but happen to be locating the same file. This means you have to create a custom schema (a module name format) to handle this. You'll have to do this any time you have modules that don't map directly to files; you have to hook both resolve and fetch. How are you going to handle fetch, with your plugin example? You're going to have to write something like: var fetch = loader.fetch;
loader.fetch = function(key){
// http://www.site.com/plugin.js!http://www.site.com/module.js
var url = key.split("!")[1];
return fetch.call(this, url);
}; Except now all fetch hooks will be receiving the key "http://www.site.com/module.js" which is not the module's actual key. Hopefully those hooks aren't expecting to have the real key (to store in localstorage, or call lookup later or something) because they don't have it. |
I want to reiterate that last point so that it doesn't get lost: If there is a need for keys that do not map directly to the url that is fetched it means that any |
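A minimal sketch of the concern raised above, assuming a hypothetical loader object whose base `fetch` hook simply records the key it receives (the `loader` object and `seenKeys` array are illustrative stand-ins, not part of any real Loader API). It shows that once a plugin override strips the prefix, the underlying hook never sees the module's real key:

```javascript
// Hypothetical minimal loader; this `fetch` is a stand-in hook, not a real Loader API.
const seenKeys = [];
const loader = {
  // Base hook: records the key it was given instead of doing a network request.
  fetch(key) {
    seenKeys.push(key);
    return `source of ${key}`;
  },
};

// Plugin override, as in the comment above: strip the plugin prefix before delegating.
const baseFetch = loader.fetch;
loader.fetch = function (key) {
  const url = key.split("!")[1];
  return baseFetch.call(this, url);
};

const moduleKey = "http://www.site.com/plugin.js!http://www.site.com/module.js";
const source = loader.fetch(moduleKey);
```

Any hook that caches by key or calls a lookup later would, in this sketch, be working with the stripped URL rather than the key under which the module is registered.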
@matthewp thanks for the quick response. I've got this work running on SystemJS master so feel free to play around with it there and see how it all works out too. From what I've heard, it sounds like there is consensus that designing a loader for the web that does not support loading URLs is really not something that can be considered. URLs are fundamental to the web. I don't have any links, just heard notes from a meeting (quite some time ago) where it was decided that the automatic adding of a .js extension is untenable as a default system that does not support URLs, and that importing from URLs should be possible. Using URLs does not inhibit contextual normalization at all - I mentioned the plugin pattern in my second comment - plugin systems can still extend URLs to include extra syntax for plugins just fine. I'm not sure I follow your final argument, which seems to rest on an assumption about the fetch hook that has never been formalized, what use case exactly is affected by this? |
Ah, this is probably the crux of the problem then. If you think about a module identifier as being a pointer to a file then I can see how this requirement would surface. Hopefully we can have this debate again now that the spec is being worked on in the open. I would suggest a mailing list instead of GitHub issues as it's easy to get drowned out by notifications and miss stuff here. Mailing lists are better for discussion and debate.
Yes, my assumption is that the fetch hook does an xhr request for the key (which is a url). I can't see it being implemented any other way. If you were to override the fetch hook because you have a custom url schema (such as with a plugin system) you will be calling super() with a url that is not the module key. This breaks an important assumption when overriding a hook; that you have access to the module's true key. |
I can see a loader design that perhaps does not specify a I still think it makes sense for the ID keys for modules internal to the loader to be module IDs instead of URLs though. Modules are units of functionality, like functions. The module loader is like a scope object for those code units, and the identifiers for those code units should just be something like an identifier, not a URL. For module bundling cases and some plugin loader resource IDs that are not URLs, the provider may not be at an actual URL. Best to keep the IDs separate from URLs to enforce that conceptual distinction. As to the dot normalization for CJS/AMD IDs, this exists because objects should not name themselves in source form. So sub-pieces inside a package need to be able to reference other parts in that package. Instead of seeing the module ID parts as a file path/URL, see it as referencing different parts of an object with a nested property/object structure. There is a package that provides Using '/' instead of '.' for the parts in a whole package object definition makes it easy to use '../' or './' without knowing the global object name or needing to invent a new syntax that does not conflict with '.', if '.' was instead used for separating object parts (so It also turns out to be helpful for a simple ID-to-path conversion, if a file fetch is needed. That just helps reduce the number of indirection layers, so creates a simpler overall system, but not the primary goal of the IDs. If the argument is that not being able to use URLs for IDs as something that is not fundamentally of the web, I think this is assuming too much on the JS module system related to the old world of script src tags. As illustrated above, module IDs are a replacement for Perhaps the "URLs must be supported" comes from thinking that script tags should be usable in an ES module system, but I think that is not the best way to look at script src tags in a module system.
Some notes on that: Script src tags are separate primitives for loading script and evaluating it. Consider it an XHR call with an eval() in global script space, and extra semantics for blocking rendering. Dependencies are expressed via a linear global list of script src tags. In contrast, modules are about units of code that specify their dependencies not with an implicit global identifier but with a local identifier tied to a string reference, where that string reference allows some level of indirection and resolution. Sometimes that indirection means translating that ID to a URL to load, sometimes it means just grabbing that module definition by the normalized string from a registry populated by bundled modules. Modules are a tree structure that (to work in the browser) are asynchronously discovered. That async discovery of layers in a tree model fundamentally does not work well with the linear, and sometimes render-blocking, model of script src tags. There is a set of problems when using an AMD loader alongside manually coded script src tags that are just avoided by not using the script src tags at all. The conceptual models are just different, and there will be less confusion by keeping them separate. |
@jrburke Your last post is a bit verbose. Just say everything once, please. Part 1
Fetch does nothing more than loading.
It is an identifier. (What else would it be?) Also, one of the most frequently used ones.
It might not be a web server. But URLs work for all kinds of sources (e.g. android content providers). Just use a custom schema. Part 2 Part 3 |
I meant more like the JS language concept of Identifier. Sorry for not making that clear.
URLs are great for global systems, like addressing content across an OS, that need disambiguation, android content providers being one of those. However, module use is more about local context, and it is more useful to have names that represent contracts than actual URLs. My 'jquery' could actually be provided by 'zepto'. It is good if a dependency indicates via a globally resolvable identifier (like a URL) in the package.json where to fetch a local identifier dependency if the local project does not have an opinion, but also allow the local project to override that with a preferred local value. There is a very strong analogy with custom elements in HTML, where they are tag names, but the resolution of that name to provider is separate from that name, and each tag may not be delivered by individual URLs.
This was in response to @guybedford's original post section on Dot Normalization. The duplication is not really about a path duplication, but about being able to reference different concept parts in a localized way, without knowing the global ID, and if it looks like duplication, it is more about flattening the number of transforms from ID to path (when a path is required for fetching) than being specifically a URL concept.
They are, because they help disambiguate across a global space, across all web sites. It does not follow though that it is therefore what a module loader needs to use as internal keys. And given the use cases of module bundling, loader plugins IDs that are not URLs, and even other examples in the web space, like custom elements, it is much more flexible to keep the URL concept solely related to when fetching is needed, when global disambiguation is needed (like, the browser cache is global across all web sites, so a URL is needed in that case). |
Is every module registered the result of fetch()? So that every entry in the registry corresponds to a URL used in fetch()? Not using this equivalence would seem to make implementation more confusing. Doesn't the issue of "module ids" vs URLs relate to module specifiers rather than the registry keys? |
No, the |
@jrburke In this case |
Well URLs do not identify files so why does "modules are not all files" relate? |
They do identify a resource that can be fetched, though, I thought this is why you asked the question. If all modules relate to a URL it might make sense (in some aspects) that they be the key, but since they do not all relate to a URL it makes much less sense. |
If the developer who calls
This does not mean the keys must be URLs, it just means that the keys should represent the modules not the specifiers or a mapping mechanism. As @jrburke points out the arguments about URLs being fundamental etc don't really make sense. But URLs are approximately unique and approximately represent content. We can't actually do better than approximate here because the content of module definitions can change at arbitrary times on servers far away. We have a lot of experience with how URLs fall short. We have tools to deal with them. I guess the handful of folks who deal with this level of the API will be able to deal with URLs by just treating them as strings even if they have some drawbacks (verbose, obtuse, vague). (and I'm not even sure how this relates to |
To reiterate my main point, then to hopefully stop posting in this ticket: URLs are best for when things really need to be uniquely disambiguated in a global space. Browser cache across multiple contexts (pages) is one of those. Package managers installing code from a global space into a local context is another. In the package manager case, it is specified outside the module loader system, in package metadata, only comes to play on initial install to local project context. For module keys, I do not believe URLs are the right model. Better ones are the function Identifiers used in JS, or custom element IDs, both conceptual names that need to be unique within a scope. They do not need URLs to work -- the fetching (if needed) of the code that backs those IDs are a separate concern. Strings are used to refer to modules instead of JS Identifiers because the IDs need some expressiveness outside regular JS Identifier punctuation, and it helps indicate that these are special in their resolution, do not require that the value at that identifier is completely defined when they first are encountered. Similarity to a string path or URL in some cases are just for flattening some of the translation logic if fetching is needed, but fetching is conceptually a separate concern from modules' unique IDs in a loader instance. |
@jrburke no one is arguing that portable names are not desirable. Of course they are. The argument is simply that storing portable names in the registry itself is not a good idea. To summarize my initial argument:
To summarize my proposal: use portable names, but normalize them into URLs for the environment. This is completely working from bundling to loading in SystemJS and jspm now, being released as of tomorrow. |
I believe this is the part causing the trouble. I did that too in requirejs, and it led to bad expectations, particularly around build bundling and trying to reference IDs outside the top of the namespace, then expecting that to be addressable as a module later for bundling. I assumed I needed to do it to allow people to transition from script src tags, but it was the wrong choice, script src tags are really a different thing than modules, and where I can make new AMD loaders that break compatibility with requirejs, I will just enforce using portable names. So I suggest whatever reasons might be driving wanting to load URLs directly as a module for your case should be revisited. It would be great if those reasons were illustrated somewhere, maybe I can learn from them, but based on previous thread comments it seems like it is not readily available. |
@jrburke exactly, it is these difficulties trying to make URLs and names work together that leads to my argument that we store URLs in the registry. This entirely comes from wanting URLs to be permissible. I believe this is a pretty set requirement and I think the point is that we can't "ignore" URLs when writing for the web, but @dherman or @caridy may have better arguments for the decision here. Note that it is very simple given a URL to work out its portable name again for bundling for example. |
@guybedford you said this has been implemented in SystemJS, where? master doesn't appear this way. |
This restricts the usage of |
Ok @guybedford so I understand how your new code works now. So couple of problems I spot:
You can't use plugins that use an extension, for example system-npm, you couldn't do
This won't work any more once there's only a |
@guybedford You didn't explain how you plan to make URLs portable. Can you explain this? Looking at your build code it looks like you attempt to do a reverse normalization: https://github.com/systemjs/builder/blob/67754eee48f3604233fd9ad005240801f693d9c2/lib/builder.js#L120 This assumes knowledge of how the original normalization took place and won't work otherwise. Correct me if I'm wrong, but I think the only way to make a truly portable schema would be to have something like normalize take place. There are still unanswered questions about making the Loader spec fast enough that http2 is viable, so I don't think we can simply ignore the bundling need. I'd love to hear how portability can be achieved without normalize though. |
@matthewp to quote the original post:
That is, yes exactly, we reverse from URLs into our portable schema at build time, and have bundle names normalized by the loader. The alternative is to effectively do this in the loader itself, but the issue as I mentioned is that we convert the schema to URL back to schema in normalize just to handle the URL normalization problems, which doubles up work unnecessarily. Hence the argument for URLs as the environment-reference, which can easily be converted back and forth as needed, the point being only as needed. |
You cannot reverse it though. There's no algorithm for doing so. It works in your case because you are making assumptions about how normalization originally occurred. This is not sufficient for generating loader-agnostic portable names. Normalize is. |
Yes any type of schema names are specific to the implementation in question, and the responsibility of the system imposing the schema. The rule in jspm is currently simply SystemJS wildcard paths configuration. The argument is not about what is necessary for portable names (and with a working implementation I've shown URLs in the registry work completely fine), it's about what is necessary to permit URLs and schemas together in a module loader. |
The argument is about whether a locate hook is necessary. If the solution for creating portable names requires something like normalize then I think that's a strong argument that we should just keep normalize. Your working implementation makes unreasonable assumptions that won't scale. |
Just to clarify again here from the last two comments for the record, the argument is simply:
Of course the locate hook can be allowed and implemented, in turn allowing reverse-normalizing from URL-space back into schema-space at the end of normalize for custom schemas to be used in the registry. But we'd need a good reason to do this as it is unnecessary work otherwise. So in SystemJS we're moving along the path of not having locate to test this out properly. If we hit an issue that shows this to be a terrible idea we can reconsider, but that is yet to happen so far. |
Well so far you had to gimp your plugin system and kill your extension ecosystem. Maybe you don't think that counts as a "terrible idea" but I certainly do. Your builder now won't work if the user is using custom extensions (maybe it never did, but ours did). It is very easy to make a build tool that works with any WhatWG Loader if you have normalize. Without it it's not, since the whole point of this change is to remove the ability to identify modules outside of what loader they are running in. By the way, SystemJS is still using the locate hook such as here. When you finally do get rid of locate you're going to run into the problem described here with plugins since you're going to have to call fetch with a non-key in that case (or any case where you want to fetch a module whose key is not a real url). |
@matthewp what is a use case for fetching a module whose key is not a real url? For things like core modules, we can permit non-URL names from normalize via the first line of normalize being: function resolve(key, parent) {
if (loader.has(key))
return key;
} That enables things like core modules like |
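A fuller sketch of that idea, under stated assumptions: the `registry` map, the `@core/util` name, and the fallback that resolves everything else against the importer's URL are all hypothetical and only illustrate how a registry check at the top of resolve could sit alongside URL normalization.

```javascript
// Hypothetical registry and resolve hook; names are illustrative only.
const registry = new Map([["@core/util", { name: "core util" }]]);
const loader = { has: (key) => registry.has(key) };

function resolve(key, parent) {
  // Names already in the registry (for example core modules) pass through untouched.
  if (loader.has(key)) return key;
  // Everything else is resolved to an absolute URL against the importing module.
  return new URL(key, parent).href;
}
```

In this sketch the non-URL name only survives because it was registered ahead of time; nothing is ever fetched for it.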
plugin resources. |
upgrade systemjs-builder to 0.14.x BREAKING CHANGE: URLs are now first-class module names. All names are normalized into URLs into the registry as part of the normalization process. See https://github.com/systemjs/systemjs/releases/tag/0.17.0 and whatwg/loader#52 for more details.
@guybedford can we just close this now that we have merged PR #97? |
@caridy sure, and thanks for your excellent work on this. |
In creating the upgrade path in SystemJS for permitting URLs as module identifiers, it has turned out best to deprecate the locate hook. This may be over-explaining the obvious or dwelling on decisions already made, but coming to this conclusion has taken me a surprising amount of consideration so I'd like to describe the reasoning behind this here to retain some reference for the decision and attempt to leave somewhat reasoned feedback.
The question that started this was whether we should re-introduce the locate hook into the specification. Re-introducing the locate hook will enable normalize to normalize module names into a custom schema that can be defined by the loader implementation and form the string names that are stored in the module registry. Locate then handles the final resolution into URLs that can be fetched.
The justification for considering this was to retain compatibility with AMD-style module loading where we have a baseURL-schema. In this schema, module names are always stored in the registry as plain names relative to some baseURL. jspm also uses its own schema in the registry to refer to modules such as npm:[email protected].
There is a draw to storing these universal schema names inside the module registry as a portable naming system, but I'd argue this lure is mostly one of elegance as opposed to practicality.
I've implemented the baseURL-schema normalization in the current SystemJS, and have been experimenting recently with at least three different complete implementations of normalization of a custom schema alongside URLs (the new requirements of the spec, which completely make sense).
In the end, trying to make a custom schema work alongside URLs in the same registry space ends up causing more issues for no practical gain.
Dot Normalization
As soon as we allow both AMD-style module IDs relative to some baseURL alongside URLs, the first issue we hit is the need to define "dot normalization". This basically means that relative normalization needs to be defined for the subset of both URLs and non-URLs.
It's not a lot of code, but it is the first sign here that we're duplicating work.
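To make the duplication concrete, here is a minimal sketch assuming a loader that accepts both plain names and URLs as parents. The function names are hypothetical; the point is only that relative resolution ends up with two code paths, one hand-written for names and one delegated to the platform's `URL` parser.

```javascript
// Returns true for names that already look like absolute URLs (scheme://...).
function isUrl(name) {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(name);
}

// Resolve "./" and "../" segments of a plain (non-URL) name against its parent name.
function dotNormalizeName(name, parentName) {
  if (!name.startsWith("./") && !name.startsWith("../")) return name;
  const parts = parentName.split("/").slice(0, -1); // drop the parent's last segment
  for (const segment of name.split("/")) {
    if (segment === "..") parts.pop();
    else if (segment !== ".") parts.push(segment);
  }
  return parts.join("/");
}

// The same relative specifier needs two code paths: one for names, one for URLs.
function dotNormalize(name, parent) {
  return isUrl(parent) ? new URL(name, parent).href : dotNormalizeName(name, parent);
}
```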
Non-uniqueness
The next issue we have is the non-uniqueness of our schema. The issue here is that import '/local/path.js' is now distinct and separate from the module at import 'local/path.js' in the scenario where baseURL='/' (one resolves as a name and the other as a URL). This will cause confusion as we are allowing the same unique module to be referred to by two different possible names, breaking a key principle of the registry being unique.
Having two ways to refer to the same module is a bug waiting to happen, causing problems for configuration (which variation do we configure?), creating the possibility of a module being executed twice, and interfering with bundling workflows.
Expecting the user to know that they should write import('x') instead of import('./x') arbitrarily is a hard ask.
This leads down a road of trying to catch these uniqueness issues in the normalization pipeline itself, which then ends up becoming URL normalization, followed by a reverse normalization into the schema.
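The non-uniqueness can be sketched as follows, assuming a hypothetical mixed registry where plain names are stored as-is and anything that looks like a URL is stored as an absolute URL. The `baseURL` value and both helper functions are assumptions made for illustration.

```javascript
const baseURL = "http://www.site.com/"; // hypothetical page origin, i.e. baseURL = '/'

// Hypothetical mixed-registry rule: plain names stay as names; anything starting
// with "/", "./", "../" or a "scheme://" prefix is treated as a URL.
function mixedKey(specifier) {
  const looksLikeUrl = /^(\.{0,2}\/|[a-z][a-z0-9+.-]*:\/\/)/i.test(specifier);
  return looksLikeUrl ? new URL(specifier, baseURL).href : specifier;
}

// Locate step: plain names are fetched relative to baseURL, URLs are left alone.
function locate(key) {
  return new URL(key, baseURL).href;
}

const asUrl = mixedKey("/local/path.js");  // stored under a URL key
const asName = mixedKey("local/path.js");  // stored under a plain-name key
```

Here `asUrl` and `asName` are different registry keys, yet `locate` sends both to the same address, so the same file could be registered and executed twice.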
All schemas have the non-uniqueness problem
Schema non-uniqueness with URLs applies to any custom schema chosen that maps to URLs, not just the baseURL system. Even if we come up with the perfect custom naming schema, as soon as we want that schema to co-exist alongside URL requests we hit these issues.
In order to retain unique identification and configuration of modules, one ends up normalizing from schema space into URL space, and then reverse-normalizing back into schema space at the end of normalize, before resolving back into URLs from the schema in locate, just in order to have our perfect schema names stored in the registry.
Add to this the idea of a configuration space consisting of both schema and URL identifiers as well, and this compounds the problem even further.
One ends up swapping between spaces in such a way that URLs become the primary space anyway, and we're just pretending that the schema is the primary space.
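A rough sketch of that round trip, assuming a baseURL-style schema. The `baseURL` value and the helper names `toUrl` and `toSchema` are hypothetical; the sketch only counts the conversions a single import goes through.

```javascript
const baseURL = "http://www.site.com/packages/"; // hypothetical

// Schema space -> URL space.
function toUrl(name, parentUrl) {
  const isRelativeOrUrl = /^(\.{0,2}\/|[a-z][a-z0-9+.-]*:\/\/)/i.test(name);
  return new URL(name, isRelativeOrUrl ? parentUrl : baseURL).href;
}

// URL space -> schema space (only possible for URLs under baseURL).
function toSchema(url) {
  return url.startsWith(baseURL) ? url.slice(baseURL.length) : url;
}

// normalize: go to URL space to get uniqueness, then back to schema space.
function normalize(name, parentUrl) {
  return toSchema(toUrl(name, parentUrl));
}

// locate: convert the schema key into a URL once more so it can be fetched.
function locate(key) {
  return toUrl(key, baseURL);
}
```

Every import in this sketch is converted three times, and the URL form is the one doing the disambiguation.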
Beyond the baseURL-schema
Another common issue with baseURLs is that when back-tracking below the baseURL, we end up with "normalized" paths looking like "../../module.js", which is really not acceptable for a naming system either.
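For illustration, a small sketch of how such a name arises, assuming same-origin URLs and a hypothetical helper that expresses a URL as a name relative to baseURL:

```javascript
// Express `url` as a name relative to `baseURL` (both assumed to share an origin).
function nameRelativeToBase(url, baseURL) {
  const target = new URL(url).pathname.split("/");
  const base = new URL(baseURL).pathname.split("/").slice(0, -1); // drop trailing ""
  let common = 0;
  // Count shared leading directories, never consuming the target's file name.
  while (
    common < base.length &&
    common < target.length - 1 &&
    base[common] === target[common]
  ) {
    common++;
  }
  // One "../" for every baseURL directory that is not shared with the target.
  return "../".repeat(base.length - common) + target.slice(common).join("/");
}
```

With a baseURL of `http://www.site.com/app/lib/`, a module at `http://www.site.com/module.js` comes out as `../../module.js`, which is the kind of registry name described above.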
If we return to the question of what AMD's baseURL schema is really trying to accomplish, the core principle is one of portability of modules, which is completely in agreement with what we should be aiming for. URLs are obviously not a portable naming system for modules (modules can move between environments and hence change URL), so the question is simply how to maintain portability of modules in spite of using URLs?
URLs are the schema
It turns out to be very simple to do this - normalization is seen as the process of converting a "portable module name" into an "environment-specific name". And the most environment-specific name is the URL which we store in the module registry.
The concept that we need to have a registry based on our perfect portable schema is flawed. We still keep our schema if we like - which we can bundle into just the same:
Where the name above is normalized into a resolved name of http://www.site.com/packages/custom/portable/schema.js by the loader when being processed and stored in the registry (bundle names are now treated as unnormalized).
There is no big loss that the registry now contains this value under an environment-specific URL instead of the schema. One can just accept that any lookup into the registry must pass through a normalization phase first:
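A minimal sketch of such a lookup, under loud assumptions: the `loader` object, its `set`/`get` methods, the `baseURL`, the `.js` suffix rule and the portable name `custom/portable/schema` are all hypothetical and chosen only to match the URL quoted above.

```javascript
const baseURL = "http://www.site.com/packages/"; // hypothetical

// Hypothetical loader whose registry is keyed by URL only.
const loader = {
  registry: new Map(),
  // Portable name -> environment-specific URL; URLs pass through unchanged.
  normalize(name) {
    if (/^[a-z][a-z0-9+.-]*:\/\//i.test(name)) return name;
    return new URL(`${name}.js`, baseURL).href;
  },
  // Both writes and reads normalize first, so callers may use either form.
  set(name, module) {
    this.registry.set(this.normalize(name), module);
  },
  get(name) {
    return this.registry.get(this.normalize(name));
  },
};

loader.set("custom/portable/schema", { value: 42 });
```

The portable name still works for lookups; it is simply never what the registry stores.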
If an implementor really wants to use a custom schema, make the schema URL-based and add the implementation to the fetch hook so everything works out well anyway:
The other consequence of using URLs is that configuration then always goes through a normalization phase itself:
The above would normalize the above configuration into http://site.com/local/path/some/local/module.js.
The benefit of this is that users don't need to understand the special naming schema - they can just reference modules as URLs exactly as they expect and correctly configure things without needing to have studied the system in detail.
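Configuration normalization of this kind might look like the sketch below. The `baseURL`, the `normalizeConfig` helper and the `format` metadata value are assumptions for illustration, not the actual SystemJS configuration API.

```javascript
const baseURL = "http://site.com/local/path/"; // assumed from the URL quoted above

// Rewrite every configuration key into its URL form before it is stored.
function normalizeConfig(meta) {
  const normalized = {};
  for (const [name, value] of Object.entries(meta)) {
    normalized[new URL(name, baseURL).href] = value;
  }
  return normalized;
}

// A portable name and a root-relative URL for the same module.
const fromName = normalizeConfig({ "some/local/module.js": { format: "amd" } });
const fromUrl = normalizeConfig({ "/local/path/some/local/module.js": { format: "amd" } });
```

Both spellings end up configuring the same registry key, which is the user-facing benefit described above.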
One implication here for implementors is that build systems wanting to use portable naming system schemas need to reverse-map the schemas at build time from URLs in the registry, but that is a very minimal cost and a straightforward 1-1 mapping.
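The build-time reverse mapping can be sketched as below, assuming the build tool has access to the same prefix table the loader used. The `paths` table and its entries are hypothetical.

```javascript
// Hypothetical paths table: portable prefix -> URL prefix used by the loader.
const paths = {
  "npm:": "http://www.site.com/packages/npm/",
  "app/": "http://www.site.com/app/",
};

// Map a registry URL back to its portable name for bundling.
function toPortableName(url) {
  for (const [prefix, urlPrefix] of Object.entries(paths)) {
    if (url.startsWith(urlPrefix)) return prefix + url.slice(urlPrefix.length);
  }
  return url; // no mapping known: keep the URL as the name
}
```

The mapping is only 1-1 when the builder knows how the original normalization was configured, which this sketch makes explicit through the shared `paths` table.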
I've yet to hear a single use case that is lost by enforcing that the registry is only to store URLs - the justifications for allowing the registry to store a custom schema seem to cling to dated models due to history, while there are many benefits as described to both implementors and users in enforcing URLs as the schema and keeping the locate hook deprecated.