Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update loader module name resolver #1221

Open
arv opened this issue Aug 1, 2014 · 49 comments
Open

Update loader module name resolver #1221

arv opened this issue Aug 1, 2014 · 49 comments
Labels

Comments

@arv
Copy link
Collaborator

arv commented Aug 1, 2014

If the module name starts with one of:

  • ./
  • ../
  • /
  • scheme:

The name is treated as a relative (or absolute) URL from the current module. This means that we no longer have an implicit .js. It also means that the URL is relative to the file/URL of the current module and not relative to the module name.

@johnjbarton @guybedford

@arv arv added the bug label Aug 1, 2014
@guybedford
Copy link
Contributor

Is this the behaviour currently or the proposed behaviour?

@arv
Copy link
Collaborator Author

arv commented Aug 1, 2014

This is the new simplified behavior we decided upon.

@guybedford
Copy link
Contributor

Oh wow! I like it.

@zenparsing
Copy link

Better! In what context was this decided? TC39, traceur-specific, etc...

@guybedford
Copy link
Contributor

Does that mean:

System.import('jquery.js') // -> http://cdn.jquery.com/jquery.js

jquery.js

import './sizzle.js';

// ./sizzle -> http://cdn.jquery.com/sizzle.js

Then the registry would look like:

{
  'jquery.js': ...
  'http://cdn.jquery.com/sizzle.js': ...
}

Is this not mixing module names and URLs? Do we not need to treat all normalized module names as normalized URLs in this context?

@guybedford
Copy link
Contributor

This doesn't seem fully consistent to me at first glance. I'm not sure what distinction remains between module names and addresses. I would suggest module names be allowed to be based on any custom absolute URI form and schema, while addresses be absolute URIs suitable for the platform. In such a scenario, module names being relative to module names would still make sense, so I'm not sure why that is being provided as the key argument.

@arv
Copy link
Collaborator Author

arv commented Aug 1, 2014

@zenparsing Break out session at TC39 which means that it has not yet been approved. However, the module champions were positive to this improvement.

@guybedford Yes and no. Let me see if I can make this clear.

System.import('http://cdn.jquery.com/jquery.js');

// jquery.js
import './sizzle';

This would most likely 404 since it would try to load http://cdn.jquery.com/sizzle and not http://cdn.jquery.com/sizzle.js. To fix that use:

System.import('http://cdn.jquery.com/jquery.js');

// jquery.js
import './sizzle.js';

The registry would not use the module name as the key but the URL:

{
  'http://cdn.jquery.com/jquery.js': ...
  'http://cdn.jquery.com/sizzle.js': ...
}

So what happens when you do import 'jquery'. Out of the box it will be an error since bare names are not resolved to a URL. So you need to add a mapping from jquery to the URL it resides at. This can be done using one of the hooks or a mapping table.

@guybedford
Copy link
Contributor

@arv that makes sense, but consider -

System.import('/lib/jquery.js'); // -> http://thissite.com/lib/jquery.js

If this is stored in the registry as /lib/jquery.js, it is inconsistent as an import of ./jquery.js within /lib/app.js will resolve to http://thissite.com/lib/jquery.js, missing the registry cache match.

Thus we have to store in the registry as http://thissite.com/lib/jquery.js making the module name an absolute URI anyway.

So my point is why do we need to resolve relative module names against the address and not just the module name?

@arv
Copy link
Collaborator Author

arv commented Aug 1, 2014

System.import('jquery.js');

This will not resolve to http://thissite.com/jquery.js. Out of the box this is going to be an error. You need to tell the loader what URL this maps to. Either by overriding hooks or some kind of mapping table.

@guybedford
Copy link
Contributor

@arv I've just updated the example to meet current semantics, the conflict still holds.

@zenparsing
Copy link

@arv So, if I understand correctly, module specifiers which don't match the rules you've laid out, such as:

import { openShell } from "zen-sh";

are treated as belonging to some application-defined namespace, essentially marking them as userland packages. For the browser, the application is responsible for defining how those names get resolved to URLs. Is that right?

@arv
Copy link
Collaborator Author

arv commented Aug 1, 2014

@guybedford I see what you mean now.

The problem comes from when there is a mapping. Lets say you have 'path/to/a.js' which has a './b.js' in it. Now if someone maps 'd/e/f' to 'path/to/a.js'. Now when you resolve './b.js' you need to do this based on the actual URL of 'a.js' or you would end up with 'd/e/f/b.js' which does not exist.

@zenparsing Yes. That is the intent.

@guybedford
Copy link
Contributor

@arv that issue is caused by applying the mapping in locate. If the path rule applied in normalize (which is a many to one function, designed to retain uniqueness of module names), it would work out ok, and the registry remains the primary resource ID system, which is really important for uniqueness, which is crucial to not end up with duplicates.

@guybedford
Copy link
Contributor

I think this is close, but solving the problem in a way currently that is introducing more complexity and moving away from the uniqueness and consistency properties of module names, which are important.

@zenparsing
Copy link

@guybedford @arv Given these new semantics, is the locate hook still necessary or useful?

@arv
Copy link
Collaborator Author

arv commented Aug 1, 2014

URLs are unique too. I feel like I am missing something here... Let me think a bit more about it. If I'm not mistaken the issue we showed was related to polyfilling a builtin module that would reside at "js/reflect" for example and this would load "./util.js" but we definitely do not want this to be loaded from "js/reflect/util.js" but from a file relative to the file that implemented the polyfill.

@arv
Copy link
Collaborator Author

arv commented Aug 1, 2014

@zenparsing Not sure it is needed but it might make custom loaders simpler to keep it. Needs more thinking.

@johnjbarton
Copy link
Contributor

It also means that the URL is relative to the file/URL of the current module and not relative to the module name.

I don't understand this sentence. To me it reads like "The URL is relative to the URL but not relative to the URL".

@arv
Copy link
Collaborator Author

arv commented Aug 15, 2014

Remember that names and URLs are not the same. Assume someone maps a name/a to http://foo.com/a.js and name/a loads ./b but there is a mapping for name/b to http://bar.com/b.js. With the new scheme we would instead load http://foo.com/b.js since we resolve things relative to the URL and not the unresolved name. (ignoring the .js suffix changes)

@johnjbarton
Copy link
Contributor

Sorry that makes no sense to me.

there is a mapping for name/b to http://bar.com/b.js. With the new scheme we would instead load http://foo.com/b.js...

How is that just not a bug?

@arv
Copy link
Collaborator Author

arv commented Aug 15, 2014

When you write relative paths in your import you expect these to be relative to the file you are writing and not something else.

@johnjbarton
Copy link
Contributor

When you write a mapping for name/b to http://bar.com/b.js you expect it to be obeyed.

I'm fine if the mapping is an error. Is there even a use-case for 'name/b' matching?

@zenparsing
Copy link

Sorry for stepping in, but here's my understanding:

We have two global namespaces in this design:

  1. The "abstract module name" namespace.
  2. The URL namespace.

The question is: when someone uses a relative path, which namespace is it relative to? @arv is suggesting (2), and that makes sense to me.

We get into dangerous territory when users sometimes expect (1) and sometimes expect (2).

I currently don't understand the motivation for maintaining namespace (1) above, independently from (2). I don't believe that motivation has been documented anywhere. Is it?

@johnjbarton
Copy link
Contributor

There are at least three namespaces:

  1. Module specifiers (the string in the declaration),
  2. Normalized name, the result of normalize()
  3. Addresses, the result of locate().

In each case the namespace allows control or convenience. Control because you can remap the names as you move between namespaces; convenience because you can use the remapping to allow short names in the declaration and still end up with long, unique addresses.

I interpret the current change as meaning that the referrer in the normalized name will be from the address namespace, a URL. That makes relative name relative to the address the module was loaded from rather than the root of the dependency tree that started this load.

But also makes the map nonsensical. If you normalize wrt to URLs, then the map has to be in the URL space as well. Then name/b to http://bar.com/b.js is illegal and we don't have a problem.

@arv
Copy link
Collaborator Author

arv commented Aug 15, 2014

I interpret the current change as meaning that the referrer in the normalized name will be from the address namespace, a URL. That makes relative name relative to the address the module was loaded from rather than the root of the dependency tree that started this load.

Correct. ModuleSpecifiers starting with ./, ../, / or scheme: are all relative to the referrer which should now be a URL.

If it doesn't start with any of those the name needs to have been registered already.

But also makes the map nonsensical. If you normalize wrt to URLs, then the map has to be in the URL space as well. Then name/b to http://bar.com/b.js is illegal and we don't have a problem.

Yes, the map needs to use URLs as keys (even though that is internal and I don't think it we need to spec that).

@Hixie
Copy link

Hixie commented Aug 15, 2014

Surely by default we want

import "foo";

...to do something useful (such as importing foo.js). It would be really sad if we made everyone have to import a basic loader just to do basic stuff like that.

@arv
Copy link
Collaborator Author

arv commented Aug 15, 2014

You can do:

import "./foo.js";

instead

@zenparsing
Copy link

@johnjbarton Maybe I'm missing something, though - why is "a/b/c" not a valid URL?

@johnjbarton
Copy link
Contributor

http://www.w3.org/Addressing/URL/url-spec.txt

A complete URL consists of a naming scheme specifier followed by a string whose format is a function of the naming scheme.

So "a/b/c" is not a URL. But I think you mean the naming ought to follow the hrefs in browsers.

Currently module specifiers are not hrefs. We could decide to make them hrefs and we could decide that package naming requires using the whole URL apparatus. But the current sentiment is that these names are not URLs because they refer to files that might be interpreted as URLs in a browser or as file system paths in other platforms. Certainly it is the case that no existing JS package system is based on hrefs, they all use a system like the one we have now.

@Hixie
Copy link

Hixie commented Aug 15, 2014

"./" doesn't mean anything magical in most file systems. Certainly URLs and Unix systems don't treat it as anything special. Some command line mechanisms need "./" to execute a local file (as opposed to just referencing it), but that's a security feature (without it they only complete the filename against the $PATH, not the current directory).

@zenparsing I don't really see why "a/b/c" can't mean one thing by default (namely, get "a/b/c.js" or some such), and have it be overloaded by people who want a packaging system (where presumably it just means "expand a/ to ../foo/bar/" or some such). This would allow you to cleanly migrate from one to the other by just defining a package for "a/" when you move the files from "a/b/" to "foo/bar/b", for instance. (Mind you, even with packaging mechanisms, I'm still pretty sure one would want the "normalize" hook to resolve the names to absolute URLs for the registry.)

The current solution works great: Relative names start with a ., absolute names do not. Making some names that start without a dot relative would confuse without even waiting for it to happen.

I'm not sure what "absolute" means in this context.

@zenparsing
Copy link

@johnjbarton I'm not really advocating anything in particular - but it is an interesting design issue with tradeoffs that should be well-documented.

@zenparsing
Copy link

And of course I wasn't trying to say that "a/b/c" was a syntactically valid complete URL. That would be insane : )

@Hixie
Copy link

Hixie commented Aug 15, 2014

Note that http://www.w3.org/Addressing/URL/url-spec.txt is woefully obsolete (1994!). The modern definition of URLs is http://url.spec.whatwg.org/ and it clearly says "A URL must be written as either a relative URL or an absolute URL". "a/b" and "../a/b" are both (relative) URLs; there's no syntactical difference between them from a URL perspective. That's why I think it's a bit dodgy to treat them as different things in imports.

Personally by default I would treat both as just being relative to the referrer address' path segment, then stick ".js" on the end and resolve that relative to the base URL. I think it would be logical to make an exception for things which start with ASCII alpha, then have zero or more ASCII alphanumeric, "+", "-", or ".", and then have a colon (treat those as absolute URLs) and for things that start with two slashes "//" (treat those as scheme-relative URLs). (Also, in HTML I'd probably add an exception for things that start with "#", so you can refer to elements by ID.)

@johnjbarton
Copy link
Contributor

This is all interesting but this is just a bug tracker on traceur, so es-discuss would make more sense for such issues.

@guybedford
Copy link
Contributor

I've put together an implementation of something along these lines at the PR ModuleLoader/es-module-loader#219.

arv added a commit to arv/traceur-compiler that referenced this issue Nov 4, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 5, 2014
arv added a commit that referenced this issue Nov 5, 2014
arv added a commit to arv/traceur-compiler that referenced this issue Nov 11, 2014
@justinbmeyer
Copy link

I'm trying to wrap my head around this. Please correct me if I'm wrong.

As @johnjbarton mentioned, there are 3 namespaces:

  1. Module specifiers (the string in the declaration),
  2. Normalized name, the result of normalize()
  3. Addresses, the result of locate().

This change makes the default normalize hook use the "parentAddress" with module specifiers that look like "../" or "./". This will result in normalized names that are urls/paths.

Example

Compare what happens previously to currently with the following example:

System.paths = {
  "a/b": "path/B/b.js",
  "a/c": "path/C/c.js"
}
// in main.js
import b from "a/b"
// in a/b.js
import c from "../c"

Previously:

specifier -> normalized -> address
"a/b" -> "a/b" -> "path/B/b.js"
"../c" -> "a/c" -> "path/C/c.js"

Current

specifier -> normalized -> address
"a/b" -> "a/b" -> "path/B/b.js"
"../c" -> "path/B/c" -> "path/B/c"

This seems strange.

It makes moduleNames unpredictable to library authors. If I'm in module "foo/bar/zed" and write import "../baz", that module should be named "foo/bar/baz". This enables me to setup a mapping for "foo/*" moduleNames without worrying that the module name might look like "http://cdn.com/foo/bar/baz".

Module identifiers should be operating against module names, not addresses. This enables the two level mappings that are so powerful and well understood in things like RequireJS and currently in SystemJS. By two level mappings, I mean you can map module names and paths:

map: {
  "foo/*": { "jquery": "[email protected]" }
},
paths: {
  "foo/*" : "cdn.com/foo/*"
}

@justinbmeyer
Copy link

@johnjbarton @arv @guybedford After thinking about it a lot, I think this is a step in the wrong direction. I'm very sorry for coming so late in the game. Hopefully, I am missing something.

I've written up why I think address normalization is an incorrect solution here: https://gist.github.com/justinbmeyer/c36689eab045b0a71542

so es-discuss would make more sense for such issues

It's my understanding that es-discuss is no longer the place for the Module loader spec. Where is the appropriate place to post this and have it debated?

@johnjbarton
Copy link
Contributor

Where is the appropriate place to post this and have it debated?

Honestly I have no idea.

@arv
Copy link
Collaborator Author

arv commented Jan 11, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants