-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review slashes after file:/// and file path normalization #405
Comments
As I wrote in #232 maybe there are other solutions, Some ideas:
No 4. seems most strict/ traditional but does not unify |
FWIW, I'm open to revisiting this as it's clearly not implemented as-is (apart from perhaps in Node.js). It would be good to test IE6 and especially browsers on Windows in general. And we need to ensure that Is this something you're still interested in and willing to help out with perhaps? |
I don't know if I can help, but I'm willing to give it a try. |
I'm still trying to get results on IE6/7.
I tested a bunch more examples but I need some more time to make sense of them. |
I ran another batch of examples. An attempt to summarize:
Given
Tested on Edge 17 and 18 and Chrome 71 on Windows (via browserstack), Safari 13, Chrome 81 and Firefox 76 on Mac. |
I'll add another case (which came up in https://bugs.chromium.org/p/chromium/issues/detail?id=1128999): |
@alwinb (or anyone else): would you be able to propose a spec patch, or just a general algorithm for this? Then we can prototype your algorithm in jsdom/whatwg-url, run it against all the test cases you have compiled, and see how well it works and how well it matches browsers. If it's idempotent and closer to browser behavior than the current spec, then we can update the spec. I'm happy to help with the prototyping/testing/spec patch process, but I would need an algorithm to start from. |
Reading through the above in a bit more detail, it looks like you were asking for feedback on what approach would be most reasonable. I'll check with coworkers to see if they have any stronger opinions, but for my part, my requirements are very minimal. This is a crusty legacy area and just getting to interop is the important thing. So:
Trading (3) off against the nice to haves is also reasonable. E.g. if there's a much simpler version of the algorithm that only matches 1/3 modern browser engines, then we should probably go with that. (But if it matches 0 engines, then it's not good.) |
Platform independence is difficule since there are many file system types (case sensitive, case insensitive, and many more with unicode and different kinds of unicode normalization) and different behavior for one platform. The platform-independence of box, dropbox, onedrive, icloud, google drive when mounted might be a more achievable goal. |
The URL parser, i.e. the algorithm we're discussing here, doesn't need to worry about those things. There may be further troubles translating a parsed URL into a file on the filesystem, but that's out of scope for this repository, and more of the providence of e.g. HTTP server software. |
This turns out to be a very complicated problem. The only viable approach I've been able to come up with is to not ignore multiple slashes in file URLs. At least this agrees with Safari and Firefox and the RFCs, and it prevents messy interactions with #302 that break idempotency. (It even agrees with IE/Edge in some cases). Then for #302 I would propose to preserve the host in file URLs (also in the presence of drive letters) except for localhost, which would be replaced with the empty string. That then disagrees with Firefox but is more in line with the others (considering just the host). Together these changes will cause quite a bit of disagreement though. But I don't see any better alternative. Is this clear enough? I can have a look at the state machine to see where to make the changes, but I'd like to be sure that there's some consensus about the approach first. Somewhat off-topic, and I'm hesitant to mention this, but I decided to go ahead because it could be useful for the standardisation process. I have tried to build an URL library that is compatible with the standard, but has separate parsing, resolving and normalisation phases. It uses a simple 'theory' of URLs, modelling them as a sequence of tokens. They're similar, but different from the parse trees as per the RFCs. I wrote it down in the Theory section of the README. It can be useful to describe the different browser behaviours as operations on such token lists. And in general it could help with points 4 and 5 that @domenic mentions above, amongst other things. – Just in case. |
This makes sense to me, and I'm happy to say that Chrome is in favor of it at a high level. (In practice, Chrome is very slow to make changes to its URL parser, so this support doesn't necessarily translate into action. But it should be enough to help proceed with consensus-gathering.) It sounds like your proposal mostly matches Safari, so probably if we can get @annevk's high-level sign-off for Mozilla, then we're ready to move on to the next step. We can perform a second round of checks with all the browser representatives once we have a spec patch/test results patch/complete listing of mismatches with current browsers. |
If it works for Chrome and Safari, it'll likely work for Firefox. @jasnell for thoughts from Node.js. cc @valenting |
I'm all for it. Changing the Firefox implementation should be relatively straight-forward. |
alwinb's proposal sounds good to me. It sounds very close to if not exactly what I implemented in WebKit. |
Excellent! I will start with updating the tests (urltestdata.json), then try and make the changes to whatwg-url, and post back here. I may need some time to learn the process (forks and merge requests and the like), I have not done that before, but I suppose it'll become clear to me as I go along. |
I have updated the tests and made the few changes needed to to jsdom/whatwg-url. I'm not sure how to proceed now. I assume these have to be reviewed before I open any merge request? It seems inappropriate to do so before the standard has been updated. I can have a look at making those changes to the standard as well (i.e. this repo). Thanks, |
@alwinb nice! You should feel free to open PRs and we can label them "pending specification change" (paraphrased). It's not a problem. If you can make the change to the specification too that'd be great. Note that you'll need to sign https://participate.whatwg.org/agreement as well. Hope that helps. |
Yes, this is amazing work! Please do open pull requests; it's only merging of them that we'll block until the specification is updated. We'll also get an extra benefit, where if you open the web platform tests pull request, a bot will run the tests on all browsers so we can see the results. The changes to whatwg-url are promising, in that they make the algorithm simpler. Hooray! |
Looking at https://wpt.fyi/results/url/url-constructor.html?label=pr_head&max-count=1&pr=25716 (which is generated from web-platform-tests/wpt#25716 (comment)) most things look good, but there is a section of tests which fail in more browsers more than would be ideal:
It looks like most of the cases with 2 failures were also 2 failures before (referencing https://wpt.fyi/results/url/url-constructor.html?label=master&label=experimental&product=chrome&product=firefox&product=safari&aligned), albeit sometimes a different 2. So that's not a regression. Indeed, in cases where all 3 browsers give different results, that's the best we can hope for. But the conversion of some of them from 2-failures to 3-failures seems suboptimal. It seems to be concentrated around |
Hmm. This is actually about a separate issue – the handling of Apparently, Chrome and Firefox do not handle the What is interesting is that IE/Edge prior to version 18 also did not interpret I just checked. I have no data on e.g.
This has changed since version 18, which does interpret As an aside, Safari fails the mentioned tests because it (sometimes? or always?) does not copy the host from the base URL when parsing host-less URLs. I suspect that this is not intentional; it seems inconsistent with the way in which it parses absolute file URLs. It does however handle the |
Wait, no there's more going on. I think that Chrome on windows will pass those drive letter tests. If I remember correctly, Chrome disables drive-letter handling on non-windows platforms. |
Yeah, wpt.fyi does not run Chrome on Windows and what Chrome on Windows does is somewhat important for |
Ah, OK. Yeah, for drive letter normalization I'd rather align with Windows browsers. Since we have two to choose from these days, aligning with Chrome Windows seems reasonable to me. (Firefox seems to not treat pipe characters specially.) OK, then yeah, let's proceed with the spec update! Let us know if we can help with that; it should be mostly mechanical given the changes you made to whatwg-url. |
Ok, I made the changes, I hope it all went ok! |
The current standard asks to remove leading empty directories from file paths.
This seems to have been motivated by the desire to ignore slashes after
file:///
whilst avoiding reparsing issues. However,Examples, tested in Chrome canary, Safari preview and Firefox nightly:
/..//localhost//pig
againstfile://lion/
file://lion/localhost//pig
file://lion//localhost//pig
file:////localhost//pig
file://lion//localhost//pig
file://localhost//a//../..//
file:///
file:///.//
file://///
file://localhost///
file://localhost//a//../..//foo
file:///foo
file:///.//foo
file://///foo
file://localhost///foo
file://localhost////foo
file:///foo
file://foo/
file://////foo
file://localhost////foo
file:////foo
file:///foo
file://foo/
file:////foo
file:///foo
Notes:
(Dropping the host in the first seems off).
file:///
and keepslocalhost
, avoiding a reparsing issue.The text was updated successfully, but these errors were encountered: