A '@' character in the host part of file URLs #805

Open
hayatoito opened this issue Dec 5, 2023 · 19 comments
Labels
topic: file (Aren't file: URLs the best?), topic: parser

Comments

@hayatoito
Member

hayatoito commented Dec 5, 2023

(Reported in https://crbug.com/1502849)

It appears that Windows uses file URLs with '@' (U+0040) characters in their host parts, such as file://webdavserver.net@ssl/a.pdf.

However, according to my understanding, file://webdavserver.net@ssl/a.pdf is an invalid URL in the URL Standard because '@' is considered a forbidden host code point.
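The rejection can be observed with a minimal sketch like the following, assuming a runtime whose `URL` class follows the URL Standard (for example a recent Node.js). `tryParse` is a hypothetical helper introduced for illustration, not part of any API:

```javascript
// Hypothetical helper: returns the parsed URL, or null if parsing fails.
function tryParse(input) {
  try {
    return new URL(input);
  } catch (e) {
    return null; // new URL() throws a TypeError when parsing fails
  }
}

// '@' is a forbidden host code point, so a standard-conforming parser
// should reject a file URL whose host contains it.
const withAt = tryParse("file://webdavserver.net@ssl/a.pdf");
console.log(withAt === null ? "rejected" : withAt.href);

// The same URL without "@ssl" in the host parses normally.
const plain = tryParse("file://webdavserver.net/a.pdf");
console.log(plain === null ? "rejected" : plain.hostname);
```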

To ensure compatibility with Windows file URLs, should we consider allowing the '@' character in the host part of file URLs?

I'd appreciate hearing opinions of the URL Standard folks on this matter.

@annevk annevk added topic: parser topic: file Aren't file: URLs the best? labels Dec 5, 2023
@annevk
Member

annevk commented Dec 5, 2023

It seems reasonable to allow, but I wonder if it would be possible for Chromium to determine the complete set of changes needed for it to not have platform-divergent behavior. At least I suspect that making them all at once would allow for an easier rollout.

@karwa
Contributor

karwa commented Dec 5, 2023

A single @ in the authority section (username, password, hostname, port) generally delimits the credentials from the hostname.

Let's take any other URL scheme, e.g. HTTP: http://webdavserver.net@ssl/

  • "webdavserver.net" would be the username
  • "ssl" would be the hostname

Which is clearly not what the reporter wants to happen.
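As a quick illustration of the HTTP case described above, a sketch using the standard `URL` class (available in browsers and Node.js):

```javascript
// In a special scheme such as http, a single '@' in the authority
// separates the credentials from the host.
const url = new URL("http://webdavserver.net@ssl/");

console.log(url.username); // "webdavserver.net"
console.log(url.hostname); // "ssl"
console.log(url.href);     // "http://webdavserver.net@ssl/"
```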

What's more, this has been the accepted interpretation for at least the last 30 years (going back to RFC 1738). I doubt many URL parsers are going to interpret file://webdavserver.net@ssl/ as having a hostname containing an @ sign, so the output of the URL parser must keep the @ escaped in order to properly encode its understanding of the URL components. file://webdavserver.net%40ssl/ is semantically correct.

I think the actual problem is that hostnames in file URLs are not able to contain percent-encoding. I looked into this in depth a while back, and found that:

  • UNC server names can contain spaces, which obviously must be escaped. Chromium even has test cases for this (see linked issue below), so you'll probably hit this sooner or later.
  • Windows allows pre-canonicalised paths, which are expressed using UNC syntax with the hostname "?" (e.g. \\?\C\SomePath). These cannot be expressed using file URLs because the hostname would have to be %3F.

See #599
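The limitation can be sketched as follows, assuming a `URL` implementation that follows the URL Standard, where a file URL's host is parsed as a domain and percent-decoded before the forbidden code point check. The `parses` helper and the `my%20server` name are made up for illustration:

```javascript
// Hypothetical helper: true if the input parses, false if new URL() throws.
function parses(input) {
  try {
    new URL(input);
    return true;
  } catch (e) {
    return false;
  }
}

const candidates = [
  "file://webdavserver.net%40ssl/a.pdf", // escaped '@' in the host
  "file://my%20server/share/a.pdf",      // escaped space in a server name
  "file://%3F/C/SomePath",               // escaped '?' as the host
];

// Each escaped character decodes to a forbidden code point, so a
// standard-conforming parser is expected to reject all three.
for (const input of candidates) {
  console.log(input, parses(input) ? "parses" : "rejected");
}
```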

@whatwg whatwg deleted a comment from Ur100 Feb 5, 2024
@catmanjan

catmanjan commented Sep 30, 2024

Please remove @ from the forbidden host code point list!

Given RFC 1738, it was probably a mistake that it was there in the first place.

Just on this:

I doubt many URL parsers are going to interpret file://webdavserver.net@ssl/ as having a hostname containing an @ sign

Firefox, Safari, Windows Explorer, Linux terminals all handle the URL fine. In fact, it's only Chromium-based browsers that have the issue, because they want to use the URL standard as their only authority rather than use multiple path standards...

@valenting
Collaborator

I doubt many URL parsers are going to interpret file://webdavserver.net@ssl/ as having a hostname containing an @ sign

Firefox, Safari, Windows Explorer, Linux terminals all handle the URL fine

Safari also rejects this URL, and the only reason it works in Firefox is that we currently ignore everything in the hostname part of a file URL (tracked in 1507354 - URL parser discards host for file URLs).

Allowing @ only in the authority section of file URLs seems like a weird exception to make.
I'm in favor of keeping hostname parsing as close to the HTTP URL parser as possible, and here the @ sign should probably be percent-encoded.

@catmanjan

@valenting yes, I think the problem is calling them file URLs. They are URL-like, but ultimately the OP (file://webdavserver.net@ssl/a.pdf) is a UNC file path, and currently it's just a coincidence that Chromium works for most of them...

@annevk
Member

annevk commented Dec 2, 2024

@hayatoito any thoughts on @karwa's comment? It seems very plausible that the solution here is solving #599.

@hayatoito
Member Author

It appears that we can close this issue, in favor of #599. #599 seems more general. Thanks!

@hayatoito
Member Author

Ah, my previous comment was premature. I'll retract it.

Do you mean the solution provided at #805 (comment) (allowing percent encoding in file: URL host) is sufficient, and we don't need an opaque host in file: URLs? (#599)

I don't have a strong opinion, but to my knowledge, unescaped spaces are still used in file: URLs on Windows.
See https://b.corp.google.com/issues/40256677#comment20 for previous measurements.

So I'm wondering how we should handle unescaped spaces.

@hayatoito hayatoito reopened this Dec 2, 2024
@annevk
Member

annevk commented Dec 2, 2024

Oh I see, that is indeed confusing.

I was thinking that we probably want to switch file: URLs to use opaque hosts instead of domains (for which #599 is the issue), as that would also allow percent-encoding of @, for instance, and address other issues outlined in #599.

Now, for non-percent-encoded spaces: do they need to remain non-percent-encoded, or would it be okay to percent-encode them at the point where we create the buffer that we pass to the host parser? That way opaque hosts can still ban spaces, but they end up working for file: URLs due to preprocessing in the "file host state".
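A rough sketch of that preprocessing idea, not the standard's actual algorithm: `proposedFileHost` is a hypothetical helper, and the made-up non-special scheme `x-opaque` is used only as a stand-in to reach today's opaque-host parser through the `URL` class.

```javascript
// Hypothetical stand-in for preprocessing in the "file host state":
// percent-encode spaces first, then hand the buffer to the opaque-host
// parser (reached here through a made-up non-special scheme).
function proposedFileHost(buffer) {
  const preprocessed = buffer.replace(/ /g, "%20");
  return new URL("x-opaque://" + preprocessed + "/").hostname;
}

console.log(proposedFileHost("opaque host"));            // "opaque%20host"
console.log(proposedFileHost("webdavserver.net%40ssl")); // "webdavserver.net%40ssl"

// Without the preprocessing step, the opaque-host parser still rejects
// a literal space, because it is a forbidden host code point.
let rejectsLiteralSpace = false;
try {
  new URL("x-opaque://opaque host/");
} catch (e) {
  rejectsLiteralSpace = true;
}
console.log(rejectsLiteralSpace); // true
```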

@hayatoito
Member Author

Thanks for the explanation. Let me confirm the intended visible behaviors of a proposal here.

The question is A and (B or C), right?
Regarding B or C, B seems more intuitive, aligning with the general understanding of "opaque."

A. Don't percent-decode any characters in an opaque host.

const url = new URL("file://opaque%2ahost/");
console.log(url.hostname); // "opaque%2ahost"
console.log(url.href); // "file://opaque%2ahost/"

B. Don't percent-encode any characters in an opaque host.

const url = new URL("file://opaque host/");
console.log(url.hostname); // "opaque host"
console.log(url.href); // "file://opaque host/"

C. Percent-encode spaces (or other chars?) in an opaque host.

const url = new URL("file://opaque host/");
console.log(url.hostname); // "opaque%20host"
console.log(url.href); // "file://opaque%20host/"

That way opaque hosts can still ban spaces,

I'm afraid I might misunderstand what ban means here. Is my understanding, C, correct?

@catmanjan

B is the only option that will actually end up solving the original WebDAV issue

@annevk
Member

annevk commented Dec 3, 2024

Right, opaque hosts don't do percent-decoding and don't have IDNA either. So A is what I would expect if we make that change.

And if we don't want them to contain literal spaces we'd have to percent-encode those as in C. (B doesn't seem great, but in theory we could have a file opaque host or expand the value space of opaque hosts I suppose.)

They could still be percent-decoded down the line of course depending on the protocol in use.
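The difference between domain parsing and opaque-host parsing can already be seen today by comparing a special scheme with a non-special one. This is only an illustration of current behavior; `x-opaque` is a made-up scheme name chosen for the example:

```javascript
// Special scheme: the host is parsed as a domain, so it is
// percent-decoded and lowercased.
const special = new URL("http://EXAMPLE%2Ecom/");
console.log(special.hostname); // "example.com"

// Non-special scheme: the host is opaque, so percent-encoded bytes and
// letter case are kept as written.
const opaque = new URL("x-opaque://EXAMPLE%2Ecom/");
console.log(opaque.hostname); // "EXAMPLE%2Ecom"
```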

@catmanjan Why can WebDAV deal with %40 but not %20?

@catmanjan

@annevk I don't think WebDAV can deal with either. When I say WebDAV, I mean Microsoft's WebDAV mini-redirector software. In the original post, the file URL is resolved on the client via the WebDAV mini-redirector, which requires the unencoded @ symbol to determine whether to use HTTPS.

@karwa
Contributor

karwa commented Dec 3, 2024

Regarding B or C, B seems more intuitive, aligning with the general understanding of "opaque."

Percent-encoding is used to escape opaque strings, allowing them to contain any character (even those which conflict with URL syntax characters - those characters get escaped). In B, even if we can allow some dodgy things like unescaped spaces, we still can't allow every character, so actually C aligns more with "opaque".

All software needs to unescape the URL component to read the actual content stored inside (percent encoding has no meaning at the application level). If the Microsoft software is not doing that, I'm inclined to say it's an application bug, albeit one that the standard has no solution for.
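As a sketch of that unescaping step, an application consuming a percent-encoded host could decode it before use. `uncServerName` is a hypothetical helper; how a real consumer such as the mini-redirector would receive the value is not specified here:

```javascript
// Hypothetical helper: recover the literal server name from a
// percent-encoded (opaque) host string before handing it to the OS.
function uncServerName(encodedHost) {
  return decodeURIComponent(encodedHost);
}

const server = uncServerName("webdavserver.net%40ssl");
console.log(server); // "webdavserver.net@ssl"

const spaced = uncServerName("opaque%20host");
console.log(spaced); // "opaque host"
```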

@catmanjan

Here is Microsoft's view on it: https://learn.microsoft.com/en-us/troubleshoot/windows-client/networking/url-encoding-unc-paths-not-url-decoded

With option C, implementers (Chromium people) will just have to percent-decode the host themselves before they start trying to use it as a UNC path.

@annevk
Member

annevk commented Dec 3, 2024

@catmanjan that document is about the path, not the host.

@catmanjan

@annevk ok

@hayatoito
Member Author

@karwa
Thanks for the explanation! That clears things up.

Could I confirm one more thing to ensure I understand completely?

Non-special opaque path URLs don't percent-encode spaces:

const url = new URL("git:opaque path");
console.log(url.href); // "git:opaque path"

I assume the opaque host and opaque path should be handled differently, in terms of percent encoding, correct? We percent-encode spaces in file: URLs' hosts, but not in non-special opaque paths.

@annevk
Member

annevk commented Dec 4, 2024

Yeah, opaque paths (which you can only get for non-special URLs, so non-special is kinda redundant, FWIW) are their own thing. Not percent-encoding those spaces there has been a problem (#784), but it's probably not a compatible change to start encoding them.


5 participants