Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

navigate() should not use parse a URL #9708

Closed
annevk opened this issue Sep 8, 2023 · 8 comments · Fixed by #9756
Closed

navigate() should not use parse a URL #9708

annevk opened this issue Sep 8, 2023 · 8 comments · Fixed by #9756

Comments

@annevk
Copy link
Member

annevk commented Sep 8, 2023

I think we should make new APIs use UTF-8 exclusively.

cc @zcorpan @domenic

(But maybe don't fix it right away as I'll refactor parse a URL first.)

@domenic
Copy link
Member

domenic commented Sep 8, 2023

I think I support this, but I would feel better if this was not literally the first such API. I think we have plenty of other precedents right? Like fetch() maybe?

I wonder if implementations are generally sloppy about this... might be worth writing tests for a lot of newer APIs.

@annevk
Copy link
Member Author

annevk commented Sep 8, 2023

Yeah, everything in Fetch should be good, but it's probably buggy. I suspect we can still get some of that changed. web-platform-tests/wpt#4934 (comment) has my progress on the general topic (with help from you, @zcorpan, and others for review and such).

@annevk
Copy link
Member Author

annevk commented Sep 9, 2023

Seems to be mostly good: web-platform-tests/wpt#41894.

@domenic
Copy link
Member

domenic commented Sep 10, 2023

Tests for some of the others in https://dontcallmedom.github.io/webdex/u.html#URL%20parser%40%40url%25%25dfn , e.g. new WebSocket(), would probably be worthwhile too.

I was thinking as part of this change we should add a note to "parse a URL" and "parse and serialize a URL". Something like:

These algorithms are to be used when deriving URLs from HTML content, or for legacy JavaScript APIs. HTTP-level URL processing, or URL processing done by newer JavaScript APIs, use the URL parser and URL serializer directly, so as to avoid changing results depending on the Document's encoding.

@annevk
Copy link
Member Author

annevk commented Sep 10, 2023

My thinking is that the next step is changing the name of these algorithms to "legacy" and add new algorithms that take a document/environment, but don't take the encoding into account. As it's much nicer to use an algorithm like this than grab the base URL from somewhere. (Exporting the base URL directly was probably a mistake?)

And I wonder if we can even refactor further to never have an encoding of an environment as workers are supposed to be always UTF-8.

@annevk
Copy link
Member Author

annevk commented Sep 10, 2023

We have coverage for WebSocket already, see https://wpt.fyi/results/html/infrastructure/urls/resolving-urls/query-encoding. Perhaps it should be moved out of that bigger test, but that's a separate task.

@domenic
Copy link
Member

domenic commented Sep 10, 2023

Sounds reasonable, although I don't think we should necessarily make new HTML attributes always use UTF-8, so "legacy" might be a bit strong.

@annevk
Copy link
Member Author

annevk commented Sep 10, 2023

Fair, maybe "parse a URL" and "HTML-parse a URL", although I also would not necessarily mind moving away from it more. Mostly you have to parse eagerly anyway and after that it no longer matters so it's quite easy to switch for new things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants
@domenic @annevk and others