Rephrase the stated goal of obsoleting RFC 3986 and RFC 3987 #703

Closed
alwinb opened this issue Aug 31, 2022 · 18 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@alwinb
Contributor

alwinb commented Aug 31, 2022

The topmost goal of the WHATWG standard states:

  • Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process.

I believe that it will be good for this standard to first discuss, and then rephrase this goal.

The goal, as stated, is confusing. It is easy to read it to mean that the RFCs are no longer relevant and that there is a consensus across committees that the WHATWG standard is the one common URL standard. The WHATWG standard does not cover the same things as the RFCs do and vice versa, and the IETF has not endorsed the WHATWG standard. As a consequence, it is likely to confuse or upset people.

I would love to see this goal being rephrased in a way that is more honest, more accurate, and less likely to be experienced as offensive. Adding a bit of discussion around this can help a lot to dissolve misunderstandings, annoyances and even hostile responses.

State the facts and inform the readers about the situation so that they can make their own decisions. Be honest about what the WHATWG Standard does and does not provide. Be clear that the goal has not been met yet and work towards it, or amend it and refer people to the RFCs for those things that the WHATWG cannot provide.

@alwinb
Contributor Author

alwinb commented Aug 31, 2022

Motivated in part by the discussion in #479

@annevk
Member

annevk commented Aug 31, 2022

You'll have to elaborate on what is confusing here. The idea is indeed that those particular RFCs are no longer relevant.

@alwinb
Contributor Author

alwinb commented Sep 11, 2022

> You'll have to elaborate on what is confusing here.

It is confusing and likely to upset people.

From what I have read, the WHATWG standard was created because the IETF RFCs did not specify error recovery and did not match browser behaviour. It was created because changing browsers to match the RFCs wasn't feasible, and also because of a lack of progress and the fact that nobody at the IETF took on this task.

Anne, since you wrote the standard, you can amend, clarify or confirm.

It is clear that this split didn't leave the WHATWG on good terms with the IETF community. Given that situation, it is extra important to be careful about the way in which you state your goals.

  • To state that your goal is to make someone else's work irrelevant is often experienced as hurtful. If this really is your goal, then you can state it in a more careful and nuanced way and write additional motivation.

  • It is not clear from this statement that there are two standards bodies that specify URLs and that they are in conflict. People may take it at face value and fail to assess the situation for themselves. This is both confusing and upsetting.

  • It is not stated why, how, and where the WHATWG Standard diverges from the RFCs. That is confusing.

  • It fails to discuss the many similarities. It does not acknowledge the contribution of the people that wrote the RFCs. Given the situation, this is likely to upset people. Give credit where credit is due.

To be clear, the WHATWG Standard does provide many things that the RFCs did not. It does provide an algorithm. It provides an API specification, there is a reference implementation and a test suite. The WHATWG does make progress on describing the 'real world' situation and on getting web browsers aligned. Why don't you state that?

There are also a lot of things that the RFCs provide that the WHATWG standard does not provide:

  • It does not support parsing relative references stand-alone.

  • It does not separate syntax and semantics; specifically, it does not contain a concise description of a parser that produces an AST (or CST) and it does not provide a separate, concise resolution algorithm.

  • It does not discuss the multiple normalisation functions that the RFC describes and it does not specify their corresponding normal forms. Note that these different forms have different applications, and they are used in non-browser applications.

  • It does not discuss the comparison ladder and the equivalence classes that correspond to these different normalisations.

  • It fails to ensure that normalisation is congruent with resolution. This is as important as preventing reparse bugs.

  • It does not specify a reserved and/or unreserved set of code-points. This is important for applications that require additional processing of the opaque-path, for example.

  • It does not contain the additional informative text that describes the relevance and the role of each of the components in a human readable form like the RFC does.

  • The RFCs provide an infrastructure for other specifications to build on. It is very difficult for other specification authors to use the WHATWG Standard as a 'library' in their own specifications, such as, for example, URL schemes that put additional constraints on the components of URLs, or schemes that parse additional semantic structures from the opaque path as mentioned above. I think this is what @bagder was getting at in this comment.
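
As a rough illustration of the first two points in the list above, here is a minimal sketch in Python. It assumes that `urllib.parse` is an acceptable stand-in for the RFC model: it follows the RFC-style generic syntax only loosely and is not a validating RFC 3986 implementation. The example URLs are made up. The sketch parses a relative reference on its own and then resolves it against a base as a separate step. By contrast, the WHATWG URL API's `new URL("../img/logo.png")` throws when no base is supplied.

```python
from urllib.parse import urlsplit, urljoin

# Step 1: parse a relative reference stand-alone, with no base URL involved.
# The result keeps the components separate (scheme and authority are empty).
ref = urlsplit("../img/logo.png?v=2#top")
print(ref.scheme, ref.netloc, ref.path, ref.query, ref.fragment)

# Step 2: resolve the same reference against a base URL as a separate operation.
resolved = urljoin("http://example.com/a/b/c", "../img/logo.png?v=2#top")
print(resolved)
```

Keeping the two steps apart is what lets other specifications inspect or constrain a reference before deciding how, or whether, to resolve it.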

That's elaborate enough, I hope!

The point is, you don't obsolete the RFCs, you just don't and you don't seem to understand that. People depend on the RFCs and you claim you're obsoleting them, but you don't provide for their needs. That is what is causing upset and hostility. This is no different from the situation a decade ago, where the IETF standards did not provide for the needs of browsers, causing upset and hostility towards them.

To make it worse, applications that depend on those specific RFC features, are now stuck in a situation where they cannot be web-compatible. This hurts the adoption rate of this standard, and causes more frustrations. It's just not a healthy situation.

This is why I opened this issue. As long as these issues are not solved, at least be very open about them and discuss them. I believe this would be good for the larger community.

cc @masinter, @mnot

Note that I am not just complaining. I have taken action when you did not respond to these concerns. I wrote this URL Specification and a reference implementation that passes the tests and adds some of the missing features. It matches the behaviour specified in the WHATWG standard. I'm showing you a path forward.

@masinter

I think the characterization of the root problem as one of "respect" and "giving credit" is dead wrong.
It's a disagreement about an engineering question about the applicability of the "Postel Principle" to one particular protocol element which has many roles outside of browsers and HTML.

URLs may seem to be just another part of the browser experience and for that application, perhaps it is useful to specify additional rules and behavior. But being generous to writers who type in URLs by hand into HTML and the Address bar isn't worth the pain of making every other non-browser application suffer.

People are used to URLs breaking, not just new 404 not founds (and the occasional 418) but we've managed to move large swaths of the net from http: to https:.

IETF is not a membership organization -- "the only ones there are the people who come".
Why isn't there a WHATWG HTTP Living Standard? Perhaps that might give a clue.

@karwa
Contributor

karwa commented Sep 11, 2022

> But being generous to writers who type in URLs by hand into HTML and the Address bar isn't worth the pain of making every other non-browser application suffer.

Then again, it appears that many non-browser applications deviate from the RFCs specifically to adopt lenient parsing behaviour from the web, as it can emerge in unexpected places. HTTP redirects, for instance, seem to come up often as a place where people see dodgy URLs/relative references.

So I'm not really sure that it is fair to characterise web-compatibility as "pain" and "suffering"; non-browser applications seem to be willingly adopting these behaviours, at the request of their users, even before there was a formal standard telling them what web-compatibility even meant.
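
To make the redirect case concrete, here is a minimal Python sketch of the kind of leniency a non-browser client might add. It is not the WHATWG parsing algorithm and not taken from any particular client: `resolve_redirect` is a hypothetical helper, and the choice of characters to leave untouched is an assumption made for this illustration.

```python
from urllib.parse import quote, urljoin

# Characters left untouched: RFC 3986 reserved characters plus "%",
# so that percent-escapes already present are not encoded a second time.
# (This set is an assumption for the sketch, not taken from any standard.)
_SAFE = "/:?#[]@!$&'()*+,;=%"

def resolve_redirect(base: str, location: str) -> str:
    """Hypothetical helper: tidy a sloppy Location value, then resolve it."""
    # Drop surrounding whitespace and any embedded tabs or newlines.
    cleaned = location.strip()
    for ch in "\t\n\r":
        cleaned = cleaned.replace(ch, "")
    # Percent-encode spaces and other characters outside the safe set.
    cleaned = quote(cleaned, safe=_SAFE)
    # Resolve the cleaned reference against the URL that issued the redirect.
    return urljoin(base, cleaned)

print(resolve_redirect("http://example.com/a/b", " /some path?q=a b\n"))
```

A strict RFC-only client would have to reject that `Location` value outright, which is the trade-off being debated here.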

For most modern browsers, input into the address bar goes through a totally different "fixup" process which considers things like the user's history and bookmarks, automatically adds schemes, adds TLDs to domains (e.g. .com), and lots of other stuff. It has nothing to do with anything in this standard -- and the standard in fact calls it out as being out-of-scope:

> How user input in the web browser’s address bar is converted to a URL record is out-of-scope of this standard.

@alwinb
Contributor Author

alwinb commented Sep 12, 2022

> I think the characterization of the root problem as one of "respect" and "giving credit" is dead wrong.

This is not at all what I tried to communicate.

I know you disagree about more tolerant parsing. But the way you've responded to that challenge is not functional. The current situation with two distinct standards that don't match each other, and don't specify how, why, and where they are different, is far worse than the thing you've been trying to avoid.

If you can't solve that disagreement, then at least describe the situation to your readers, describe the conflict, describe the differences, identify the overlap, learn from each other, and do align with each other on the issues where you don't disagree.

@annevk
Member

annevk commented Sep 21, 2022

I think those who contributed to the RFCs are mentioned in https://url.spec.whatwg.org/#acknowledgments. Not sure what makes you say they are not.

As for things the RFCs provide that this document does not, most of those are intentional design decisions. And we typically don't document those in the standards. If some concrete need for any of those can be shown that's open for reconsideration of course, but thus far that bar hasn't quite been met.

And to be clear, I somewhat regularly interact with the IETF community and many there can appreciate the situation for what it is. I would not say the WHATWG is on bad terms.

@alwinb
Contributor Author

alwinb commented Sep 22, 2022

> Not sure what makes you say they are not.

The standard text does not communicate that you acknowledge, as in, are aware of, the structural design of the RFCs and the motivations behind that design.

> As for things the RFCs provide that this document does not, most of those are intentional design decisions.

It is strange to claim that you want to obsolete the RFCs and to then make a deliberate decision to not include important parts of what they cover.

The standard text does not mention this and does not provide a justification for that decision.

I don’t have anything else to say about this that I’ve not already said.

@masinter

@annevk
Member

annevk commented Sep 23, 2022

> The standard text does not communicate that you acknowledge, as in, are aware of, the structural design of the RFCs and the motivations behind that design.

I don't think that belongs in a standard. I haven't seen this done in other standards efforts either.

> It is strange to claim that you want to obsolete the RFCs and to then make a deliberate decision to not include important parts of what they cover.

I suspect we disagree on "important parts".


Perhaps there is another way to solve this. Do you have a suggested rephrasing of the current goal that would make this a bit more clear in your view?

@tmccombs

tmccombs commented Feb 5, 2023

It seems like this specification is primarily concerned with how urls are used by browsers, which makes sense for a specification from WHATWG. As a specific example, one reason stated for not supporting relative urls in #421 and #531 is that they aren't needed in the browser. However, browsers are not the only places that urls are used. Perhaps the goal should be scoped to "in the context of a web browser"? Although, that brings up the concern of divergence between how urls are handled in the browser and on servers.

@bagder

bagder commented Feb 6, 2023

IMHO, the API should be dealt with in a separate repo as it is unrelated to the URL spec.

@xfq added the i18n-tracker label Mar 7, 2023
@annevk
Member

annevk commented Sep 10, 2023

I was about to close this issue due to the lack of specific suggestions, but noticed that @xfq added the i18n-tracker label. Is this something the i18n WG wants to weigh in on @xfq?

@aphillips

@annevk I think we are tracking it because it's I18N related and so that we'd see any activity pop up in our digest (which is how I noticed the comment). We'll ping you back today, since we're all here at TPAC 😉

@aphillips

@annevk I18N is okay with you closing this: we have no specific issue here.

@annevk
Member

annevk commented Sep 11, 2023

Thanks @aphillips!

@annevk closed this as not planned Sep 11, 2023
@tmccombs

Here's a specific recommendation for the stated goals. Change the first bullet point to:

Align RFC 3986 and RFC 3987 with contemporary *browser* implementations and *replace the RFC as the standard for URLs in the browser* ~~obsolete the RFCs in the process~~. (E.g., spaces, other "illegal" code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [RFC3986] [RFC3987]

(Additions in italics, removals with strikethrough).

and maybe add a section that compares the differences with the RFCs and discusses when it is appropriate to use the WHATWG standard vs the RFCs.

Or alternatively, increase the scope of the project to include all non-browser use-cases, and address some of the deficiencies that @alwinb mentioned above.

@alwinb
Contributor Author

alwinb commented Sep 11, 2023

I agree with the comment of @tmccombs above.

And definitely add a section describing the differences between the RFCs and both valid and parsable WHATWG URLs. This is a basic requirement for any good standards document.

> I was about to close this issue due to the lack of specific suggestions

I consider closing this issue without acting on it to be proof that the WHATWG is more interested in advancing its own political position than in creating clear, high quality open standards that benefit the internet community at large.

It makes no sense for me to put more effort into this if there is a political motivation to frustrate my efforts.

My father used to say that politics was a necessary evil. But I believe that politics is the consequence of a lack of personal strength :)

I wish you good luck going forward.

8 participants