Allow self-closing tags everywhere #9491
Comments
I have "a tiny déjà vu" (this has been discussed and desired since at least 2016, btw ... glad we keep desiring this in 2023, and rightly so) |
I think we shouldn't introduce new parser flags that change parsing behavior. They cause XSS issues. (In #9426 we're investigating if we can remove the scripting enabled flag.) cc @whatwg/html-parser |
it's opt-in, it won't cause issues for developers opting in + it's not about scripting either, it's just a "don't ignore that As half a joke though: <!doctype x-html> would be lovely |
Sites that opt in will expose themselves to XSS issues if they also use a sanitizer that doesn't support this. Even if the sanitizer supports this, it can be confused by the different parsing in different documents, which again can cause XSS issues. Example: https://bugzilla.mozilla.org/show_bug.cgi?id=1615315 An exploit here could be something like:
|
it's like saying, if you switch to Python 3 but you use Python 2 tools to lint your code expect issues ... not sure I am following.
not sure I am following that either ... you are creating invalid layout on purpose, see my previous Python 3 vs Python 2 analogy. P.S. In case my "half joke" hint wasn't clear: if there's any way to enable this, the parser should throw when elements meant to be void are not self-closed, as those are not welcome in the parser with that flag on, so I don't see issues or any extra XSS that's not already possible in HTML5. |
The difference is that current sanitizer libraries will have no mechanisms to detect whether or not they're in "non-self closing mode" vs "self closing mode". Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities. There's no reasonable expectation that those libraries would support such a mode. A constraint for implementing this is to avoid such a scenario. Simon is saying that an opt-in does not avoid the scenario, therefore fails to meet the constraint. |
You can use XML. |
we have parseFromString where if you pass the wrong mime you're subject to the same issue you're mentioning ... right? if I use a library that doesn't support that syntax I should change library or contribute to make it compatible? ... as a new flag? ... like all parsers / transpilers / linters have ?
to see no image and have no HTML at all on the page, even with correct layout? can we keep the conversation focused, please? |
Worth adding a reminder here that the only effect the doctype has in browsers is to prevent browsers from using quirks mode to render the document: Without the doctype, browsers use quirks mode; with the doctype, they don’t. Ideally, we’d not want to have the doctype at all — because it has zero purpose other than preventing quirks mode — but it’s one of those legacy misfeatures that we’re now stuck with forever for backward-compat reasons. So, given that, using the doctype as a way to opt into causing any particular other behavior in browsers would likely cause a side effect of leading people to have the wrong mental model of what the doctype is — it could mislead people into thinking the doctype in HTML has some general meaning and purpose in browsers that it doesn’t actually have, and that the |
I’ll also add that, from previous discussions we’d had with implementors about other proposals that require changes to the parsing algorithm and to HTML parsers in browsers: Implementers are very unlikely to support/implement further changes to parsing behavior except for very compelling reasons. And I think we’d find that implementors won’t judge this to be a compelling reason to make further changes to the parsing behavior. |
which is a wonderful feature and it happens to play a wonderful role ...
I don't think it's that bad ... it's like a shebang on top of an executable and it serves a nice purpose ... the alternative is to go through a new mime-type, a new file extension that can't be The never ignored self-closing tags has been desired for already 7+ years and if some lovely legacy artifact could help everyone move forward faster, I'd say "why not" ... but then again, literally any way to have this landed would (personally) work for me. |
It wouldn't. |
I wonder if these concerns were raised when HTML5 saw the light ... but again, the argument about "user land libraries" being outdated has never been an issue for the entirety of the TC39 or CSS story so I wonder why this is being raised in here. |
It's being raised here because changes to the parsing algorithm can introduce XSS vulnerabilities. It is raised in any discussion about changing the parsing algorithm. It is something that each change to the parsing algorithm must navigate. |
OK, but the only example is a malformed layout with an old XSS thing from the '90s ... does anyone else have a compelling XSS story / example to show and, if that's the case, what are the parsing libraries we should notify about this eventual change as "opt-in flag" to allow/consider? |
Notifying parsing libraries to update does not solve the issue. A library can be updated but all prior versions will be vulnerable. Those older versions and their installations do not disappear. Changes to the parser must not introduce security vulnerabilities in existing software. |
I agree with what you are saying but I am also hearing HTML as it is won't ever change from now on ... is that the future of the Web as seen by browser vendors? |
You can use HTML elements in XML (which you might also call XHTML). |
The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer (if it allows |
HTML changes quite a bit, but changes need to not introduce new security risks for users. Changes to the HTML parser are particularly security-sensitive. |
imagine I've been doing this for 22 years and XSLT is probably the next thing you'll tell me about ... still, I can't just use an XML parser for HTML content, and I trust you know that too.
Not to my understanding. What I'm expecting is that once an explicit opt-in flag is used, a non closing
I need to scroll a lot to see any HTML change in there ... it's all about Babel involved folks or JS APIs so I am not sure what you mean there ... what I meant was in terms of parsing abilities; as this thread underlines, it's nobody's intent to change that.
So, imagine your example either has an issue already, so it's not a point, or it would throw with this flag on because the image tag is not self-closing; what is your real-world concern here? Do you have any example that is not already failing with the current status quo around this proposal? |
maybe this is worth clarifying for the sake of this discussion, and I don't know if @jakearchibald had a different idea, but a flag to enable the exact same XHTML parser that would fail if void elements are not self-closing is what I am after, without all the complications that XHTML needs (impossible-to-remember doctype, special content-type, and so on). Every single browser is already capable of that, and if you carefully read the thread Jake mentioned, everybody is using linters, tools, parsers, to allow and want self-closing tags everywhere it's needed, which is not like 20 years ago when React and JSX didn't exist; everyone writes self-closing tags, even by mistake or rather out of habit, but that's normal. Accordingly, all arguments for something already available as a STRICT DTD XHTML parser for when such a flag is used would be what I am after ... if anyone wants instead a parsing for HTML that sometimes is OK with People use JSX these days, they write self-closing tags daily and they get the result they want ... that's (imho) what this change should be about: bring back XHTML in a lightweight way that doesn't require server-side, mime types, and all that stuff to exist, as an opt-in feature. Thank you, I think I've nothing else to add in here. P.S. this |
The easy solution for a sanitizer is to just resolve the auto-closed tags, so it changes <style/> to <style></style>, and then it can treat it like any other piece of HTML, without having to pay mind to context. |
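A rough sketch of the normalisation step suggested above, in Python. Everything here is an assumption made for illustration: the function name `expand_self_closing` and the regular expression are hypothetical and not taken from any real sanitizer, and the pattern ignores comments, raw-text content such as script or style bodies, and malformed markup. It only shows the idea of rewriting a self-closed non-void tag into an explicit start/end pair so that later stages see ordinary HTML.

```python
import re

# Void elements as listed in the HTML Standard; they never take an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

# Hypothetical, deliberately simplified pattern for a start tag ending in "/>".
SELF_CLOSING_TAG = re.compile(
    r"""
    <([A-Za-z][^\s/>]*)              # tag name
    (                                # attribute run, kept verbatim
      (?:
        \s+[^\s"'>/=]+               # attribute name
        (?:
          \s*=\s*
          (?:
            "[^"]*"                  # double-quoted value
            | '[^']*'                # single-quoted value
            | [^\s"'=<>`]+(?=[\s>])  # unquoted value, must run up to whitespace or the closing bracket
          )
        )?
      )*
    )
    \s*/>                            # slash directly before the closing bracket
    """,
    re.VERBOSE,
)


def expand_self_closing(html: str) -> str:
    """Rewrite <x ... /> as <x ...></x> for non-void elements only."""
    def replace(match: "re.Match[str]") -> str:
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)  # leave <br/>, <img/> and friends untouched
        return f"<{name}{attrs}></{name}>"

    return SELF_CLOSING_TAG.sub(replace, html)


print(expand_self_closing("<style/>"))
print(expand_self_closing('<img src="a.png"/>'))
print(expand_self_closing("<a href=/foo/>x</a>"))
```

The last call is left unchanged on purpose: with an unquoted attribute value the trailing slash belongs to the value, so the tag is not self-closed. Note that this rewrite matches the proposed opt-in behaviour, not what HTML parsers do today, where `<style/>` simply opens a style element.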
I think there's literally zero interest from browsers to introduce a mode of HTML parsing that aborts on syntax errors. But @jakearchibald didn't ask for that, so it seems out of scope for this issue. But as for the
They are changes to the HTML standard. HTML is more than the parser.
The concern is explained in #9491 (comment) |
if parsers are already available I don't get the concern ... enable a lightweight, not throwing, XHTML parser (as it's already there and surely available in all tools?) and let opt-in people deal with gotcha, behind more robust tools that will ensure no gotcha happens? |
If you wish to use xhtml you can set the content-type of your server responses to In this issue I think Jake has made a clear enough proposal, and I'm worried we're derailing the conversation with talk about xhtml and other formats. The chief questions (IMO) that should be answered based on Jake's original proposal:
If you're unable to present XSS concerns, or backwards compatibility issues, then others may be able to. Allowing others the space to formulate those in this issue thread would be the most productive step to resolving this issue. Minimising concerns from implementers will be counterproductive and serves to make threads like this more difficult for other implementers to catch up on. I'm not trying to silence healthy discussion but let's keep focussed so we can resolve the explicit concerns around the OP. |
My XHTML point was an answer to outdated tools that, if legacy enough, won’t have issues with Jake’s proposal. But I’ll stay away as an observer, as it’s clear none of my points are being considered. Good luck Jake |
I'm not really bothered about the ability to self-close a div, but it would be very nice if self-closing a script tag worked, when 90% of the time it is going to link to external content.
|
@cunlic add every single custom element that doesn't need children in it to the equation, but it requires a long name to disambiguate by standard specs (registry) 👍 |
Just to clarify, as I don't think it's made clear in the proposal yet, would it be:
I imagine there could be some desire for the first point to change too, which sounds unwise for compatibility or security; although, not changing it will probably support the continued favour of preceding The issues arising from the parsing of a self-closing element that causes the parsing of the rest of the document to differ depending on support sounds like a potential blocker unless all likely exploits in both directions can be avoided or mitigated (it wouldn't surprise me to learn that some of the native elements most desired to self-close are the ones that would have to still be excluded from doing so). Even if that is overcome, it seems willingness to change the parser has long been low, and the short-term incompatibilities between browsers, servers and tools deemed too high a burden, which has killed past related proposals. Unfortunately, this is a breaking change even with a switch, and wouldn't be backwards compatible. An old parser would not be able to parse a new document in a graceful manner, so most uses would need to support both versions and do content negotiation for many transitional years. A new parser would have to support both old documents without the switch, and new ones with it, indefinitely, which is above and beyond something like quirks mode. There's little to no appetite for that, especially after XHTML, so I expect this won't go anywhere again. Having said that, I'd love if this would be possible without all the issues. |
Agreed.
I think this could be a parse error if the
Agreed. |
It's pretty clear that we can't make the change proposed here in a way that would affect all existing HTML, which the OP even acknowledges by proposing an opt-in. As for making it opt-in as proposed, I think the lesson we should have learned from the implicit big switch between parsers for I think it would be incongruous to introduce a new switch at a time when we wish we could remove some of the existing mode axes of the HTML parser and are trying to make the successor for I think we should acknowledge that it's not great that the list of void elements needs to be hard-coded, but that changing the language on that point would be worse than keeping a characteristic of the language that predates the DOM, etc. Therefore, I think we should close this request as rejected. |
A terrible random idea during my PTO: can we do "strict self closing" with double slashes: |
The double slash would be weird... as you'd likely have to enforce the leading whitespace as well. Since attributes do not need to be quoted (if they don't contain spaces)... and multiple trailing slashes might be in a URL, you get weird stuff like this: All 3 URLs work, but if they 'self-closed' there would be no text to click on to initiate the links: |
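To make the unquoted-attribute point concrete, here is a minimal Python sketch of how an unquoted attribute value is delimited. It is a simplification of the tokenizer's unquoted attribute value state, which ends the value at whitespace or `>`; character references and parse errors are ignored. The helper name `read_unquoted_value` and the example.com URL are made up for illustration and are not the URLs from the comment above.

```python
from typing import Tuple

# Characters the HTML tokenizer treats as whitespace inside a tag.
HTML_WHITESPACE = "\t\n\f\r "


def read_unquoted_value(tag_source: str, start: int) -> Tuple[str, int]:
    """Return the unquoted attribute value beginning at `start` and its end index.

    Simplified: the value runs until whitespace or ">", so "/" is ordinary data.
    """
    end = start
    while end < len(tag_source) and tag_source[end] not in HTML_WHITESPACE + ">":
        end += 1
    return tag_source[start:end], end


# Hypothetical tag: both trailing slashes end up inside the href value.
tag = "<a href=https://example.com//>"
value, end = read_unquoted_value(tag, tag.index("=") + 1)
print(value)  # https://example.com//
```

Because the slashes are consumed as part of the value, a `//>` marker would only be unambiguous if whitespace were required before it, which is the concern raised above.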
@saschanaz no, that would have similar issues with web compat and XSS and also make the HTML syntax even more complex. Per @hsivonen's comment, Mozilla is opposed to the change proposed in OP and I see no evidence of interest from other browser vendors. Closing as wontfix. |
It’s understandable to mark this as wontfix, but it’s still a bit sad to collectively shrug at the fact that we gave up on having a simpler and intuitive element syntax (ie with self-closing tag support across the board like nearly everyone expects) essentially just to support unquoted attributes… (which in contrast look like they’re only allowed because of relaxed parsing rules) |
Wonder if it would be possible to allow self-closing at least for the custom elements (those with |
@RReverser that's a bit of a slippery slope because custom elements can't be known AOT so that any element with a |
I thought |
The |
That was proposed in #721 |
@zcorpan that never moved forward since 2020 though ... not sure it's going to change now as that requires a different parsing goal ad-hoc for CE only and I think that's even worse than asking parsers to not ignore self-closing in the wild 😥 |
People really seem to like self-closing tag syntax (see the replies to https://twitter.com/jaffathecake/status/1676843832284004353).
Maybe a switch should be added to allow them to be used on all elements?
Right now, documents can be a mix of rules where /> is largely meaningless, except in SVG and MathML. Making everything consistent seems… good?