Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Why is React hydration data within a script tag, commented out? #48

Open
justin808 opened this issue Feb 14, 2017 · 15 comments
Open

Why is React hydration data within a script tag, commented out? #48

justin808 opened this issue Feb 14, 2017 · 15 comments
Labels

Comments

@justin808
Copy link

I'm working on an issue to optimize open source library for server rendering: github.com/shakacode/react_on_rails/pull/660.

We're considering breaking away from placing react hydration data in a meta tag and moving into maybe a script tag.

Why does Airbnb puts react hydration data inside of an <script type=“application/json” data-key=“foobar”><!--{json}--></script> per the screen shots below? Is this related to XSS or performance reasons?

view-source:https://www.airbnb.com/s/Panorama--BC--Canada?guests=10&adults=10&children=0&infants=0&checkin=12%2F18%2F2017&checkout=12%2F25%2F2017&ss_id=19nvfbwh&ss_preload=true&source=bb&page=1&s_tag=pMJm3GqR&allow_override%5B%5D=

2017-02-13_15-04-16

2017-02-13_15-34-39

CC: @goatslacker

@ljharb
Copy link
Collaborator

ljharb commented Feb 14, 2017

Some older browsers will execute any script block, even if it's type is "application/json".

Best practice is to always comment it out to ensure nothing can ever execute.

@justin808
Copy link
Author

Hi @ljharb! Big thanks for the explanation! In terms of avoiding XSS, it seem that you escape & and >, and not <. Any other characters?

@ljharb
Copy link
Collaborator

ljharb commented Feb 14, 2017

The only thing that needs escaping is to prevent the literal string </script>, because if browsers see it in any context, they instantly close the script tag.

@justin808
Copy link
Author

justin808 commented Feb 14, 2017

@ljharb there might be the chance of some white space in the </script> and it being accepted, so probably you would not just strip that out. We're planning on using the Rails json_escape.

In terms of performance, did you ever compare placing the JSON data inside of meta tag? the react-rails gem does that. It results in lots more escaping when that's done that way.

ProductHunt.com uses an alternative approach where they use an inline script to set compressed, uuencoded, likely JSON data, directly to a global variable (see image below). Possibly the data is compressed to avoid XSS. If the page data were gzipped, double compression would do little. Still, one might need to strip out the '>' before placing directly in the script tag. The web client would read the global value and then decompress and parse the JSON to an object.

Did you ever try the performance of that technique? One downside I see is that until CSP 2.0 is widely supported (the nonce feature), a site that wants to use CSP would not be able to use this technique.

2017-02-11_13-30-52

@ljharb
Copy link
Collaborator

ljharb commented Feb 14, 2017

CSP is very important; inline JS would be a very bad practice. Performance is irrelevant until other concerns are addressed. A meta tag would have to go in the <head> which would delay the page from loading.

Is there something slow about a JSON script block that warrants investigation?

@justin808
Copy link
Author

Is there something slow about a JSON script block that warrants investigation?

Some users of https://github.com/shakacode/react_on_rails have complained that the use of a div's meta data element to hold the JSON data is too slow.

This is the source for https://nootrobox.com. There is too much quoting here.

Thanks again for the advice!

2017-02-13_21-42-59

@squadette
Copy link

@ljharb, thank you for explanation. I have one comment:

The only thing that needs escaping is to prevent the literal string </script>, because if browsers see it in any context, they instantly close the script tag.

In our testing, preliminary closing happens for everything which conforms to HTML end tag syntax (https://www.w3.org/TR/html5/syntax.html#end-tags). That means that the string can be case-insensitive and it can have spaces between script and <.

That is, <ScRiPt /> will also prematurely close.

I guess that you are actually handling that by escaping >, right?

@ljharb
Copy link
Collaborator

ljharb commented Feb 14, 2017

script isn't self-closing, but sure, < / sCrIpT> would probably be a concern too. Indeed handling either < or > covers any possibility of a closing tag.

@squadette
Copy link

I was sooo not awake when I wrote this. Of course it's </ScRiPt >. Sorry!

@justin808
Copy link
Author

@squadette brought up the point:

This is an explanation on why HTML comments are non-trivial if we do care about "older browsers": http://www.howtocreate.co.uk/SGMLComments.html#doubledash (via http://softwareengineering.stackexchange.com/questions/198481/why-cant-an-xml-comment-contain-two-hyphens)

Basically it means that we have to somehow escape dash-dash (even though we do escape lt/gt). That's why I'm really concerned about this idea.

@ljharb, any thoughts on if you'd need to escape double dash in the JSON?

@ljharb
Copy link
Collaborator

ljharb commented Feb 17, 2017

@justin808 that seems like a potential concern; a PR with test cases would be appreciated.

However, my reading of that is that -- merely changes how > is interpreted; so as long as > is always escaped, the issue won't come up.

@squadette
Copy link

squadette commented Feb 19, 2017

@ljharb no, it's not about escaping the >. Here is an example of JSON content in HTML comment:

<!--{"foo": "bar"}-->

Here is theoretically problematic JSON content in HTML comment:

<!--{"foo": "bar -- baz"} -->

The second (inner) -- will, according to spec, turn on termination on >, but it will be again disabled by the third, final --. So, > will not be interpreted as the end of comment, and browser will ignore everything until the next -- and > (possibly with other content between the two), which is not even guaranteed to be present.

Both Chrome and Safari handle this correctly, and this is one of the worse parts of HTML spec (deprecated and removed in the new spec). My concern is that if we really do that comments thing due to compatibility with older/broken browsers then we should think about what other problems this approach could lead to in the same older/broken browsers.

FTR: as for me, properly encoded JSON within the script tag, without comments, is perfectly ok.

@ljharb
Copy link
Collaborator

ljharb commented Feb 19, 2017

Interesting - that does sound like something we want to handle, then.

@ljharb ljharb reopened this Feb 19, 2017
@squadette
Copy link

Just in case, relevant part of HTML4 spec: https://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4

HTML 3.2 says that it is SGML Application: https://www.w3.org/TR/REC-html32#sgml

HTML 2.0: https://www.w3.org/MarkUp/html-spec/html-spec_3.html#SEC3.2.5

Sometimes they only talk about whitespace, and it's unclear what happens if there is anything else between the pairs of dash-dash.

@squadette
Copy link

Here is historical evidence that this is real issue in old versions of Firefox: http://weblog.200ok.com.au/2008/01/dashing-into-trouble-why-html-comments.html

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants