wrongly parses tags inside <script> element #62

Open
jaredh159 opened this issue Jan 4, 2024 · 2 comments

Comments

@jaredh159

thanks for the library! truly appreciate it!

the parser currently incorrectly finds tags inside of <script> elements, so for example:

<script>
// a comment <p>foo</p>
</script>

will produce a <p> html tag inside the script tag.

@dgjustice

I ran into this myself. It appears to treat any < as the beginning of a tag. I'll try to dig into the source when I can find time.

    <script>
        $(function() {
            var imgs = document.getElementsByTagName('img');
            var errors = 0;
            for(var i=0,j=imgs.length;i<j;i++){
                imgs[i].onerror = function(e){
                    this.src = "assets/error.gif";
                    this.parentElement.parentElement.className = "danger";
                    this.onerror = function(e){};
                };
                errors++;
            }
        });
    </script>

Tag(HTMLTag { _name: Bytes("script"), _attributes: Attributes { raw: InlineHashMap(InlineHashMap<0 items>), id: None, class: None }, _children: InlineVec(InlineVec<3 items>), _raw: Bytes("<script>") })
Raw(Bytes("\n        $(function() {\n            var imgs = document.getElementsByTagName('img');\n            var errors = 0;\n            for(var i=0,j=imgs.length;i"))
Raw(Bytes("script>\n"))

@y21
Owner

y21 commented Feb 6, 2024

Yeah, tl currently doesn't special-case the script and style tags, but they do need some special handling to make this work properly.

I initially thought that <script> tags would be pretty complicated to parse, given that we couldn't "just" seek to the next </script>: it may appear within a string or some other construct, and so might not truly be the end of the script tag (or so I thought).

For instance, 1 </script>/ is valid JavaScript (parsed as "1 less than the regular expression /script>/"), which made me think that this should not count as the end of <script>, and that we would have to implement (a subset of) JavaScript in the HTML parser.

But (at least according to some testing in the browser) it turns out that this isn't really needed. Even <script> "</script>" </script> is not "supported": the browser treats the first </script>, the one inside the string, as the end, and doesn't try anything fancy like tracking whether it is inside a string literal. So this might not be so hard to implement :)
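
To make the "just seek to the next </script>" idea concrete, here is a rough sketch in Rust. It is not tl's actual code: `find_script_end` and `split_script` are hypothetical names made up for illustration, and the sketch assumes the caller has already consumed the opening <script> tag. It scans for the first case-insensitive `</script` that is followed by whitespace, `/`, or `>`, and treats everything before it as raw text. It ignores the HTML spec's extra escaping rules around `<!--` inside scripts.

```rust
/// Hypothetical helper: byte offset of the first `</script` end tag in
/// `body`, where `body` is the input right after the opening `<script>` tag.
/// Returns `None` if no end tag is found.
fn find_script_end(body: &[u8]) -> Option<usize> {
    const NEEDLE: &[u8] = b"</script";
    let mut i = 0;
    while i + NEEDLE.len() <= body.len() {
        if body[i..i + NEEDLE.len()].eq_ignore_ascii_case(NEEDLE) {
            // The tag name must be terminated by whitespace, `/`, or `>`,
            // so that something like `</scripty>` is not mistaken for it.
            match body.get(i + NEEDLE.len()).copied() {
                Some(b'>') | Some(b'/') | Some(b' ') | Some(b'\t')
                | Some(b'\n') | Some(b'\r') | Some(b'\x0C') => return Some(i),
                _ => {}
            }
        }
        i += 1;
    }
    None
}

/// Hypothetical helper: split `body` into (raw script text, remaining input).
/// No attempt is made to understand JavaScript strings, comments, or regexes.
fn split_script(body: &str) -> (&str, &str) {
    match find_script_end(body.as_bytes()) {
        // `<` is ASCII, so the offset is always a valid char boundary.
        Some(end) => body.split_at(end),
        // Unterminated script: treat the whole remainder as raw text.
        None => (body, ""),
    }
}
```

With this, the body from the original report (`// a comment <p>foo</p>`) comes back as one raw chunk, the `i<j` in the loop above no longer splits anything, and `split_script(" \"</script>\" </script>")` stops at the first `</script>` inside the string, which matches the browser behaviour described above.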
