Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML parsing is too case sensitive #341

Closed
Conduitry opened this issue Mar 6, 2017 · 7 comments
Closed

HTML parsing is too case sensitive #341

Conduitry opened this issue Mar 6, 2017 · 7 comments

Comments

@Conduitry
Copy link
Member

There are a number of case sensitivity issues with parsing HTML currently. What I've observed -

  • closing a tag with a different case than it was opened with triggers an error. It probably shouldn't
  • <script> and <style> elements only get their special processing when they are written in lowercase
  • the various void elements are only parsed correctly if they are written in lowercase, otherwise they expect a closing tag (!doctype causes an error #336)

It feels like these could all be resolved with a couple of well-placed .toLowerCase()s.

@Conduitry
Copy link
Member Author

One additional complication mentioned here is how to handle components whose names would now be detected as void elements. The simplest solution would probably be to just disallow those as component names, but I don't know how acceptable that is.

@Conduitry
Copy link
Member Author

These issues can in fact be resolved by a single .toLowerCase() in readTagName but this breaks nested components, unless they happen to be in all lowercase. I'm not sure what's a good way to handle this. I guess the problem is that this isn't really quite HTML - there are some things that we'd like to be case sensitive. Do we just tell users to make sure that regular HTML tags are always in lowercase?

There is something tempting about the JSX/React way of doing this, where we tell regular elements and components apart by the case of the first character in the tag - and then you don't even have to tell the compiler which components this uses - it's just the elements that begin with a capital letter.

@Rich-Harris
Copy link
Member

There is something tempting about the JSX/React way of doing this, where we tell regular elements and components apart by the case of the first character in the tag - and then you don't even have to tell the compiler which components this uses - it's just the elements that begin with a capital letter.

I agree. That's kind of the convention that's being used everywhere, perhaps we should just enforce it.

The alternative is to parse everything, then validate the HTML (i.e. disallow void elements with children etc) once we've analysed the JavaScript and therefore know what's a component and what isn't. But that's more work to accommodate practices that should probably be discouraged — the ecosystem is healthier if everyone agrees on things like naming conventions and casing.

@Conduitry
Copy link
Member Author

Are you thinking about going so far as to remove the components key from the component definition? Enforcing case rules is going to be a breaking change anyway, so there's not much sense in keeping it just for backward compatibility. I'd be in favor of removing it (once we decide we're ready for doing this in 2.0); it doesn't seem to fit in well with enforcing this convention.

@Rich-Harris
Copy link
Member

We'd still need components unless we just wanted to automatically bind imports, which feels a bit icky, no? Specifying them explicitly allows us to check that a component isn't missing, and prevents 'unused imports' linting errors etc

@Conduitry
Copy link
Member Author

Well, automatically binding imports is what React et al do. But yeah, the unused imports linting errors are a fair point. It does seem to be a bit out of line to be writing, say, a custom ESLint plugin for this like React has, so that using a component counts as a reference to it. Leaving components in sounds like the best plan.

@Conduitry
Copy link
Member Author

Closing this, I don't think there's a lot else that makes sense to do here. This is hopefully mostly taken care of by warnings indicating that components should begin with a capital letter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants