-
I've been following this discussion on designing a flexible and reusable system for HTMLish languages, and I'd like to share some thoughts that might contribute to our collective understanding and approach. Given the complexity and diversity of HTMLish languages (including HTML, Vue, Angular, etc.), I believe that employing separate parsers for each language, coupled with a trait-based system for linters and formatters, presents a viable and robust solution. This approach allows for nuanced parsing that can accurately reflect the unique syntax and semantics of each language while providing a unified interface for tooling, thereby facilitating code reuse and extensibility.
Separate Parsers for Each Language
Creating dedicated parsers for each HTMLish language ensures that the parsing process is tailored to the specific needs and features of each language. This specificity is crucial for accurately capturing the nuances of each language, including unique attributes, directives, and template syntaxes. It also allows for performance optimizations specific to the parsing and analysis needs of each language, enhancing efficiency and effectiveness.
Trait-Based Linters/Formatters
A trait-based approach for linters and formatters can significantly streamline the development of these tools. By defining common operations, checks, and analyses in traits, we can apply these tools across different languages with minimal code duplication. This not only speeds up the development process but also ensures consistency in how different languages are linted and formatted, providing a more uniform developer experience.
Extending Codegen for Grammar Reuse
One of the challenges in this approach is the potential duplication of effort in defining grammar for similar constructs across languages. To address this, I propose extending our code generation system to allow for the sharing of grammar components among the different language parsers. This would involve:
This approach not only fosters code reuse but also ensures that our parsers remain flexible and accurate, capable of adapting to the specific requirements of each HTMLish language.
Conclusion
By combining separate parsers with a trait-based approach for tooling and an extended code generation system for grammar reuse, we can achieve a balance that allows for accurate, language-specific parsing and efficient, reusable tooling. This strategy respects the unique aspects of each language while promoting a more unified and efficient development process for linters, formatters, and other analysis tools. I'm eager to hear your thoughts on this approach and how we might refine it further to address our collective goals.
-
I went a rather different way on this one: language embedding all the way down. I see the TSX language as TSX syntax embedded in TS syntax embedded in JS syntax embedded in comment syntax embedded in whitespace.
-
I believe this shouldn't require 'Rust for everything' right out of the gate. Parsing Angular source code is a non-trivial task, and projects like https://github.com/prettier/angular-estree-parser exist that take that problem out of the equation. Why not call that somehow to get your syntax tree, pass it back into Rust, and then apply the transformations? Maybe use https://docs.rs/neon/latest/neon/index.html for that. Then, over time, when time permits, replace the AST parser with a Rust implementation. For Angular: template code is either in .component.html or in the template property of the @Component decorator. When that is detected, activate a dedicated parsing pipeline. Make a call
-
Vue
I've made a very rough state diagram for Vue's HTMLish template syntax. Ref: https://vuejs.org/guide/essentials/template-syntax.html
stateDiagram-v2
[*] --> HtmlTag: <
HtmlTag --> HtmlText: >
HtmlText --> AnyJsExpression: {{
AnyJsExpression --> HtmlText: }}
HtmlText --> HtmlTagClose: </
HtmlTagClose --> [*]: >
HtmlTag --> HtmlAttributeList: (space)
HtmlAttributeList --> HtmlText:>
state HtmlAttributeList {
state attr_type <<choice>>
state vue_attr_eq <<join>>
[*] --> attr_type
attr_type --> VueDirectiveName: v-
attr_type --> VueVBindShorthand: colon
attr_type --> VueVOnShorthand: @
attr_type --> HtmlAttribute: else
VueDirectiveName --> VueDirectiveArgument: colon
VueVBindShorthand --> VueDirectiveArgument
VueDirectiveArgument --> VueDirectiveArgumentDynamic: [
VueDirectiveArgumentDynamic --> vue_attr_eq: ]=
VueVOnShorthand --> VueDirectiveArgument
VueDirectiveArgument --> VueDirectiveModifier: .
VueDirectiveArgumentDynamic --> VueDirectiveModifier: ].
VueDirectiveModifier --> vue_attr_eq: =
VueDirectiveArgument --> vue_attr_eq: =
vue_attr_eq --> VueDirectiveValue: "
VueDirectiveValue --> [*]: "
}
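The `attr_type` choice inside `HtmlAttributeList` in the diagram above could be sketched in Rust roughly as follows. This is only an illustration, not Biome code: the type and function names (`VueAttrName`, `VueArgument`, `classify_vue_attr`) are hypothetical, it works on an attribute name that has already been split from its value, and it additionally accepts directives without an argument (such as `v-if`), which the diagram does not show.

```rust
/// Argument of a directive: `v-bind:href` (static) or `v-bind:[key]` (dynamic).
#[derive(Debug, PartialEq)]
enum VueArgument<'a> {
    Static(&'a str),
    Dynamic(&'a str),
}

/// Hypothetical classification of a Vue attribute name.
#[derive(Debug, PartialEq)]
enum VueAttrName<'a> {
    /// The `else` branch of `attr_type`: a plain HTML attribute.
    Html(&'a str),
    /// `v-name:arg.mod`, `:arg.mod` (v-bind shorthand), or `@arg.mod` (v-on shorthand).
    Directive {
        name: &'a str,
        argument: Option<VueArgument<'a>>,
        modifiers: Vec<&'a str>,
    },
}

/// Splits `prevent.stop` (or `.prevent.stop`) into its modifiers.
fn split_modifiers(text: &str) -> Vec<&str> {
    text.split('.').filter(|m| !m.is_empty()).collect()
}

/// Splits everything after the colon into the argument and its modifiers.
fn split_argument(tail: &str) -> (VueArgument<'_>, Vec<&str>) {
    match tail.strip_prefix('[').and_then(|t| t.split_once(']')) {
        Some((expr, rest)) => (VueArgument::Dynamic(expr), split_modifiers(rest)),
        None => match tail.split_once('.') {
            Some((arg, rest)) => (VueArgument::Static(arg), split_modifiers(rest)),
            None => (VueArgument::Static(tail), Vec::new()),
        },
    }
}

fn classify_vue_attr(raw: &str) -> VueAttrName<'_> {
    // Mirrors the `attr_type` choice: `:` and `@` are shorthands, `v-` is a directive.
    let (name, tail) = if let Some(tail) = raw.strip_prefix(':') {
        ("bind", Some(tail))
    } else if let Some(tail) = raw.strip_prefix('@') {
        ("on", Some(tail))
    } else if let Some(rest) = raw.strip_prefix("v-") {
        match rest.split_once(':') {
            Some((name, tail)) => (name, Some(tail)),
            None => (rest, None),
        }
    } else {
        return VueAttrName::Html(raw);
    };

    match tail {
        Some(tail) => {
            let (argument, modifiers) = split_argument(tail);
            VueAttrName::Directive { name, argument: Some(argument), modifiers }
        }
        None => {
            // No argument, e.g. `v-if` or `v-model.trim` (not covered by the diagram).
            let (name, rest) = name.split_once('.').unwrap_or((name, ""));
            VueAttrName::Directive { name, argument: None, modifiers: split_modifiers(rest) }
        }
    }
}
```

For example, `classify_vue_attr("@click.prevent")` yields a `Directive` named `on` with the static argument `click` and the modifier `prevent`, while `classify_vue_attr("href")` falls through to `Html("href")`.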
A couple of notes:
But this gives me the impression that writing a grammar for Vue wouldn't be that bad.
Svelte
Svelte's grammar is going to be significantly more complex because of the additional syntax.
stateDiagram-v2
[*] --> HtmlTag: <
HtmlTag --> HtmlText: >
HtmlText --> HtmlTagClose: </
HtmlTagClose --> [*]: >
HtmlTag --> HtmlAttributeList: (space)
HtmlAttributeList --> HtmlText:>
state HtmlAttributeList {
state attr_type <<choice>>
[*] --> attr_type
attr_type --> SvelteBindingShorthand: {
attr_type --> HtmlAttributeName: A-z
HtmlAttributeName --> HtmlAttributeValue: =
HtmlAttributeValue --> HtmlAttributeText: "
HtmlAttributeValue --> SvelteBindingValue: {
HtmlAttributeValue --> SvelteBindingValue: "{
SvelteBindingValue --> [*]: }
SvelteBindingValue --> [*]: }"
HtmlAttributeText --> [*]: "
SvelteBindingValue --> SvelteSpreadBindingShorthand: ...
SvelteSpreadBindingShorthand --> [*]: }
SvelteBindingShorthand --> SvelteBindingValue
}
HtmlText --> SvelteLogicBlockStart: {#
SvelteLogicBlockStart --> SvelteLogicBlockKeyword
note right of SvelteLogicBlockKeyword
Obviously, after this would be expressions
and some other tokens, but i'm not
diagramming it for now
end note
SvelteLogicBlockKeyword --> HtmlText: }
HtmlText --> SvelteLogicBlockMiddle: {(colon)
SvelteLogicBlockMiddle --> SvelteLogicBlockKeyword
note right of SvelteLogicBlockMiddle
Not sure what this is actually called,
too lazy to look it up right now
end note
HtmlText --> SvelteLogicBlockEnd: {/
SvelteLogicBlockEnd --> SvelteLogicBlockKeyword
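As a rough illustration of the three logic-block transitions in the diagram above (`{#`, `{:` and `{/`), here is a minimal Rust sketch that classifies the opening of a block tag and extracts its keyword. The names `SvelteBlockTag` and `classify_block_tag` are made up for this example, and it deliberately ignores the expressions that follow the keyword, just like the diagram does.

```rust
/// Hypothetical classification of a Svelte logic-block tag.
#[derive(Debug, PartialEq)]
enum SvelteBlockTag<'a> {
    /// `{#if ...}`, `{#each ...}`
    Start(&'a str),
    /// `{:else}`, `{:then ...}`
    Middle(&'a str),
    /// `{/if}`, `{/each}`
    End(&'a str),
}

/// Returns the block kind and keyword, or `None` if `src` is not a logic-block tag
/// (for example a plain `{expression}`).
fn classify_block_tag(src: &str) -> Option<SvelteBlockTag<'_>> {
    let inner = src.strip_prefix('{')?;
    let mut chars = inner.chars();
    let sigil = chars.next()?;
    let rest = chars.as_str();

    // The keyword is the run of ASCII letters right after the sigil.
    let end = rest
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(rest.len());
    let keyword = &rest[..end];
    if keyword.is_empty() {
        return None;
    }

    match sigil {
        '#' => Some(SvelteBlockTag::Start(keyword)),
        ':' => Some(SvelteBlockTag::Middle(keyword)),
        '/' => Some(SvelteBlockTag::End(keyword)),
        _ => None,
    }
}
```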
Conclusion
Based on this little exploration, I don't think we are going to have
Edit: Took a brief look at Angular, and it seems to share the
-
These observations for Vue and Svelte are spot on. I just wanted to mention Angular as well, with both its new control flow syntax https://angular.dev/guide/templates/control-flow and its old control flow syntax (
-
self closing tags
I'd like to very briefly mention the issue with self-closing tags. Jake said it already very well. But basically he said:
That said, I think Biome should trim
-
https://github.com/prettier/angular-html-parser is what Prettier uses for Angular HTML. If the Biome devs could port it to Rust, that would be perfect. It would also help with porting Angular-template-specific ESLint rules.
-
The HTMLish languages include HTML and HTML-based template languages like: Vue templates, Angular templates, Grale templates, etc.
The goal of this discussion is to share design constraints and ideas.
Objective
Enable code reuse for lints and formatting for the different HTML languages. For example, implement the logic for accessibility lints once and use it for all HTMLish languages.
The objective is not to optimize binary size but enable code reuse.
Design considerations
HTMLish languages are not additive
What I mean by additive is that a language dialect like TypeScript or JSX adds additional syntax to a language:
The same applies to JSON, JSON5, and JSON6: Using any JSON6-only syntax makes your whole file a JSON6 file.
That's because the "extension" dialects are supersets of the language they extend:
While it's true that you get an Angular template if you use any angular syntax in HTML, it doesn't work if you start mixing template dialects: What format does a file use if you use an Angular expression inside a Vue template? Vular, Angulue?
We'll run into the same problem with JavaScript if we add support for another JavaScript dialect, e.g., Flow. You could even argue that it's a situation we have today with SvelteJs, except that SvelteJs doesn't extend the JavaScript syntax but changes the language semantics.
Naming
HTML element and attribute names are case insensitive, but they are case sensitive in Angular (at least when using TypeScript), JSX, and VueJS templates, and potentially others.
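One way shared lint code could account for this difference is to let each language declare how names are compared. The sketch below is only an assumption about how that might look; `NameCasing` and `names_match` are hypothetical names, not part of any existing API.

```rust
/// How a language compares element and attribute names (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum NameCasing {
    /// Plain HTML: `<DIV>` and `<div>` name the same element.
    Insensitive,
    /// Case-sensitive template languages: `<Foo>` and `<foo>` are different names.
    Sensitive,
}

/// Compares two element or attribute names under the given casing rule.
fn names_match(casing: NameCasing, a: &str, b: &str) -> bool {
    match casing {
        NameCasing::Insensitive => a.eq_ignore_ascii_case(b),
        NameCasing::Sensitive => a == b,
    }
}
```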
Furthermore, some template languages support multiple syntaxes to assign an attribute value:
<a href="./link">body</a>
<a v-bind:href="link">body</a>
That means it is necessary for some languages to check for static attributes and bindings when testing if an attribute is set.
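A minimal sketch of such a check, using the two Vue forms shown above plus the `:` shorthand for `v-bind`. The helper `find_vue_attribute` and the `AttributeSource` enum are hypothetical, and the attribute list is simplified to bare attribute names.

```rust
/// How an attribute was set on an element (hypothetical).
#[derive(Debug, PartialEq)]
enum AttributeSource {
    /// `href="./link"`
    Static,
    /// `v-bind:href="link"` or `:href="link"`
    Binding,
}

/// Looks for `name` among the attribute names of a Vue element, checking both
/// the static form and the binding forms.
fn find_vue_attribute(attrs: &[&str], name: &str) -> Option<AttributeSource> {
    attrs.iter().find_map(|attr| {
        if *attr == name {
            Some(AttributeSource::Static)
        } else if attr.strip_prefix("v-bind:") == Some(name)
            || attr.strip_prefix(':') == Some(name)
        {
            Some(AttributeSource::Binding)
        } else {
            None
        }
    })
}
```

A lint that only asks "is `href` set?" can then treat both `Some(Static)` and `Some(Binding)` as set, while a lint that inspects the value can tell the two cases apart.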
JSX
JSX is defined as part of the JavaScript language but otherwise fits the bill of an HTML template language. That means reusing the code of HTMLish lint rules for JSX is desirable, but doing so ultimately means that either:
Design Ideas
Single HTMLish language
Create a unified HTMLish grammar that unifies all different HTMLish dialects.
This does not mean that all dialects use the same parser. Languages could either share the parser implementation or each come with their own.
This approach does not allow for code reuse between JSX and HTMLish.
Unified HTMLish syntax
The idea is to have language-specific grammars with a unifying language on top.
This approach is very compelling at first but has two shortcomings
HTMLish traits
Each HTMLish language has its own grammar, and traits abstract the common operations used by the linter, such as iterating elements, finding attributes, and testing attribute values. Lints would then either be generic over the HTMLish trait implementation or a trait with methods to perform some language-specific operations.
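To make the trait idea a bit more concrete, here is a small Rust sketch of a lint that is written once and is generic over the element type. Everything in it is hypothetical: the trait `HtmlishElement`, its methods, the two toy element structs, and the `img_missing_alt` check are illustrations, not an existing Biome API.

```rust
/// Hypothetical trait that each language's element node would implement.
trait HtmlishElement {
    /// Language-specific name comparison (case insensitive for plain HTML).
    fn is_tag(&self, name: &str) -> bool;
    /// True if the attribute is set, statically or through a binding.
    fn has_attribute(&self, name: &str) -> bool;
}

/// An accessibility check written once for every language implementing the trait.
fn img_missing_alt<E: HtmlishElement>(element: &E) -> bool {
    element.is_tag("img") && !element.has_attribute("alt")
}

/// Toy stand-in for a plain HTML element node.
struct HtmlElement {
    tag: String,
    attributes: Vec<String>,
}

impl HtmlishElement for HtmlElement {
    fn is_tag(&self, name: &str) -> bool {
        self.tag.eq_ignore_ascii_case(name)
    }
    fn has_attribute(&self, name: &str) -> bool {
        self.attributes.iter().any(|a| a.eq_ignore_ascii_case(name))
    }
}

/// Toy stand-in for a Vue template element node.
struct VueElement {
    tag: String,
    attributes: Vec<String>,
}

impl HtmlishElement for VueElement {
    fn is_tag(&self, name: &str) -> bool {
        self.tag == name
    }
    fn has_attribute(&self, name: &str) -> bool {
        // Accept `alt`, `v-bind:alt`, and the `:alt` shorthand.
        self.attributes.iter().any(|a| {
            let bare = a
                .strip_prefix("v-bind:")
                .or_else(|| a.strip_prefix(':'))
                .unwrap_or(a.as_str());
            bare == name
        })
    }
}
```

The lint itself never mentions Vue or HTML; the language-specific knowledge (name casing, binding syntax) lives entirely in the trait implementations.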
This trait system may be supplemented by an HTMLish semantic model that allows fast queries for specific elements/attributes.
It's not entirely clear to me how this would work on the formatting side, but I could envision that the formatting of an Element is a trait that can then be implemented by language-specific formatters.
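One possible shape for this, purely as a sketch: a trait whose required methods supply the language-specific pieces and whose provided method holds the shared layout logic. The trait `FormatElement`, the `VueAnchor` type, and the line-breaking rule (one attribute per line when the tag exceeds `max_width`) are all assumptions made up for this example, not a proposal for the real formatter's output.

```rust
/// Hypothetical formatting trait implemented by each language's element node.
trait FormatElement {
    fn tag_name(&self) -> &str;
    /// Language-specific: every attribute already printed in its dialect's syntax.
    fn formatted_attributes(&self) -> Vec<String>;

    /// Shared layout logic, provided once for all languages.
    fn format_open_tag(&self, max_width: usize) -> String {
        let attrs = self.formatted_attributes();

        // First try to fit everything on one line.
        let mut flat = format!("<{}", self.tag_name());
        for attr in &attrs {
            flat.push(' ');
            flat.push_str(attr);
        }
        flat.push('>');
        if attrs.is_empty() || flat.len() <= max_width {
            return flat;
        }

        // Otherwise put one attribute per line (an arbitrary rule for this sketch).
        let mut broken = format!("<{}", self.tag_name());
        for attr in &attrs {
            broken.push_str("\n  ");
            broken.push_str(attr);
        }
        broken.push_str("\n>");
        broken
    }
}

/// Toy Vue element with a fixed set of attributes.
struct VueAnchor;

impl FormatElement for VueAnchor {
    fn tag_name(&self) -> &str {
        "a"
    }
    fn formatted_attributes(&self) -> Vec<String> {
        vec![":href=\"link\"".to_string(), "class=\"nav\"".to_string()]
    }
}
```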