Thoughts on HTMLish Language Support #1726

ematipico · 2024-02-01T12:56:27Z

ematipico
Feb 1, 2024
Maintainer

from rome/tools#4077

The HTMLish languages include HTML and HTML-based template languages like: Vue templates, Angular templates, Grale templates, etc.

The goal of this discussion is to share design constraints and ideas.

Objective

Enable code reuse for lints and formatting for the different HTML languages. For example, implement the logic for accessibility lints once and use it for all HTMLish languages.

The objective is not to optimize binary size but enable code reuse.

Design considerations

HTMLish languages are not additive

What I mean by additive is that a language dialect like TypeScript or JSX adds additional syntax to a language:

If you use TypeScript syntax in a JavaScript file, it becomes TypeScript.
If you use JSX syntax, it becomes a JSX file
If you use JSX and TypeScript, it becomes a TSX file

The same applies to JSON, JSON5, and JSON6: Using any JSON6-only syntax makes your whole file a JSON6 file.

That's because the "extension" dialects are supersets of the language they extend:

JavaScript ⊆ TypeScript
JavaScript ⊆ JSX, TypeScript ⊆ TSX, JSX ⊆ TSX (with very few exceptions)
JSON ⊆ JSON5 ⊆ JSON6

While it's true that you get an Angular template if you use any Angular syntax in HTML, this stops working once you start mixing template dialects: what format does a file use if you put an Angular expression inside a Vue template? Vular, Angulue?

We'll run into the same problem with JavaScript if we add support for another JavaScript dialect, e.g., Flow. You could even argue that it's a situation we have today with SvelteJs, except that SvelteJs doesn't extend the JavaScript syntax but changes the language semantics.

Naming

HTML element and attribute names are case insensitive, but they are case sensitive in Angular (at least when using TypeScript), JSX, and Vue.js templates, and potentially others.

Furthermore, some template languages support multiple syntaxes to assign an attribute value:

Attribute syntax: <a href="./link">body</a>
Binding syntax (Vue.js): <a v-bind:href="link">body</a>

That means it is necessary for some languages to check for static attributes and bindings when testing if an attribute is set.
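As a rough illustration of that check, here is a minimal Rust sketch. Everything in it (`Dialect`, `target_name`, `has_attribute`) is hypothetical and not part of Biome. It assumes attributes are available as raw name strings, and it only knows Vue's `v-bind:` long form and `:` shorthand.

```rust
/// Hypothetical dialect switch; not a Biome type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Dialect {
    Html,
    Vue,
}

/// Returns the attribute name that a raw attribute sets,
/// looking through Vue's `v-bind:` and `:` binding prefixes.
fn target_name(raw: &str, dialect: Dialect) -> &str {
    if dialect == Dialect::Vue {
        if let Some(rest) = raw.strip_prefix("v-bind:") {
            return rest;
        }
        if let Some(rest) = raw.strip_prefix(':') {
            return rest;
        }
    }
    raw
}

/// Tests whether `wanted` is set, either statically or through a binding.
/// Names compare case-insensitively in HTML and case-sensitively in Vue.
pub fn has_attribute(raw_names: &[&str], wanted: &str, dialect: Dialect) -> bool {
    raw_names.iter().any(|raw| {
        let name = target_name(raw, dialect);
        match dialect {
            Dialect::Html => name.eq_ignore_ascii_case(wanted),
            Dialect::Vue => name == wanted,
        }
    })
}
```

With this sketch, `has_attribute(&[":href"], "href", Dialect::Vue)` is true, while the same input under `Dialect::Html` is not, because plain HTML has no binding syntax.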

JSX

JSX is defined as part of the JavaScript language but otherwise fits the bill of an HTML template language. Reusing HTMLish lint rules for JSX is therefore desirable, but doing so ultimately means that either:

JSX must be extracted from JavaScript into the HTMLish language (unclear how this would work considering the tight coupling to the JS parser)
The solution we seek must work across different syntax languages

Design Ideas

Single HTMLish language

Create a unified HTMLish grammar that unifies all different HTMLish dialects.

AnyHtmlishAttribute = VueBinding | AngularBinding | HtmlAttribute | ...

This does not mean that all dialects use the same parser. Languages could either share the parser implementation or each come with their own.

This approach does not allow for code reuse between JSX and HTMLish

Unified HTMLish syntax

The idea is to have language-specific grammars with a unifying language on top.

AnyHtmlishRoot = AngularRoot | VueRoot | HtmlRoot | ... 
AnyHtmlishAttribute = AnyAngularAttribute | AnyVueAttribute | ...

This approach is very compelling at first but has two shortcomings:

The union nodes cannot implement AstNode because it's unclear what to use for the L: Language type parameter (the nodes are from many different languages)
Much of our infrastructure is typed over L: Language.
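The first shortcoming can be sketched in a few lines of Rust. This is a simplified stand-in, not Biome's real `AstNode` or `Language` definitions: the language is modelled as an associated type, and all node and language types are made up for the example.

```rust
/// Simplified stand-ins for the real traits; assumptions for illustration only.
pub trait Language {}

pub struct HtmlLanguage;
pub struct VueLanguage;
impl Language for HtmlLanguage {}
impl Language for VueLanguage {}

pub trait AstNode {
    /// Every node belongs to exactly one language.
    type Language: Language;
    fn name(&self) -> &str;
}

pub struct HtmlAttribute {
    pub name: String,
}

pub struct VueAttribute {
    pub name: String,
}

impl AstNode for HtmlAttribute {
    type Language = HtmlLanguage;
    fn name(&self) -> &str {
        &self.name
    }
}

impl AstNode for VueAttribute {
    type Language = VueLanguage;
    fn name(&self) -> &str {
        &self.name
    }
}

/// The cross-language union node.
pub enum AnyHtmlishAttribute {
    Html(HtmlAttribute),
    Vue(VueAttribute),
}

// This is where the design gets stuck: there is no single correct choice.
//
// impl AstNode for AnyHtmlishAttribute {
//     type Language = ???; // HtmlLanguage? VueLanguage? Neither fits.
// }

impl AnyHtmlishAttribute {
    /// Inherent methods still work, but nothing that is generic over
    /// `AstNode` can accept the union.
    pub fn name(&self) -> &str {
        match self {
            Self::Html(attribute) => attribute.name(),
            Self::Vue(attribute) => attribute.name(),
        }
    }
}
```

The union can still expose helpers through inherent methods, but any infrastructure that requires `AstNode` (the second shortcoming) cannot be reused for it.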

HTMLish traits

Each HTMLish language has its own grammar, and traits abstract the common operations used by the linter, such as iterating elements, finding attributes, and testing attribute values. Lints would then either be generic over the HTMLish trait implementation or use a trait with methods for the language-specific operations.

This trait system may be supplemented by an HTMLish semantic model that allows fast queries for specific elements/attributes.

It's not entirely clear to me how this would work on the formatting side, but I could envision that the formatting of an Element is a trait that can then be implemented by language-specific formatters.
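On the lint side, the trait idea could look roughly like the sketch below. The trait `HtmlishElement`, both element structs, and the `img_missing_alt` rule are hypothetical names chosen for this example, and elements are reduced to a tag name plus a list of raw attribute names.

```rust
/// Hypothetical trait abstracting the operations a lint needs.
pub trait HtmlishElement {
    /// Language-specific tag comparison (case rules differ per dialect).
    fn is_tag(&self, name: &str) -> bool;
    /// Language-specific attribute lookup (static attributes and bindings).
    fn has_attribute(&self, name: &str) -> bool;
}

pub struct HtmlElement {
    pub tag: String,
    pub attributes: Vec<String>,
}

pub struct VueElement {
    pub tag: String,
    pub attributes: Vec<String>,
}

impl HtmlishElement for HtmlElement {
    fn is_tag(&self, name: &str) -> bool {
        self.tag.eq_ignore_ascii_case(name)
    }

    fn has_attribute(&self, name: &str) -> bool {
        self.attributes.iter().any(|a| a.eq_ignore_ascii_case(name))
    }
}

impl HtmlishElement for VueElement {
    fn is_tag(&self, name: &str) -> bool {
        self.tag == name
    }

    fn has_attribute(&self, name: &str) -> bool {
        self.attributes.iter().any(|a| {
            // Look through `v-bind:alt` and `:alt` as well as plain `alt`.
            let target = a
                .strip_prefix("v-bind:")
                .or_else(|| a.strip_prefix(':'))
                .unwrap_or(a.as_str());
            target == name
        })
    }
}

/// An accessibility check written once and usable for every implementor.
pub fn img_missing_alt<E: HtmlishElement>(element: &E) -> bool {
    element.is_tag("img") && !element.has_attribute("alt")
}
```

The rule body never mentions a concrete language, so adding another dialect would only require a new `HtmlishElement` implementation.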

denbezrukov · 2024-02-02T21:03:25Z

denbezrukov
Feb 2, 2024
Maintainer

I've been following this discussion on designing a flexible and reusable system for HTMLish languages, and I'd like to share some thoughts that might contribute to our collective understanding and approach.

Given the complexity and diversity of HTMLish languages (including HTML, Vue, Angular, etc.), I believe that employing separate parsers for each language, coupled with a trait-based system for linters and formatters, presents a viable and robust solution. This approach allows for nuanced parsing that can accurately reflect the unique syntax and semantics of each language while providing a unified interface for tooling, thereby facilitating code reuse and extensibility.

Separate Parsers for Each Language

Creating dedicated parsers for each HTMLish language ensures that the parsing process is tailored to the specific needs and features of each language. This specificity is crucial for accurately capturing the nuances of each language, including unique attributes, directives, and template syntaxes. It also allows for performance optimizations specific to the parsing and analysis needs of each language, enhancing efficiency and effectiveness.

Trait-Based Linters/Formatters

A trait-based approach for linters and formatters can significantly streamline the development of these tools. By defining common operations, checks, and analyses in traits, we can apply these tools across different languages with minimal code duplication. This not only speeds up the development process but also ensures consistency in how different languages are linted and formatted, providing a more uniform developer experience.

Extending Codegen for Grammar Reuse

One of the challenges in this approach is the potential duplication of effort in defining grammar for similar constructs across languages. To address this, I propose extending our code generation system to allow for the sharing of grammar components among the different language parsers. This would involve:

Identifying Common Grammar Elements: We start by identifying the elements that are common across HTMLish languages, such as basic HTML tags, common attributes, and template syntax elements that are shared among languages.
Creating a Shared Grammar Library: These common elements would then be abstracted into a shared library that can be included in the grammar definitions of each language. This allows us to define these elements once and reuse them across different parsers.
Allowing Language-Specific Extensions: Each language-specific grammar can extend the shared definitions with its unique elements and rules. This ensures that while we maximize code reuse, we also retain the flexibility to accurately capture the unique features of each language.

This approach not only fosters code reuse but also ensures that our parsers remain flexible and accurate, capable of adapting to the specific requirements of each HTMLish language.

Conclusion

By combining separate parsers with a trait-based approach for tooling and an extended code generation system for grammar reuse, we can achieve a balance that allows for accurate, language-specific parsing and efficient, reusable tooling. This strategy respects the unique aspects of each language while promoting a more unified and efficient development process for linters, formatters, and other analysis tools.

I'm eager to hear your thoughts on this approach and how we might refine it further to address our collective goals.

5 replies

ematipico Feb 6, 2024
Maintainer Author

I also think that a trait system is the best choice in the long run.

The initial setup will be quite difficult and awkward, because we will need to modify our codegen just for this syntax.

Maybe, instead of starting from the grammar, we should start from an initial implementation.

Perhaps we could start by implementing a trait for the attributes, and then try to make it "specific" for HTML and another language.

What do you think about it?

Conaclos Feb 6, 2024
Maintainer

Just wondering: are there so many differences between HTMLish languages that they justify separate grammars and CSTs? Could we use a common CST and a parser mode to parse each language accurately?

ematipico Feb 6, 2024
Maintainer Author

As explained in the beginning:

The union nodes cannot implement AstNode because it's unclear what to use for the L: Language type parameter (the nodes are from many different languages)

Much of our infrastructure is typed over L: Language.

Conaclos Feb 6, 2024
Maintainer

If we use a single grammar, we have a single language. I don't understand your comment @ematipico .

ematipico Feb 6, 2024
Maintainer Author

If we use a single grammar, we have a single language. I don't understand your comment @ematipico .

We can't re-use logic and nodes. One of the objectives would be to add JSX to the HTML dialects.

For example, Astro and Vue allow JSX syntax in their files. Astro uses a dialect of JSX. However, today JSX nodes are typed over the JavaScript language, so it's unclear how we could re-use what we have inside new files.

conartist6 · 2024-03-15T15:19:40Z

conartist6
Mar 15, 2024

I went a rather different way on this one: language embedding all the way down. I see the TSX language as TSX syntax embedded in TS syntax embedded in JS syntax embedded in comment syntax embedded in whitespace.

0 replies

jpike88 · 2024-05-10T13:24:47Z

jpike88
May 10, 2024

I believe this shouldn't require a 'Rust for everything' approach out of the gate. Parsing Angular source code is a non-trivial task, and projects like https://github.com/prettier/angular-estree-parser exist that take that problem out of the equation. Why not call that somehow to get your syntax tree, pass it back into Rust, and then apply the transformations?

Maybe use https://docs.rs/neon/latest/neon/index.html for that.

Then, over time, when time permits, replace the AST parser with a Rust implementation.

For angular:

Template code is either in .component.html files or in the template property of the @Component decorator. When that is detected, activate a dedicated parsing pipeline.

Make a call

0 replies

dyc3 · 2024-07-03T17:15:21Z

dyc3
Jul 3, 2024
Maintainer

Vue

I've made a very rough state diagram for Vue's HTMLish template syntax. Ref: https://vuejs.org/guide/essentials/template-syntax.html

stateDiagram-v2
    [*] --> HtmlTag: <
    HtmlTag --> HtmlText: >
    HtmlText --> AnyJsExpression: {{
    AnyJsExpression --> HtmlText: }}
    HtmlText --> HtmlTagClose: </
    HtmlTagClose --> [*]: >

    HtmlTag --> HtmlAttributeList: (space)
    HtmlAttributeList --> HtmlText:>

    state HtmlAttributeList {
        state attr_type <<choice>>
        state vue_attr_eq <<join>>

        [*] --> attr_type
        attr_type --> VueDirectiveName: v-
        attr_type --> VueVBindShorthand: colon
        attr_type --> VueVOnShorthand: @
        attr_type --> HtmlAttribute: else

        VueDirectiveName --> VueDirectiveArgument: colon
        VueVBindShorthand --> VueDirectiveArgument
        VueDirectiveArgument --> VueDirectiveArgumentDynamic: [
        VueDirectiveArgumentDynamic --> vue_attr_eq: ]=
        VueVOnShorthand --> VueDirectiveArgument
        VueDirectiveArgument --> VueDirectiveModifier: .
        VueDirectiveArgumentDynamic --> VueDirectiveModifier: ].
        VueDirectiveModifier --> vue_attr_eq: =
        VueDirectiveArgument --> vue_attr_eq: =

        vue_attr_eq --> VueDirectiveValue: "
        VueDirectiveValue --> [*]: "
    }

A couple of notes:

This doesn't really account for self closing elements
VueDirectiveValue would be AnyJsExpression, as long as it doesn't contain a ", which would result in a syntax error.
Doesn't account for directives with no value, eg. v-else

But this gives me the impression that writing a grammar for Vue wouldn't be that bad.
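To get a feel for the `attr_type` choice in the diagram, here is a small Rust sketch that classifies an attribute name by its prefix. The enum and function names are hypothetical and only loosely mirror the diagram's states. Modifiers (`.prevent`) and dynamic arguments (`[name]`) are left inside `argument` instead of being split out, and a directive without an argument (such as `v-else`, which the diagram skips) is represented with `None`.

```rust
/// Hypothetical classification mirroring the `attr_type` choice state.
#[derive(Debug, PartialEq)]
pub enum VueAttrKind<'a> {
    /// `v-name` or `v-name:argument`
    Directive { name: &'a str, argument: Option<&'a str> },
    /// `:argument`, shorthand for `v-bind:argument`
    VBindShorthand { argument: &'a str },
    /// `@argument`, shorthand for `v-on:argument`
    VOnShorthand { argument: &'a str },
    /// Anything else is treated as a plain HTML attribute.
    HtmlAttribute { name: &'a str },
}

pub fn classify_vue_attr_name(raw: &str) -> VueAttrKind<'_> {
    if let Some(rest) = raw.strip_prefix("v-") {
        return match rest.split_once(':') {
            Some((name, argument)) => VueAttrKind::Directive {
                name,
                argument: Some(argument),
            },
            None => VueAttrKind::Directive {
                name: rest,
                argument: None,
            },
        };
    }
    if let Some(argument) = raw.strip_prefix(':') {
        return VueAttrKind::VBindShorthand { argument };
    }
    if let Some(argument) = raw.strip_prefix('@') {
        return VueAttrKind::VOnShorthand { argument };
    }
    VueAttrKind::HtmlAttribute { name: raw }
}
```

A real lexer would work on tokens rather than whole strings, but the branching is the same as in the diagram: `v-`, `:`, `@`, and a fallback to a regular HTML attribute.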

Svelte

Svelte's grammar is going to be significantly more complex because of the additional syntax.
Ref:

stateDiagram-v2
    [*] --> HtmlTag: <
    HtmlTag --> HtmlText: >
    HtmlText --> HtmlTagClose: </
    HtmlTagClose --> [*]: >

    HtmlTag --> HtmlAttributeList: (space)
    HtmlAttributeList --> HtmlText:>

    state HtmlAttributeList {
        state attr_type <<choice>>
        
        [*] --> attr_type
        attr_type --> SvelteBindingShorthand: {
        attr_type --> HtmlAttributeName: A-z
        HtmlAttributeName --> HtmlAttributeValue: =
        HtmlAttributeValue --> HtmlAttributeText: "
        HtmlAttributeValue --> SvelteBindingValue: {
        HtmlAttributeValue --> SvelteBindingValue: "{

        SvelteBindingValue --> [*]: }
        SvelteBindingValue --> [*]: }"
        HtmlAttributeText --> [*]: "

        SvelteBindingValue --> SvelteSpreadBindingShorthand: ...
        SvelteSpreadBindingShorthand --> [*]: }

        SvelteBindingShorthand --> SvelteBindingValue
    }

    HtmlText --> SvelteLogicBlockStart: {#
    SvelteLogicBlockStart --> SvelteLogicBlockKeyword
    note right of SvelteLogicBlockKeyword
    Obviously, after this would be expressions
    and some other tokens, but i'm not 
    diagramming it for now
    end note
    SvelteLogicBlockKeyword --> HtmlText: }

    HtmlText --> SvelteLogicBlockMiddle: {(colon)
    SvelteLogicBlockMiddle --> SvelteLogicBlockKeyword
    note right of SvelteLogicBlockMiddle
    Not sure what this is actually called,
    too lazy to look it up right now
    end note

    HtmlText --> SvelteLogicBlockEnd: {/
    SvelteLogicBlockEnd --> SvelteLogicBlockKeyword

Doesn't account for let: 2 way bindings
Doesn't fully diagram the logic blocks syntax for brevity
Doesn't account for the change to js behavior via the $: syntax
Doesn't account for special tags: https://svelte.dev/docs/special-tags
Doesn't account for element directives: https://svelte.dev/docs/element-directives
Doesn't account for special elements: https://svelte.dev/docs/special-elements
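The three logic-block openers from the diagram (`{#`, `{:`, `{/`) can be told apart with a one-character lookahead. The Rust sketch below shows that on a single brace-delimited tag. The names `SvelteTag` and `classify_svelte_tag` are hypothetical, the input is assumed to be exactly one `{...}` tag, and everything after the keyword is ignored, just as in the diagram.

```rust
/// Hypothetical classification of one `{...}` tag found in text position.
#[derive(Debug, PartialEq)]
pub enum SvelteTag<'a> {
    /// `{#keyword ...}`
    BlockStart(&'a str),
    /// `{:keyword ...}`
    BlockMiddle(&'a str),
    /// `{/keyword}`
    BlockEnd(&'a str),
    /// Any other brace content, which the diagram does not cover.
    Other(&'a str),
}

/// Returns the first whitespace-separated word, or "" if there is none.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// Returns `None` when the input is not wrapped in `{` and `}`.
pub fn classify_svelte_tag(source: &str) -> Option<SvelteTag<'_>> {
    let inner = source.strip_prefix('{')?.strip_suffix('}')?;
    Some(if let Some(rest) = inner.strip_prefix('#') {
        SvelteTag::BlockStart(first_word(rest))
    } else if let Some(rest) = inner.strip_prefix(':') {
        SvelteTag::BlockMiddle(first_word(rest))
    } else if let Some(rest) = inner.strip_prefix('/') {
        SvelteTag::BlockEnd(first_word(rest))
    } else {
        SvelteTag::Other(inner.trim())
    })
}
```

For example, `classify_svelte_tag("{#if visible}")` yields `BlockStart("if")`; the expression after the keyword is what a real parser would hand over to the JS parser.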

Conclusion

Based on this little exploration, I don't think we are going to have ~~any~~ many shared tokens between these template languages, with the sole exception being anything that is already valid html.

Edit: Took a brief look at Angular, and it seems to share the {{ }} tokens and it looks like they serve the same purpose.

5 replies

dyc3 Jul 7, 2024
Maintainer

I've opened a PR attempting to add Vue's template syntax to the HTML grammar: #3369

dyc3 Jul 7, 2024
Maintainer

There's one problem I hadn't considered before. We currently treat .vue/.svelte/others as JS files.

biome/crates/biome_js_syntax/src/file_source.rs

Lines 196 to 209 in 40727ca

    
    /// Astro file definition
    pub fn astro() -> Self {
        Self::ts().with_embedding_kind(EmbeddingKind::Astro)
    }

    /// Vue file definition
    pub fn vue() -> Self {
        Self::js_module().with_embedding_kind(EmbeddingKind::Vue)
    }

    /// Svelte file definition
    pub fn svelte() -> Self {
        Self::js_module().with_embedding_kind(EmbeddingKind::Svelte)
    }

This is so that we are able to format and lint JS within blocks of code that we know is JS, (eg. vue and svelte's <script> tags), but it's kinda hacky.

I think that ultimately having separate grammars for each html-ish language is going to be inevitable. From a DX perspective, it would kinda make sense for all the vue lint rules to go in biome_vue_analyze, not biome_html_analyze.

However, it's unclear to me how we would switch languages mid parsing. I'm not aware of any place we are currently doing that. We need to be able to in order to correctly parse VueDirectiveValue and VueTemplateInterpolation (as named in #3369), for example.

ematipico Jul 8, 2024
Maintainer Author

I'm not aware of any place we are currently doing that

Nope, that will be the first time, and that's what we want to figure out.

HTML will be the only language where we will have to switch the parsing midway, but I don't think that's an issue, I believe it's fine if the html parser (the crate) depends on the JS one and the CSS one.
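A very rough sketch of what "switching midway" could mean for `{{ }}` interpolations: the HTML side finds the delimiters and hands the inner slice to another parser. The Rust code below only does the splitting, and `Segment` and `split_interpolations` are hypothetical names. It is a naive scan that would be fooled by `}}` inside a JS string literal, which is exactly why a real implementation would let the JS parser decide where the expression ends.

```rust
/// Hypothetical result of splitting template text.
#[derive(Debug, PartialEq)]
pub enum Segment<'a> {
    /// Plain text that stays with the HTML parser.
    Text(&'a str),
    /// The trimmed content between `{{` and `}}`, to be parsed as JS.
    Interpolation(&'a str),
}

pub fn split_interpolations(source: &str) -> Vec<Segment<'_>> {
    let mut segments = Vec::new();
    let mut rest = source;
    while let Some(open) = rest.find("{{") {
        let after_open = &rest[open + 2..];
        // An unclosed `{{` is left as plain text.
        let close = match after_open.find("}}") {
            Some(close) => close,
            None => break,
        };
        if open > 0 {
            segments.push(Segment::Text(&rest[..open]));
        }
        segments.push(Segment::Interpolation(after_open[..close].trim()));
        rest = &after_open[close + 2..];
    }
    if !rest.is_empty() {
        segments.push(Segment::Text(rest));
    }
    segments
}
```

Each `Interpolation` slice marks a point where the HTML parser would call into the JS parser, which is the crate dependency described above.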

jpike88 Jul 8, 2024

Angular's control flow syntax sets it apart in a way that isn't what I'd call HTML-ish, so I think it makes sense to treat each language as a separate problem to solve. This also allows for the possibility of using existing AST parsing libraries as an intermediary step if the situation calls for it.

Sec-ant Jul 8, 2024
Maintainer

I have some unpolished thoughts on reusing the JS/CSS parsers in HTML-ish languages: #3334 (comment). Basically, I think we can evaluate the idea of treating them as embedded languages. We can have some default out-of-the-box internal rules to support <script>, <style>, and even <script type="application/ld+json"> and <script type="importmap">. The benefit is that with the embedded language + plugin infra, it will be trivial for us to compose parsers to support new syntaxes in the future. It will also be easier for users to customize the behavior, for example for Vue custom blocks.
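As a loose illustration of such default rules, the Rust sketch below maps a tag name and its optional `type` attribute to an embedded language. `EmbeddedLanguage` and `embedded_language` are hypothetical names, not existing Biome APIs, and the list of recognized `type` values is deliberately small.

```rust
/// Hypothetical set of languages that can be embedded in an HTML-ish file.
#[derive(Debug, PartialEq)]
pub enum EmbeddedLanguage {
    JavaScript,
    Css,
    Json,
}

/// Picks the parser for an element's content, or `None` when the content
/// should stay with the host language.
pub fn embedded_language(tag: &str, type_attr: Option<&str>) -> Option<EmbeddedLanguage> {
    match (tag.to_ascii_lowercase().as_str(), type_attr) {
        ("style", _) => Some(EmbeddedLanguage::Css),
        // JSON-LD blocks and import maps both contain JSON.
        ("script", Some("application/ld+json" | "importmap")) => Some(EmbeddedLanguage::Json),
        ("script", None | Some("module" | "text/javascript")) => {
            Some(EmbeddedLanguage::JavaScript)
        }
        _ => None,
    }
}
```

User customization, such as Vue custom blocks, would amount to adding entries to this mapping instead of touching the host parser.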

Zooce · 2024-07-06T19:09:47Z

Zooce
Jul 6, 2024

These observations for Vue and Svelte are spot on. I just wanted to mention Angular as well - like with its new control flow syntax https://angular.dev/guide/templates/control-flow and old control flow syntax (<div *ngIf="show"></div>) and input/output binding (<component [someInput]="someSignal()" [(twoWay)]="data" (output)="doThis($event)" (click)="onClick()"></component>).

0 replies

junaga · 2024-08-13T16:32:51Z

junaga
Aug 13, 2024

self closing tags

I'd like to very quickly and shortly mention the issue with / in <div /> as jakearchibald.com/2023/against-self-closing-tags-in-html said, just in case someone missed that.

Jake said it already very very well. But basically he said:

JSX is not HTML. XHTML is not HTML. HTML is HTML. github:whatwg/html is HTML. In HTML the / is ignored. It's boilerplate. - notarealquote

others agree

That said, I think Biome should trim / from all elements in HTMLish languages, perhaps with a "formatter" option that can be set to "off" or "preserve" in addition to "on".

1 reply

jasongitmail Aug 13, 2024

I personally wouldn't bother with an option that allows HTML to remain invalid, or rather formatted as XHTML, unless a persuasive need is presented by a user. It'd just add clutter to Biome's UX.

pumano · 2024-10-19T21:28:29Z

pumano
Oct 19, 2024

https://github.com/prettier/angular-html-parser is the parser Prettier uses for Angular HTML.

If the Biome devs could port it to Rust, that would be perfect. It would also help with porting Angular template-specific ESLint rules.

0 replies
