☂️ Attach trivia to tokens #1720
Comments
This is done already, although we should decide on a different name that is different from
Our crates are lacking a pattern. The rslint fork, for example, does not have "rome" as a prefix. I would vote for no prefixes, to be on par with other crates. We can revisit this later, if needed.
Yeah, makes sense. Let's take down the prefix.
Hi! Can I work on removing
@xunilrj I see, thanks for the reply. I will try to update this PR.
@mrkldshv we started a Discord discussion to align on our crate naming. I suggest waiting on the outcome before working on the PR. I'm also not sure if it makes sense to build on top of my stale PR. You will probably have an easier time starting from scratch.
@MichaReiser thanks for the information. I'll wait for the outcome and probably create a new PR based on the decision.
Description
part of: #1718
Pending Decisions
modelling trivia as cached references (similar to token/nodes) or as enum (Space, Tab, NewLine, SingleLineComment, MultiLineComment) and store them by value?
Tasks
`EOF` token.
Research
Lexer merges newline and trailing spaces
generates
LosslessTreeSink
It seems possible to prepend and append trivia to tokens inside the `LosslessTreeSink` by doing the following:
`LosslessTreeSink::start_node`
`LosslessTreeSink::do_token` can eat and append trivia until it finds a "new line".
Doing this generates a trivia-free and clean tree:
Before, the tree was:
To eliminate the trivia `SyntaxKind`
The trivia processing in the `LosslessTreeSink` demands that trivia be identifiable by their `SyntaxKind`. Therefore, to eliminate the `SyntaxKind`, we need to move this processing to the lexer.
In this case, the change would need to be made in `Lexer::lex_token` and `Lexer::lex_template`. Unfortunately, `impl Iterator for Lexer<'_>` is too late because it already works with Tokens.
The lexer is implemented with multiple functions returning `LexerReturn`, where `LexerReturn` is `pub type LexerReturn = (Token, Option<Diagnostic>);`. So we would need to consume the trailing trivia inside these functions. Boring, but doable.
Another option is to let these functions parse just their own token and find leading/trailing trivia outside, in a more general function.
Here `lex_token` and `lex_template` seem to be the best option.
Another possible advantage of doing this in the lexer is that we can implement this as an option in the lexer:
I did a quick test draining all trivia from the lexer to test if the parser works, and it does work.
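As a rough illustration of the "more general function" option, the sketch below attaches trivia outside of the token-lexing functions. Everything in it is hypothetical: `RawKind`, `lex_raw`, `TokenWithTrivia` and `attach_trivia` are not part of the real lexer. The attachment rule is also an assumption based on the notes above: trailing trivia stops at the first newline, the newline and whatever follows become leading trivia of the next token, and a zero-length EOF token collects anything left over.

```rust
/// Hypothetical raw pieces produced by a toy lexer (not the real `Token`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RawKind {
    Whitespace,
    Newline,
    Word,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Raw {
    kind: RawKind,
    len: usize,
}

/// Hypothetical output: a token plus the byte lengths of its attached trivia.
#[derive(Debug, PartialEq, Eq)]
struct TokenWithTrivia {
    leading: usize,
    len: usize,
    trailing: usize,
}

/// Toy lexer: groups spaces/tabs and non-whitespace into runs;
/// every '\n' is emitted as its own piece.
fn lex_raw(src: &str) -> Vec<Raw> {
    let mut out: Vec<Raw> = Vec::new();
    for c in src.chars() {
        let kind = match c {
            '\n' => RawKind::Newline,
            ' ' | '\t' => RawKind::Whitespace,
            _ => RawKind::Word,
        };
        match out.last_mut() {
            Some(last) if last.kind == kind && kind != RawKind::Newline => {
                last.len += c.len_utf8()
            }
            _ => out.push(Raw { kind, len: c.len_utf8() }),
        }
    }
    out
}

/// The "general function": token lexing stays trivia-unaware, and trivia is
/// attached here. Whitespace before the first newline is trailing trivia of
/// the previous token; everything else is leading trivia of the next one.
fn attach_trivia(raw: &[Raw]) -> Vec<TokenWithTrivia> {
    let mut tokens: Vec<TokenWithTrivia> = Vec::new();
    let mut pending = 0; // trivia bytes waiting for the next token
    let mut seen_newline = false; // newline seen since the last token?
    for piece in raw {
        match piece.kind {
            RawKind::Word => {
                tokens.push(TokenWithTrivia { leading: pending, len: piece.len, trailing: 0 });
                pending = 0;
                seen_newline = false;
            }
            RawKind::Newline => {
                seen_newline = true;
                pending += piece.len;
            }
            RawKind::Whitespace => match tokens.last_mut() {
                Some(prev) if !seen_newline => prev.trailing += piece.len,
                _ => pending += piece.len,
            },
        }
    }
    // Zero-length EOF token that owns any remaining trivia.
    tokens.push(TokenWithTrivia { leading: pending, len: 0, trailing: 0 });
    tokens
}
```

For the input `"let a\n  b "`, `attach_trivia(&lex_raw(..))` yields four entries (`let`, `a`, `b`, EOF), and the leading, token and trailing lengths add up to the length of the input, so no source text is lost.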
The only complication I found is `Token::len`. Right now it contains the token length ("let" = 3). In this case we would need a `Token::complete_length()` (" let " = 5). We would need this inside the `LosslessTreeSink` to correctly point to the original `&str`.
Another implication of eliminating `SyntaxKind` is that it would be harder to have the trivia inside the token as enums. We can create an additional enum for this, of course, but maybe it is cheaper to have trivia as `&str`.
How to store the trivia inside the Green tree?
Swift
Swift has just pointers and lengths to trivia. The downside is that the trivia "loses" its parsing and needs to be parsed again. Swift documentation even reminds you that you should cache the parsing.
https://github.com/apple/swift/blob/7123d2614b5f222d03b3762cb110d27a9dd98e24/include/swift/Syntax/RawSyntax.h#L185
https://github.com/apple/swift/blob/7123d2614b5f222d03b3762cb110d27a9dd98e24/include/swift/Syntax/RawSyntax.h#L447
This solution looks interesting because we can keep the `GreenToken` with a fixed size, and in the vast majority of cases trivia will be trivially reparsed (you saw this pun coming 😛).
The problem is that we want our Green tree to be language independent. In this case, to reparse the trivia we would need to know which language the trivia was generated from. We would need to tag it with an enum, or carry a function pointer to the trivia parser.
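To make the function-pointer variant concrete, here is a minimal sketch under stated assumptions. The `GreenToken` layout, the `TriviaPiece` enum, the `TriviaParser` alias and `parse_js_trivia` are all hypothetical names for illustration, not existing types. The toy parser assumes its input is valid trivia made only of spaces/tabs, `\n` and `//` line comments.

```rust
/// Hypothetical parsed trivia pieces; lengths are in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TriviaPiece {
    Whitespace(usize),
    Newline,
    Comment(usize),
}

/// A language-specific trivia parser carried as a plain function pointer,
/// so the token itself stays language independent.
type TriviaParser = fn(&str) -> Vec<TriviaPiece>;

/// Fixed-size token: it stores only lengths, never the parsed trivia.
struct GreenToken {
    leading_len: u32,
    text_len: u32,
    trailing_len: u32,
    parse_trivia: TriviaParser,
}

impl GreenToken {
    /// Reparses the leading trivia. `full_text` is the token's complete
    /// source slice: leading trivia + token text + trailing trivia.
    fn leading_trivia(&self, full_text: &str) -> Vec<TriviaPiece> {
        (self.parse_trivia)(&full_text[..self.leading_len as usize])
    }

    /// Reparses the trailing trivia from the same slice.
    fn trailing_trivia(&self, full_text: &str) -> Vec<TriviaPiece> {
        let start = (self.leading_len + self.text_len) as usize;
        let end = start + self.trailing_len as usize;
        (self.parse_trivia)(&full_text[start..end])
    }
}

/// Toy trivia parser (assumption: input contains only valid trivia).
fn parse_js_trivia(text: &str) -> Vec<TriviaPiece> {
    let bytes = text.as_bytes();
    let mut pieces = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        match bytes[i] {
            b'\n' => {
                i += 1;
                pieces.push(TriviaPiece::Newline);
            }
            b'/' => {
                // Line comment: runs up to, but not including, the newline.
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
                pieces.push(TriviaPiece::Comment(i - start));
            }
            _ => {
                // Run of spaces/tabs, ended by a newline or a comment start.
                while i < bytes.len() && bytes[i] != b'\n' && bytes[i] != b'/' {
                    i += 1;
                }
                pieces.push(TriviaPiece::Whitespace(i - start));
            }
        }
    }
    pieces
}
```

The trade-off is the one noted for Swift: every call to `leading_trivia` or `trailing_trivia` parses again, so callers that need the pieces repeatedly would have to cache them.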
C#
Roslyn, surprisingly or not, uses a more C#-ish approach. There is no storage for the trivia inside the GreenNode.
https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Compilers/Core/Portable/Syntax/GreenNode.cs
Trivia has its own class:
https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Compilers/CSharp/Portable/Syntax/InternalSyntax/SyntaxTrivia.cs
Just for reference, `CSharpSyntaxNode` is a `GreenNode`. So `SyntaxTrivia` is a `GreenNode`. And we have
https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Compilers/CSharp/Portable/Syntax/InternalSyntax/SyntaxToken.SyntaxIdentifierWithTrivia.cs
that is used as
https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Compilers/CSharp/Portable/Syntax/InternalSyntax/SyntaxToken.SyntaxIdentifierWithTrivia.cs
My current preference is:
1 - `GreenToken` will have an enum with two cases: the most common one (a limited number of whitespace characters); and a `Vec` as a fallback for complex cases;
2 - `SyntaxToken` will expose trivia as a wrapper around the `GreenToken`. We will put helper methods here in the future.
Trivia Memory Consumption
We decided to "flatten" the storage to decrease memory consumption.
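The following is a minimal sketch of what such a flattened, two-case storage could look like. It is not the code that was merged. The `Trivia` and `TriviaPiece` enums, their variants, and the `from_pieces` and `text_len` helpers are assumptions made for illustration, and a separate `None` case is added for tokens without any trivia.

```rust
/// Hypothetical trivia piece; lengths are in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TriviaPiece {
    Whitespace(u32),
    Newline,
    Comment(u32),
}

/// Flattened storage: common cases live inline, the rest on the heap.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Trivia {
    /// No trivia at all.
    None,
    /// Most common case: a single run of whitespace, stored as its length.
    Whitespace(u32),
    /// Fallback for complex cases (comments, newlines, mixed runs).
    /// Boxing the Vec keeps this variant pointer-sized.
    Many(Box<Vec<TriviaPiece>>),
}

impl Trivia {
    /// Picks the cheapest representation for the given pieces.
    fn from_pieces(pieces: Vec<TriviaPiece>) -> Trivia {
        if pieces.is_empty() {
            return Trivia::None;
        }
        if let [TriviaPiece::Whitespace(len)] = pieces.as_slice() {
            return Trivia::Whitespace(*len);
        }
        Trivia::Many(Box::new(pieces))
    }

    /// Total length in bytes of the source text covered by this trivia.
    fn text_len(&self) -> u32 {
        match self {
            Trivia::None => 0,
            Trivia::Whitespace(len) => *len,
            Trivia::Many(pieces) => pieces
                .iter()
                .map(|p| match p {
                    TriviaPiece::Whitespace(n) | TriviaPiece::Comment(n) => *n,
                    TriviaPiece::Newline => 1,
                })
                .sum(),
        }
    }
}
```

With this shape, a token followed by a single space costs no heap allocation, and only tokens with comments or mixed trivia pay for the boxed fallback.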
The `Box<Vec<...>>` is controversial. There is even a lint rule to avoid it. Because of this and other issues, we have this discussion to address how we can improve trivia storage.
Discussions
#1809
PRs
#1716
#1738 (Old PR. Will be abandoned).
#1783
#1798
#1801