Replace the pretty-print/retokenize hack with a direct comparsion of AST nodes #75820
Labels
A-proc-macros
Area: Procedural macros
C-cleanup
Category: PRs that clean code up or issues documenting cleanup.
T-compiler
Relevant to the compiler team, which will review and decide on the PR/issue.
Currently, proc-macro expansion pretty-prints the
Nonterminal
being passed to the proc macro:rust/src/librustc_parse/lib.rs
Lines 283 to 293 in 5528caf
This ensures that the
TokenStream
we pass to a proc-macro always agrees with the AST struct it is attached to. This is necessary because the compiler may directly modify the AST structure after it is parsed but before it is passed to a proc-macro. Currently, it looks like#[cfg]
attributes are the only way for a proc-macro to observe this: aderive
macro will have cfg-stripping applied to the fields/variants of the target struct/enum before it is invoked.Unfortunately, while this check has no false negatives (keeping an invalid
TokenStream
), it has had a large number of false positives (throwing away theTokenStream
when it could have been kept):where
clause when pretty-printing #73157When a false positive occurs, it results in an instance of #43081. Originally, I thought that these false positives could only result in unspanned/mis-spanned compiler errors - while obnoxious to deal with, they can be fixed as they come up.
Unfortunately, using the re-created
TokenStream
means that in addition to losing location information, we also lose hygiene information (since both are accessed via aSpan
). This is much more serious - it results in code compiling that should not. In two cases, users actually wrote macros that explicitly relied on hygiene information being lost, likely because they did not realize that this was a bug. Fixing pretty-print/retokenize false positives can therefore incur unavoidable breakage in certain crates. If one of these buggy macros was exposed in a public API, the problem would be much worse. See #73084 for more details.I propose that we replace the pretty-print/reparse hack with the following strategy:
Item
) just before theTokenStream
is attached. We then attach astruct AstTokens(P(T), TokenStream)
to the struct. The struct now stores a copy of itself (behind aP
) as it was just after parsing.#[derive(PartialEq)]
for the entire AST - while a few structs/enum do this, most don't.Nonterminal
, we compare the current AST struct to the original one that it stores. If they are equal, we use the storedTokenStream
- otherwise, we pretty-print and retokenize.There are some details that would need to be worked out:
Nonterminal
) has an attribute macro removed before it is expanded.AstEq
trait, which we derive instead ofPartialEq
. It's conceivable that we might want to implementPartialEq
for a 'looser' form of equality on some AST structs in the future, which would be incompatible with the 'strict' equality (e.g. checking thatNodeId
s match exactly) required by this check. However, doing so might be overkill.PartialEQ
could conceivable be higher than pretty-printing and re-tokenizing - we would need to benchmark it.However, adopting this scheme would allow us to bypass all issues resulting from inconsistencies in the pretty-printer. While we would still require the pretty-printer to produce correct code (when we detect a mismatch), the proc-macro infrastructure would no longer require it to exactly preserve certain information (empty
where
clauses, extra parentheses, etc.)Alternatives:
MutVisitor
, create some kind of smart pointer that invalidates theTokenStream
when mutable access is required. I attempted several variants of this approach, but all proved to be extremely invasive, requiring changes across the entire compiler.#[cfg]
s and inner attributes correctly). While this approach might work, it seems dangerous - any modifications to the AST done by the compiler risk getting lost whenever proc-macros are involved. While I'm pretty sure that#[cfg]
s are the only place where we intentionally modify the AST before a proc-macro sees it, we might inadvertently introduce additional modifications in the future.The text was updated successfully, but these errors were encountered: