Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644

alexcrichton · 2018-03-01T20:14:56Z

I ran across a very curious case today... Given this procedural macro:

#![crate_type = "proc-macro"]
#![feature(proc_macro)]

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro_derive(Foo, attributes(foo))]
pub fn foo(input: TokenStream) -> TokenStream {
    println!("{}", input.to_string());
    print_tts(input, 0);

    TokenStream::empty()
}

fn print_tts(input: TokenStream, tab: usize) {
    let mut t = String::new();
    for _ in 0..tab {
        t.push_str("  ");
    }
    for token in input {
        match token.kind {
            proc_macro::TokenNode::Group(d, others) => {
                println!("{}{:?}", t, d);
                print_tts(others.into(), tab + 1);
            }
            s => println!("{}{:?}", t, s),
        }
    }
}

and this expansion:

#![crate_type = "rlib"]
#![feature(proc_macro)]

#[macro_use]
extern crate foo;

#[derive(Foo)]
pub struct MyStructc {
    #[cfg_attr(my_cfg, foo)]
    _a: i32,
}

when compiled I get:

$ rustc +nightly foo.rs && rustc +nightly bar.rs -L .
pub struct MyStructc {
    _a: i32,
}
Term(Term(pub))
Term(Term(struct))
Term(Term(MyStructc))
Brace
  Op('#', Alone)
  Bracket
    Term(Term(cfg_attr))
    Parenthesis
      Term(Term(my_cfg))
      Op(',', Alone)
      Term(Term(foo))
  Term(Term(_a))
  Op(':', Alone)
  Term(Term(i32))
Op(',', Alone)

which is quite curious!

~~The to_string() representation has the field present (even though the --cfg isn't supplied) and the token stream has the actual #[cfg_attr] there.~~

~~I was actually expecting neither outcome to happen (how naive of me!) in terms of #[cfg] processing typically happening before custom derives, but maybe that's not possible?~~ -- er, just wrong thinking here

cc @dtolnay
cc @jseyfried
cc @nrc

The text was updated successfully, but these errors were encountered:

dtolnay · 2018-03-01T22:02:49Z

Seems similar to how mod also causes an inconsistency. #47627

alexcrichton · 2018-03-02T02:48:30Z

Ok I think I see what's going on here.

This has to do with this comment introduced in #43230. Basically I think this boils down to #43081, we fundamentally cannot losslessly tokenize an AST item into a list of tokens to hand to a procedural macro.

What's happening here is that all the tokens of the struct MyStructc are saved off in the parser. This is the original source token stream which produced the struct. Later on when we tokenize the struct to hand off to a procedural macro we will use this cached token stream, if available, and tokenize it out and return it. When stringifying, however, we use the actual AST node itself (the source of truth).

The bug is that cfg processing does indeed happen before the procedural macro runs. The cfg processing is modifying the AST item's internal fields but it's not invalidating the cached token stream or modifying it. Hence when the procedural macro runs the cached token stream is invalid wrt the internal state of the AST node, causing two different results to come out depending on how you look at it.

Possible solutions here are:

Invalidate the cached list of tokens if the AST node is modified. This seems tricky to get right, are there other locations where a modification can happen?
Actually implemented Compiler loses location information before calling macros (sometimes) #43081, but holy cow that's a lot of work.
Compare the two results of tokenization (stringify + parse and the tokens themselves) and return the string one if they differ.

None of them seem like great solutions :(

This commit adds even more pessimization to use the cached `TokenStream` inside of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary AST node and transforming it back into a `TokenStream` to hand off to a procedural macro. Such functionality isn't actually implemented in rustc today, so the way `proc_macro` works today is that it stringifies an AST node and then reparses for a list of tokens. This strategy unfortunately loses all span information, so we try to avoid it whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream` cache representing the tokens they were originally parsed from. This `TokenStream` cache, however, has turned out to not always reflect the current state of the item when it's being tokenized. For example `#[cfg]` processing or macro expansion could modify the state of an item. Consequently we've seen a number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache. This commit tweaks the usage of the cached `TokenStream` to compare it to our lossy stringification of the token stream. If the tokens that make up the cache and the stringified token stream are the same then we return the cached version (which has correct span information). If they differ, however, then we will return the stringified version as the cache has been invalidated and we just haven't figured that out. Closes rust-lang#48644 Closes rust-lang#49846

…r=nrc proc_macro: Avoid cached TokenStream more often This commit adds even more pessimization to use the cached `TokenStream` inside of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary AST node and transforming it back into a `TokenStream` to hand off to a procedural macro. Such functionality isn't actually implemented in rustc today, so the way `proc_macro` works today is that it stringifies an AST node and then reparses for a list of tokens. This strategy unfortunately loses all span information, so we try to avoid it whenever possible. Implemented in rust-lang#43230 some AST nodes have a `TokenStream` cache representing the tokens they were originally parsed from. This `TokenStream` cache, however, has turned out to not always reflect the current state of the item when it's being tokenized. For example `#[cfg]` processing or macro expansion could modify the state of an item. Consequently we've seen a number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache. This commit tweaks the usage of the cached `TokenStream` to compare it to our lossy stringification of the token stream. If the tokens that make up the cache and the stringified token stream are the same then we return the cached version (which has correct span information). If they differ, however, then we will return the stringified version as the cache has been invalidated and we just haven't figured that out. Closes rust-lang#48644 Closes rust-lang#49846

alexcrichton added the A-macros-2.0 Area: Declarative macros 2.0 (#39412) label Mar 1, 2018

aturon mentioned this issue Mar 1, 2018

Tracking issue for RFC 1566: Procedural macros #38356

Closed

31 tasks

alexcrichton mentioned this issue Mar 2, 2018

Inconsistency between Display and IntoIterator for a TokenStream containing a module #47627

Open

alexcrichton mentioned this issue Apr 9, 2018

Compiler loses location information before calling macros (sometimes) #43081

Closed

alexcrichton mentioned this issue Apr 10, 2018

proc_macro: Avoid cached TokenStream more often #49852

Merged

bors closed this as completed in #49852 Apr 14, 2018

alexcrichton mentioned this issue Apr 17, 2018

Add a useful WebAssembly API rustwasm/twiggy#36

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644

Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644

alexcrichton commented Mar 1, 2018 •

edited

Loading

dtolnay commented Mar 1, 2018

alexcrichton commented Mar 2, 2018

Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644

Macros 2.0: #[cfg_attr] makes .to_string() and TokenStream disagree #48644

Comments

alexcrichton commented Mar 1, 2018 • edited Loading

dtolnay commented Mar 1, 2018

alexcrichton commented Mar 2, 2018

alexcrichton commented Mar 1, 2018 •

edited

Loading