-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review proc macro API 1.2 #50473
Review proc macro API 1.2 #50473
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
@dtolnay @alexcrichton @eddyb @nrc |
☔ The latest upstream changes (presumably #50453) made this pull request unmergeable. Please resolve the merge conflicts. |
src/libproc_macro/lib.rs
Outdated
@@ -207,7 +236,7 @@ pub mod token_stream { | |||
|
|||
/// `quote!(..)` accepts arbitrary tokens and expands into a `TokenStream` describing the input. | |||
/// For example, `quote!(a + b)` will produce a expression, that, when evaluated, constructs | |||
/// the `TokenStream` `[Word("a"), Op('+', Alone), Word("b")]`. | |||
/// the `TokenStream` `[Word("a"), Punct('+', Alone), Word("b")]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, Word
is now Term
, right?
src/libproc_macro/lib.rs
Outdated
/// An identifier or lifetime identifier. | ||
Ident(Ident), | ||
/// A single punctuation character (`+`, `,`, `$`, etc.). | ||
Punct(Punct), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't expect these changes. I should leave most of this up to @alexcrichton (I do not have a strong preference either way).
src/libproc_macro/lib.rs
Outdated
'^', '&', '|', '@', '.', ',', ';', ':', '#', '$', '?']; | ||
if !LEGAL_CHARS.contains(&ch) { | ||
panic!("unsupported character `{:?}`", ch) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation seems really nice, I think previously it's been deferred until conversion to the compiler-internal forms is needed.
src/libproc_macro/lib.rs
Outdated
#[derive(Copy, Clone, Debug)] | ||
#[unstable(feature = "proc_macro", issue = "38356")] | ||
pub struct Term { | ||
pub struct Ident { | ||
// REVIEW(INTERNAL) Symbol + Span is actually `ast::Ident`! We can use it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that this would not cause issues with #49219.
src/libproc_macro/lib.rs
Outdated
@@ -1199,7 +1300,7 @@ impl TokenTree { | |||
'#' => Pound, | |||
'$' => Dollar, | |||
'?' => Question, | |||
_ => panic!("unsupported character {}", op), | |||
_ => panic!("unsupported character {}", ch), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this become unreachable!()
now? As only a finite set of characters can be used.
src/libproc_macro/lib.rs
Outdated
/// An identifier (`ident`) or lifetime identifier (`'ident`). | ||
/// | ||
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields. | ||
/// REVIEW Do we want to guarantee `Ident` to be `Copy`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do keep it Copy
, that means we can never soundly offer an &self -> &str
API. See #38356 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't mind if we remove the Copy here.
All the changes here look great to me, thanks @petrochenkov! While we've already FCP'd for stability in #38356 I think the I'd prefer to land this all at once if possible to reduce churn in the ecosystem, so I'd be mostly in favor of going ahead and stabilizing everything in this PR (if you're ok with that). I don't mind doing the work myself for the APIs here and pushing those up (annotation-wise). Along those lines, want to remove |
src/libproc_macro/lib.rs
Outdated
/// May fail for a number of reasons, for example, if the string contains unbalanced delimiters | ||
/// or characters not existing in the language. | ||
/// | ||
/// REVIEW The function actually panics on any error and never returns `LexError`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't match the original implementation, when we first started we were sure to return an error I thought...
In any case I'd think it's a bug that this panics, it should definitely be returning an error (but we can fix that later)
src/libproc_macro/lib.rs
Outdated
/// Creates a token stream containing a single token tree. | ||
/// | ||
/// REVIEW We don't generally have impls `From<T> for Collection<T>`, but I see why this exists | ||
/// REVIEW from practical point of view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is currently the main way to go from a TokenTree
to a TokenStream
, and I think we also accidentally stabilized it already so we're required to keep it.
src/libproc_macro/lib.rs
Outdated
@@ -227,6 +256,9 @@ pub fn quote_span(span: Span) -> TokenStream { | |||
} | |||
|
|||
/// A region of source code, along with macro expansion information. | |||
/// | |||
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields. | |||
/// REVIEW Do we want to guarantee `Span` to be `Copy`? Yes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think we're likely to always want Span
to be Copy
src/libproc_macro/lib.rs
Outdated
TokenTree::Literal(ref tt) => tt.fmt(f), | ||
} | ||
} | ||
} | ||
|
||
/// REVIEW the impls below are kind of `From<T> for Option<T>`, not strictly necessary, | ||
/// REVIEW but convenient. No harm, I guess. I'd actually like to see impls | ||
/// REVIEW `From<Group/Ident/Punct/Literal> for TokenStream` to avoid stuttering like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect impls like these will grow and come to exist over time yeah, that's definitely the direction that libstd has taken.
src/libproc_macro/lib.rs
Outdated
/// forms of `Spacing` returned. | ||
/// | ||
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields. | ||
/// REVIEW Do we want to guarantee `Punct` to be `Copy`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is less critical than Span
to be copy, and wouldn't mind going either way here. Making this not-Copy
is probably the most conservative thing to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, let's go with no Copy.
src/libproc_macro/lib.rs
Outdated
/// which can be further configured with the `set_span` method below. | ||
/// | ||
/// REVIEW Why we even use `char` here? There's no reason to use unicode here. | ||
/// REVIEW I guess because it's more convenient to write `new('+')` than `new(b'+')`, that's ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes let's stick to char
for maximal future flexibility.
src/libproc_macro/lib.rs
Outdated
/// | ||
/// REVIEW Again, there's no need for unicode here, | ||
/// REVIEW except for maybe future compatibility in case Rust turns into APL, | ||
/// REVIEW but if it's more convenient to use `char` then that's okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes for future flexibility I think char
's the way to go here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we decided against TokenTree::Ident was because the matcher $i:ident
does not accept lifetimes. I would be opposed to calling it Ident here if it can hold lifetimes and $i:ident
cannot.
@dtolnay The key questions here are probably:
|
Sure, I'll add |
Probably to parse anything looking like a list of generics, if you want to tell apart |
I would need to play with this in nightly before I would feel comfortable committing to it. What I wouldn't want is:
|
Hmm, can we imagine any macros that would need to handle lifetime parameters and type parameters differently? There is some appeal in being able to say the distinction never matters until much later in the compile process. |
One place I can imagine running into trouble is if a macro somehow needs to handle optional lifetimes on a reference. macro_rules! m {
(& $($p:lifetime)? $i:ident) => {}
}
m!(&foo);
m!(&'a foo);
|
This is a weird error. I never investigated how our macro matching works in detail, but I expected it to be modeled after regular expressions (greedy by default) and work in both cases, whether extern crate regex;
use regex::Regex;
fn main() {
let not_subset = Regex::new(r"a?b").unwrap();
assert!(not_subset.is_match("ab"));
assert!(not_subset.is_match("b"));
let subset = Regex::new(r"a?\w").unwrap();
assert!(subset.is_match("ab"));
assert!(subset.is_match("aa"));
assert!(subset.is_match("b"));
assert!(subset.is_match("a"));
} http://play.rust-lang.org/?gist=372d5a5f25219670dabc89a1dfb536ba&version=nightly&mode=debug |
We have 6 weeks if we want to stabilize the API in 1.28. @eddyb
|
Remove unnecessary proc-macro-related `feature`s
@bors r=alexcrichton Stabilization commit for later cherry-picking: petrochenkov@3e71e49 |
📌 Commit dab8c0a has been approved by |
Review proc macro API 1.2 cc #38356 Summary of applied changes: - Documentation for proc macro API 1.2 is expanded. - Renamed APIs: `Term` -> `Ident`, `TokenTree::Term` -> `TokenTree::Ident`, `Op` -> `Punct`, `TokenTree::Op` -> `TokenTree::Punct`, `Op::op` -> `Punct::as_char`. - Removed APIs: `Ident::as_str`, use `Display` impl for `Ident` instead. - New APIs (not stabilized in 1.2): `Ident::new_raw` for creating a raw identifier (I'm not sure `new_x` it's a very idiomatic name though). - Runtime changes: - `Punct::new` now ensures that the input `char` is a valid punctuation character in Rust. - `Ident::new` ensures that the input `str` is a valid identifier in Rust. - Lifetimes in proc macros are now represented as two joint tokens - `Punct('\'', Spacing::Joint)` and `Ident("lifetime_name_without_quote")` similarly to multi-character operators. - Stabilized APIs: None yet. A bit of motivation for renaming (although it was already stated in the review comments): - With my compiler frontend glasses on `Ident` is the single most appropriate name for this thing, *especially* if we are doing input validation on construction. `TokenTree::Ident` effectively wraps `token::Ident` or `ast::Ident + is_raw`, its meaning is "identifier" and it's already named `ident` in declarative macros. - Regarding `Punct`, the motivation is that `Op` is actively misleading. The thing doesn't mean an operator, it's neither a subset of operators (there is non-operator punctuation in the language), nor superset (operators can be multicharacter while this thing is always a single character). So I named it `Punct` (first proposed in [the original RFC](rust-lang/rfcs#1566), then [by @SimonSapin](#38356 (comment))) , together with input validation it's now a subset of ASCII punctuation character category (`u8::is_ascii_punctuation`).
☀️ Test successful - status-appveyor, status-travis |
This commit fixes an accidental regression from rust-lang#50473 where lifetime tokens produced by procedural macros ended up getting lost in translation in the compiler and not actually producing parseable code. The issue lies in the fact that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for gluing joint tokens together forgot to take this into account so the lifetime inside of `Ident` was missing the leading tick! The `glue` implementation here is updated to create a new `Symbol` in these situations to manufacture a new `Ident` with a leading tick to ensure it parses correctly. Closes rust-lang#50942
…petrochenkov rustc: Fix procedural macros generating lifetime tokens This commit fixes an accidental regression from rust-lang#50473 where lifetime tokens produced by procedural macros ended up getting lost in translation in the compiler and not actually producing parseable code. The issue lies in the fact that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for gluing joint tokens together forgot to take this into account so the lifetime inside of `Ident` was missing the leading tick! The `glue` implementation here is updated to create a new `Symbol` in these situations to manufacture a new `Ident` with a leading tick to ensure it parses correctly. Closes rust-lang#50942
cc #38356
Summary of applied changes:
Term
->Ident
,TokenTree::Term
->TokenTree::Ident
,Op
->Punct
,TokenTree::Op
->TokenTree::Punct
,Op::op
->Punct::as_char
.Ident::as_str
, useDisplay
impl forIdent
instead.Ident::new_raw
for creating a raw identifier (I'm not surenew_x
it's a very idiomatic name though).Punct::new
now ensures that the inputchar
is a valid punctuation character in Rust.Ident::new
ensures that the inputstr
is a valid identifier in Rust.Punct('\'', Spacing::Joint)
andIdent("lifetime_name_without_quote")
similarly to multi-character operators.A bit of motivation for renaming (although it was already stated in the review comments):
Ident
is the single most appropriate name for this thing, especially if we are doing input validation on construction.TokenTree::Ident
effectively wrapstoken::Ident
orast::Ident + is_raw
, its meaning is "identifier" and it's already namedident
in declarative macros.Punct
, the motivation is thatOp
is actively misleading. The thing doesn't mean an operator, it's neither a subset of operators (there is non-operator punctuation in the language), nor superset (operators can be multicharacter while this thing is always a single character). So I named itPunct
(first proposed in the original RFC, then by @SimonSapin) , together with input validation it's now a subset of ASCII punctuation character category (u8::is_ascii_punctuation
).