Merged
Changes from 1 commit
332 changes: 332 additions & 0 deletions text/0000-macro-metavar-expr.md
@@ -0,0 +1,332 @@
- Feature Name: `macro_metavar_expr`
- Start Date: 2021-01-23
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add new syntax to declarative macros to give their authors easy access to
additional metadata about macro metavariables, such as the index, length, or
count of macro repetitions.

# Motivation
[motivation]: #motivation

Macros with repetitions often expand to code that needs to know or could
benefit from knowing how many repetitions there are, or which repetition is
currently being expanded. Consider the example macro used in the guide to
introduce the concept of macro repetitions: building a vector, recreating the
`vec!` macro from the standard library:

```rust
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}
```

This would be more efficient if it could use `Vec::with_capacity` to
preallocate the vector with the correct capacity. However, there is no standard
facility in declarative macros to achieve this, as there is no way to obtain
the *number* of repetitions of `$x`.
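
For comparison, a workaround that works today is a helper macro that counts its
arguments by recursion. The sketch below is illustrative only: `count_exprs!`
and `vec_with_capacity!` are hypothetical names, not part of the standard
library or of this RFC. Each recursive step re-matches the remaining arguments,
so the total work can grow quadratically with the list length, and long lists
may hit the compiler's recursion limit.

```rust
/// Hypothetical helper: counts comma-separated expressions by peeling one off
/// per recursive expansion. Each step re-parses the remaining input.
macro_rules! count_exprs {
    () => { 0usize };
    ($head:expr $(, $tail:expr)*) => { 1usize + count_exprs!($($tail),*) };
}

/// Hypothetical variant of the guide's `vec!` that preallocates using the
/// recursive count above. Each `$x` is expanded twice (once inside the count,
/// once in the push), but the copy passed to `count_exprs!` is never evaluated.
macro_rules! vec_with_capacity {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::with_capacity(count_exprs!($($x),*));
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}
```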

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The [example `vec` macro definition in the guide][guide-vec] could be made
more efficient if it could use `Vec::with_capacity` to pre-allocate a vector
with the correct capacity. To do this, we need to know the number of
repetitions.

[guide-vec]: https://doc.rust-lang.org/book/ch19-06-macros.html#declarative-macros-with-macro_rules-for-general-metaprogramming

Metadata about metavariables, like the number of repetitions, can be accessed
using **metavariable expressions**. The metavariable expression for the
count of the number of repetitions of a metavariable `x` is `${count(x)}`, so
we can improve the `vec` macro as follows:

```rust
#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::with_capacity(${count(x)});
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}
```

The following metavariable expressions are available:

| Expression | Meaning |
|----------------------------|------------|
| `${count(ident)}` | The number of times `$ident` repeats in total. |
| `${count(ident, depth)}` | The number of times `$ident` repeats at up to `depth` nested repetition depths. |
| `${index()}` | The current index of the inner-most repetition. |
| `${index(depth)}` | The current index of the nested repetition at `depth` steps out. |
| `${length()}` | The length of the inner-most repetition. |
| `${length(depth)}` | The length of the nested repetition at `depth` steps out. |
| `${ignore(ident)}` | Binds `$ident` for repetition, but expands to nothing. |
| `$$` | Expands to a single `$`, for removing ambiguity in recursive macro definitions. |

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Metavariable expressions in declarative macros provide expansions for
information about metavariables that are otherwise not easily obtainable.

This is a backwards-compatible change, as neither `$$` nor `${ .. }` is
currently accepted as valid.

The metavariable expressions added in this RFC are concerned with declarative
macro metavariable repetitions, and obtaining the information that the
compiler knows about the repetitions that are being processed.

## Count

The `${count(x)}` metavariable expression shown in the `vec` example in the
previous section counts the number of repetitions that will occur if the
identifier is used in a repetition at this depth. This means that in a macro
expansion like:

```
${count(x)} $( $x )*
```

the expression `${count(x)}` will expand to the number of times the `$( $x )*`
repetition will repeat.

If repetitions are nested, then an optional depth parameter can be used to
limit the number of nested repetitions that are counted. For example, a macro
expansion like:

```
${count(x, 1)} ${count(x, 2)} ${count(x, 3)} $( a $( b $( $x )* )* )*
```

The three values this expands to are the number of outer-most repetitions (the
number of times `a` would be generated), the sum of the number of middle
repetitions (the number of times `b` would be generated), and the total number
of repetitions of `$x`.

Member

This might be my ignorance of macros speaking, but does the RFC need to specify which kind of fragment they produce? Are they allowed to expand to `(0+4)` or `1+1+1+1` or `2*2` or `4_usize`, or only to exactly `4`? Is there any way I can start to depend on that expansion, like writing a macro that checks a length by only accepting a particular token?

Contributor Author

It doesn't, but it would be a good change to add it. It should expand to a literal with the appropriate value and no suffix (i.e. only and exactly `4`). This allows consistent use in things like `stringify!` and tuple indexing. Type inference should be able to infer the correct type when it is used in code (and also produce an error if the value that is produced won't fit inside the target type).
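
The counting rule above can be modeled outside the compiler. In the following
sketch, the `Matched` tree and the `count` and `leaves` functions are
hypothetical stand-ins for the transcriber's internal state, not compiler APIs.
`Some(d)` plays the role of the `depth` argument (with `1` meaning the current
level), and `None` plays the role of `${count(x)}` with no depth.

```rust
/// Hypothetical model of what a metavariable matched: a single fragment,
/// or one level of repetition containing further matches.
#[derive(Debug)]
enum Matched {
    Leaf,
    Seq(Vec<Matched>),
}

/// Hypothetical model of `${count(x, depth)}` evaluated where `m` is what
/// `x` matched at the current level. `Some(1)` counts repetitions at this
/// level, `Some(2)` one level further in, and so on; `None` counts every
/// fragment, like `${count(x)}`.
fn count(m: &Matched, depth: Option<usize>) -> usize {
    assert_ne!(depth, Some(0), "depth starts at 1");
    match (m, depth) {
        // A single fragment is one repetition.
        (Matched::Leaf, _) => 1,
        // Reached the requested level: count the repetitions here.
        (Matched::Seq(items), Some(1)) => items.len(),
        // Otherwise add up the counts one level further in.
        (Matched::Seq(items), _) => items
            .iter()
            .map(|item| count(item, depth.map(|d| d - 1)))
            .sum(),
    }
}

/// Builds one inner-most repetition of `n` fragments, such as `( A B C D )`.
fn leaves(n: usize) -> Matched {
    Matched::Seq((0..n).map(|_| Matched::Leaf).collect())
}
```

Applied to the input of the larger example below, `count(&input, Some(1))`,
`count(&input, Some(2))` and `count(&input, None)` give 2, 4 and 13, matching
the first `counts` line of that expansion.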

## Index and length

Within a repetition, the `${index()}` and `${length()}` metavariable
expressions give the index of the current repetition and the length of the
repetition (i.e., the number of times it will repeat). The index value ranges
from `0` to `length - 1`.

For nested repetitions, the `${index()}` and `${length()}` metavariable
expressions expand to the inner-most index and length respectively.
If the `depth` parameter is specified, then the metavariable expression
expands to the index or length of the surrounding nested repetition, counting
outwards from the inner-most repetition. The expressions `${index()}` and
`${index(0)}` are equivalent.

For example, in the expression:

```
$( a $( b $( c $x ${index()}/${length()} ${index(1)}/${length(1)} ${index(2)}/${length(2)} )* )* )*
```

the first pair of values are the index and length of the inner-most
repetition, the second pair are the index and length of the middle
repetition, and the third pair are the index and length of the outer-most
repetition.
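
Because `depth` counts outwards, each expression only needs the stack of
repetitions enclosing the current position. A minimal sketch, assuming a
hypothetical `RepetitionStack` that stores `(index, length)` pairs outer-most
first; out-of-range depths return `None` here, where a real implementation
would presumably report an error instead.

```rust
/// Hypothetical model of the repetitions the transcriber is currently inside,
/// stored outer-most first as `(current index, length)` pairs.
struct RepetitionStack(Vec<(usize, usize)>);

impl RepetitionStack {
    /// The entry `depth` steps out from the inner-most repetition, if any.
    fn frame(&self, depth: usize) -> Option<(usize, usize)> {
        let pos = self.0.len().checked_sub(depth + 1)?;
        Some(self.0[pos])
    }

    /// Models `${index(depth)}`; `${index()}` is the same as `depth == 0`.
    fn index(&self, depth: usize) -> Option<usize> {
        self.frame(depth).map(|(index, _)| index)
    }

    /// Models `${length(depth)}`; `${length()}` is the same as `depth == 0`.
    fn length(&self, depth: usize) -> Option<usize> {
        self.frame(depth).map(|(_, length)| length)
    }
}
```

The stack `[(0, 2), (2, 3), (1, 2)]` corresponds to expanding `J` in the larger
example below, for which that expansion shows `indexes = (0/2, 2/3, 1/2)`.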

## Ignore

Sometimes it is desired to repeat an expansion the same number of times as a
metavariable repeats but without actually expanding the metavariable. It may
be possible to work around this by expanding the metavariable in an expression
like `{ $x ; 1 }`, where the expanded value of `$x` is ignored, but this
is only possible if what `$x` expands to is valid in this kind of expression.

The `${ignore(ident)}` metavariable expression acts as if `ident` was used for
the purposes of repetition, but expands to nothing. This means a macro
expansion like:

```
$( ${ignore(x)} a )*
```

will expand to a sequence of `a` tokens repeated the number of times that `x` repeats.
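
Without `${ignore(x)}`, a common stable approximation is to pass each repeated
fragment to a helper macro that throws it away and emits something else in its
place. The sketch below, with hypothetical names `replace_with!` and
`count_tts!`, uses that idea to count token trees by building a slice of `()`
values and taking its length. Under this RFC, writing `$( ${ignore(x)} () ),*`
directly would make the helper unnecessary.

```rust
/// Hypothetical helper: discards its first token tree and expands to the
/// second argument, so a repetition can be driven by `$x` without emitting it.
macro_rules! replace_with {
    ($ignored:tt, $replacement:expr) => {
        $replacement
    };
}

/// Hypothetical counting macro: each token tree becomes one `()` element,
/// and the length of the resulting `[()]` slice is the count.
macro_rules! count_tts {
    ($($x:tt)*) => {
        <[()]>::len(&[$(replace_with!($x, ())),*])
    };
}
```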

## Dollar dollar

Since metavariable expressions always apply during the expansion of the macro,
they cannot be used in recursive macro definitions. To allow recursive macro
definitions to use metavariable expressions, the `$$` expression expands to a
single `$` token.

Member

@scottmcm, Mar 8, 2021

Are there any places where this would want to go deeper? Would it be helpful to have `$$$` that expands to `$$`, instead of needing `$$$$`? (Does `$$$$` work with this RFC, actually? Is exponential escaping bad, or fine because "just don't go that deep"?)

Contributor Author

Going deeper is only necessary if macro definitions are multiple-times recursive (a macro that defines a macro that defines a macro), and you want to defer metavariable expansions or repetitions to the inner macros in ways that are otherwise ambiguous. The doubling up of the escape characters for each level is necessary so that at each nesting level you can represent a metavariable whose name is stored in another metavariable. An even number of `$` followed by `var` (e.g. `$$$$var`) expands to n/2 `$`s followed by a literal `var` (e.g. `$$var`). An odd number of `$` expands to (n-1)/2 `$`s followed by the expansion of `$var` (e.g. if `$var == foo` then `$$$$$var` expands to `$$foo`).

An example of where this would be necessary in existing code is here. This code is currently using `$dol` as a hack for what `$$` would provide, and `$dol $arg` would become `$$$arg`.

This is the same as for `\`-escaping in strings, and most other kinds of escaping in other languages, so it should be familiar to users.

Although sixteen `$` in a row wouldn't be great, quadruply-recursive macro definitions are probably not a great idea either, and it should be possible to break the macro down into separate parts with less nesting if that does become a concern.
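
The escaping rule described in the reply above can be written out as a small
function. This is a hypothetical model of a single transcription pass over `n`
consecutive `$` tokens followed by the identifier `var`, not compiler code;
`value` stands for whatever `$var` is bound to.

```rust
/// Hypothetical model of one transcription pass over `n` consecutive `$`
/// tokens followed by the identifier `var`: each `$$` pair collapses to a
/// single `$`, and a leftover unpaired `$` turns `var` into a metavariable
/// reference that is replaced by `value`.
fn transcribe_dollars(n: usize, var: &str, value: &str) -> String {
    let mut out = "$".repeat(n / 2);
    if n % 2 == 0 {
        // Every `$` was paired up, so `var` stays a literal identifier.
        out.push_str(var);
    } else {
        // One `$` was left over, so `$var` is substituted.
        out.push_str(value);
    }
    out
}
```

Applying it repeatedly shows why each extra level of macro nesting doubles the
number of `$`s needed: `$$$$var` becomes `$$var`, then `$var`, then the bound
value.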


This is also necessary for unambiguously defining repetitions in nested
macros. For example, this resolves [issue 35853], as the example in
that issue can be expressed as:

```rust
macro_rules! foo {
    () => {
        macro_rules! bar {
            ( $$( $$any:tt )* ) => { $$( $$any )* };
        }
    };
}

fn main() { foo!(); }
```

[issue 35853]: https://github.com/rust-lang/rust/issues/35853
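
Until `$$` is available, the workaround mentioned in the review thread above is
to pass a literal `$` token into the outer macro and splice it back in. A
minimal sketch of that `$dol` approach for the issue 35853 example follows;
`define_forward!` and `forward!` are hypothetical names.

```rust
/// Hypothetical stable workaround for `$$`: the caller passes a literal `$`
/// token in as `$dol`, so the inner definition's `$`s are only produced when
/// the outer macro is expanded, not interpreted by it.
macro_rules! define_forward {
    ($dol:tt) => {
        macro_rules! forward {
            ( $dol( $dol any:tt )* ) => { $dol( $dol any )* };
        }
    };
}

// Expands to the definition of `forward!`, with every `$dol` replaced by `$`.
define_forward!($);
```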

## Larger example

For a larger example of these metavariable expressions in use, consider the
following macro that operates over three nested repetitions:

```rust
macro_rules! example {
    ( $( [ $( ( $( $x:ident )* ) )* ] )* ) => {
        counts = (${count(x, 1)}, ${count(x, 2)}, ${count(x)})
        nested:
        $(
            indexes = (${index()}/${length()})
            counts = (${count(x, 1)}, ${count(x)})
            nested:
            $(
                indexes = (${index(1)}/${length(1)}, ${index()}/${length()})
                counts = (${count(x)})
                nested:
                $(
                    indexes = (${index(2)}/${length(2)}, ${index(1)}/${length(1)}, ${index()}/${length()})
                    ${ignore(x)}
                )*
            )*
        )*
    };
}
```

Given this input:
```rust
example! {
    [ ( A B C D ) ( E F G H ) ( I J ) ]
    [ ( K L M ) ]
}
```

The macro would expand to:
```
counts = (2, 4, 13)
nested:
    indexes = (0/2)
    counts = (3, 10)
    nested:
        indexes = (0/2, 0/3)
        counts = (4)
        nested:
            indexes = (0/2, 0/3, 0/4)
            indexes = (0/2, 0/3, 1/4)
            indexes = (0/2, 0/3, 2/4)
            indexes = (0/2, 0/3, 3/4)
        indexes = (0/2, 1/3)
        counts = (4)
        nested:
            indexes = (0/2, 1/3, 0/4)
            indexes = (0/2, 1/3, 1/4)
            indexes = (0/2, 1/3, 2/4)
            indexes = (0/2, 1/3, 3/4)
        indexes = (0/2, 2/3)
        counts = (2)
        nested:
            indexes = (0/2, 2/3, 0/2)
            indexes = (0/2, 2/3, 1/2)
    indexes = (1/2)
    counts = (1, 3)
    nested:
        indexes = (1/2, 0/1)
        counts = (3)
        nested:
            indexes = (1/2, 0/1, 0/3)
            indexes = (1/2, 0/1, 1/3)
            indexes = (1/2, 0/1, 2/3)
```
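
The values in this expansion can be cross-checked with ordinary Rust. The
`expand` function below is a hypothetical re-computation, not how the compiler
works: it walks the same nested input and emits one line wherever the macro's
transcriber would, using loop indices for `${index(..)}`, vector lengths for
`${length(..)}`, and summed lengths for `${count(..)}`.

```rust
/// Hypothetical re-computation of the `example!` expansion above.
/// `input[i][j]` holds the identifiers of one inner-most `( ... )` group.
fn expand(input: &[Vec<Vec<&str>>]) -> Vec<String> {
    let mut lines = Vec::new();
    let n0 = input.len(); // length of the outer-most repetition
    let middles: usize = input.iter().map(|m| m.len()).sum();
    let leaves: usize = input.iter().flatten().map(|inner| inner.len()).sum();
    // Top level: ${count(x, 1)}, ${count(x, 2)}, ${count(x)}
    lines.push(format!("counts = ({}, {}, {})", n0, middles, leaves));
    lines.push("nested:".to_string());
    for (i, middle) in input.iter().enumerate() {
        let n1 = middle.len();
        let leaves: usize = middle.iter().map(|inner| inner.len()).sum();
        lines.push(format!("indexes = ({}/{})", i, n0));
        lines.push(format!("counts = ({}, {})", n1, leaves));
        lines.push("nested:".to_string());
        for (j, inner) in middle.iter().enumerate() {
            let n2 = inner.len();
            lines.push(format!("indexes = ({}/{}, {}/{})", i, n0, j, n1));
            lines.push(format!("counts = ({})", n2));
            lines.push("nested:".to_string());
            // One line per `$x`; in the macro, `${ignore(x)}` drives this loop.
            for k in 0..n2 {
                lines.push(format!(
                    "indexes = ({}/{}, {}/{}, {}/{})",
                    i, n0, j, n1, k, n2
                ));
            }
        }
    }
    lines
}
```

For the input above, it produces the same 33 lines as the expansion, without
the indentation.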


# Drawbacks
[drawbacks]: #drawbacks

This adds syntax to the language that program authors must learn and
understand. We may not want to add more syntax.

The author believes it is worth the overhead of new syntax, as even though
there exist workarounds for obtaining the information if it's really needed,
these workarounds are sometimes difficult to discover and naive
implementations can significantly harm compiler performance.

Member

I agree that the workarounds for these are non-obvious, and that it's worth giving them well-known names. But I think the jump to new syntax could be better-motivated in the RFC.

For example, why not `std::macro_utils::count!(...)` instead of `${count(...)}`? If it can be written as a macro as @RustyYato showed, that would then leave it up to an implementation to choose whether to add special compiler code to optimize it or just decide that the binary matching trick is good enough.

(I suspect it won't be too hard to convince me that syntax is worth it, but I'd still like to see it addressed in the RFC text.)

Contributor Author

The workaround macros work by expanding to `count!($( $value )*)`: i.e. the compiler must generate a sequence by expanding the repetition, re-parsing it as part of the `count!` macro invocation, and then computing the length. This is the redundant additional work that this RFC seeks to address.

The reason for new syntax is that these expansions occur during macro transcription, rather than as their own macro expansions. `${count(ident)}` would be transcribed directly to the literal count, whereas `count!(ident)` in a macro body would be transcribed to `count!(ident)` (there is no change, as the transcriber has nothing to do: it doesn't peek inside macro invocations), at which point the information about what `ident` means is lost and the `count!` macro has no knowledge about what it is counting or what context it is counting it in.

Another way to think of metavariable expressions is as "macro transcriber directives". You can then think of the macro transcriber as performing the following:

- `$var` => the value of `var`
- `$( ... )*` => a repetition
- `${ directive }` => a special transcriber directive
- `$$` => `$`

Perhaps describing it like this makes it a bit clearer that these are special things for the transcriber to do (not necessarily limited to counts and indexes, but that is what this RFC focuses on).

We could special-case these macro invocations during transcription, but that feels like a worse solution. It would make it harder to understand what the macro transcriber is going to do with arbitrary code if you don't remember all of the special macros that don't work like other macros.

(Conversely, I think there might be existing special macros that might have been better written as metavariable expressions if they had already existed. While I haven't thought it through fully, `file!()`, `line!()` and `column!()` spring to mind as candidates.)


Furthermore, the additional syntax is limited to declarative macros, and its
use should be limited to specific circumstances where it is more understandable
than the alternatives.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This RFC proposes a modest but powerful extension to macro syntax that makes
it possible to obtain information that the compiler already knows, but that
currently requires inefficient and complex techniques to obtain within a macro.

The original proposal was for a shorter syntax to provide the count of
repetitions: `$#ident`. During discussions of this syntax, it became clear
that it was not obvious which number this referred to: the count of
repetitions at this level, or the length of the current repetition. It also
does not provide a way to discover counts or lengths for other repetition
depths. There was also interest in being able to discover the index of the
current repetition, and the `#` character had been used in similar proposals
for that. Some reservations were expressed about the use of the `#` token
because of the cognitive burden of another sigil, and its common use in the
`quote!` macro.

Contributor

@petrochenkov, Feb 25, 2021

IIRC, I already asked this on Zulip, but could you document what syntactic space is available for this feature (and similar features) in general?

The main issue with choosing a syntax here is that pretty much any syntax is valid on the right-hand side of the macro because it represents macro output, which can be an arbitrary token stream.
So, we are lucky that the combination `${` turned out to be reserved.
Are there any other reserved combinations that can be used for new macro features?

Contributor

I believe `$[ ... ]` was also available.

Contributor Author

As @Lokathor says, this is mentioned briefly in the "Future possibilities" section at the bottom, but I will expand on it.

Currently the `$` symbol can only be followed by an identifier or `(`, so anything else can be used in a language extension. This RFC specifies `${ .. }` and `$$`, but `$[ .. ]` remains invalid, as does `$` followed by any other symbol (so `$@`, `$:`, `$!` or similar could be used).

Additionally, metavariable expressions are intended to be extensible themselves. This RFC defines `count`, `index`, `length` and `ignore`, but future RFCs can add additional expressions of the form `${foo(fooargs...)}`. Anything that fits within this pattern and can be evaluated by the macro transcriber would be a suitable candidate for another expression.

Member

I like this, and I like that it carves out an extensible space for future improvements to macro syntax.

Contributor

I worry slightly that `${...}` and `$(...)` might look too similar. That might be an artifact of my font.

In particular, it's going to be tricky to give good diagnostics when a user writes `$(...)` when they meant to write `${...}`, and vice versa, especially if their macro body happens to refer to names like `count` or `length`.

But I don't have great counter-suggestions; `$[...]` might be just as bad (though I do think it is easier to distinguish from `$(...)`). The only other counter-suggestion I can think of is `${{...}}`, but that might be a bridge too far.

Contributor Author

I see your worry. I also have some reservations about the `)}` cluster in `${count(ident)}`.

We don't have to use a delimited block. Since currently anything other than an identifier or `(..)` is invalid after `$`, we could use some other sigil. Some examples:

- `$:count(ident)`, e.g. `let v = Vec::with_capacity($:count(values))`
- `$@count(ident)`, e.g. `let v = Vec::with_capacity($@count(values))`
- `$!count(ident)`, e.g. `let v = Vec::with_capacity($!count(values))`

Using the last one as an example, this would be parsed as: `$ ! <metavar-expr-function> ( <metavar-expr-args...> )`

Other suggestions also welcome.


The `depth` parameter of `index` and `count` originally counted inwards from
the outer-most nesting. This was changed to count outwards from the inner-most
nesting so that expressions can be copied to a different nesting depth without
needing to change them.

# Prior art
[prior-art]: #prior-art

Declarative macros with repetition are commonly used in Rust for things that
are implemented using variadic functions in other languages. Usually these
other languages provide mechanisms for finding the number of variadic
arguments, and it is a notable limitation that Rust does not.

Scripting languages, like Bash, which use `$var` for variables, often use
similar `${...}` syntax for values based on variables: for example `${#var}`
is used for the length of `$var`. This means `${...}` expressions should not
seem too weird to developers familiar with these scripting languages.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

No unresolved questions at present.

While more expressions are possible, expressions beyond those defined in this RFC are out of scope.

# Future possibilities
[future-possibilities]: #future-possibilities

The metavariable expression syntax (`${...}`) is purposefully generic, and may
be extended in future RFCs to anything that may be useful for the macro
expander to produce.

The syntax `$[...]` is still invalid, and so remains available for any other
extensions which may come in the future and don't fit in with metavariable
expression syntax.