Add a new grammar renderer #1787
Just fixing some small consistency and spacing mistakes.
This rule was misnamed, colliding with the existing CfgAttrAttribute.
This renames IsolatedCR to CR. I felt like it wasn't exactly necessary since we have rewritten things so that it is clear that there is an input transformation which resolves this (`input.crlf`). We also never really defined what it meant. I also felt like there was room for confusion. For example, an input containing `CR CR LF LF` would get normalized to `CR LF LF`. The `CR` there is not isolated.
This removes all backslash-escaped characters. This helps to avoid confusing a literal backslash followed by a character with the interpreted escaped character.
I don't exactly know why this was placed there, but we operate under the assumption that all lexical characters immediately follow one another.
This introduces a new terminal kind that I'm calling a "prose" which describes what the terminal is. This is inspired by the IETF format which uses angle brackets to describe terminals in English.
The grammar almost always uses lowercase, so let's standardize on that.
This helps to standardize how suffixes are written. Normally they do not use parentheses, and visually I don't think they are entirely necessary.
These two nonterminals were using the wrong name for the productions for BlockExpression and LiteralExpression.
This changes the keyword listings so that they are just lists instead of lexer rules. We never used the named rules, and I don't foresee us ever doing that. Also, the `IDENTIFIER_OR_KEYWORD` rule meant that we never needed to explicitly identify these keywords as lexer tokens. This helps avoid problems when building the grammar graph for missing connections.
Per our style, edition differences are supposed to be separated out into an edition block.
These were defined in prose below, but defining them here allows us to easily refer and link to them.
This is intended to help define what a "token" is via the grammar (and to fill a hole in our token definition). I waffled on how to define delimiters, and whether they should be separate somehow. In practice I think it should be fine to clump them all together. This mainly only matters for TokenTree, which already excludes the delimiters.
This adds a grammar rule that collects all the reserved token forms into a single production rule so that we can define what a "token" is by referring to this.
This defines a Token in the grammar so that we can easily refer to it (and to make it easier to see what all the tokens are).
We no longer represent characters via escape sequences. These can be confused with the literal two bytes of backslash followed by a character. See the "common productions" list for how these are now referred to.
| 
 The  AFAICS there are cases where the grammar diverges from its graphical representation with respect to repeated elements. In the two examples below, the diagram only allows for at least two consecutive  | 
| 
 From the live demo I can see that  What about doing it like this? This uses one less path and can be concatenated with a previous  The main problem is that  | 
| 
 With respect to  It's possible to implement  | 
| I see, so that's already supported and just a matter of generating the proper diagram downstream. | 
This adds an extension to mdbook-spec that will parse code blocks written in a BNF-style grammar and render them both as markdown and as railroad diagrams.
This adds the hooks to toggle the visibility of the railroad grammar. The status is stored in `localStorage` to keep it sticky.
This fixes it so that rule links work correctly if there is more than one space in a reference definition.
| The conflicting directions one would be resolved by #1787 (comment).  The  fn main() {
    let x: &str = "\u{a____________________________________}";
    println!("_{x}_");
}(On  | 
| 
 👀 I was actually too lazy to check, sorry for the confusion. [The syntax is somewhat hilarious?!] 
 On  | 
We track the "roots" in our grammar -- those productions that aren't used in any other production. We want to report when a new root appears or when something that's expected to be a root no longer is one. However, we were reporting the latter case as the former instead of reporting it separately as intended. Let's fix that.
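The separate reporting described above can be sketched with two set differences. This is a minimal illustration only: `RootDiff` and `diff_roots` are hypothetical names, not the actual `mdbook-spec` code.

```rust
use std::collections::BTreeSet;

/// Hypothetical result type: the two ways the root list can drift.
#[derive(Debug, PartialEq)]
struct RootDiff {
    /// Productions unused by any other production that were not expected to be roots.
    unexpected: Vec<String>,
    /// Productions expected to be roots that are now used by some other production.
    missing: Vec<String>,
}

/// Compares the expected roots with the roots found in the grammar and
/// keeps the two kinds of mismatch apart so each can be reported on its own.
fn diff_roots(expected: &BTreeSet<String>, found: &BTreeSet<String>) -> RootDiff {
    RootDiff {
        unexpected: found.difference(expected).cloned().collect(),
        missing: expected.difference(found).cloned().collect(),
    }
}

fn main() {
    // The production names here are made up for the example.
    let expected: BTreeSet<String> = ["Crate", "Token"].iter().map(|s| s.to_string()).collect();
    let found: BTreeSet<String> = ["Crate", "Orphan"].iter().map(|s| s.to_string()).collect();
    let diff = diff_roots(&expected, &found);
    for name in &diff.unexpected {
        println!("new root appeared: {name}");
    }
    for name in &diff.missing {
        println!("expected root is no longer a root: {name}");
    }
}
```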
There are two ways to render a "zero or more" (i.e. `*`) repeat. One is to put nothing on the main forward line and to put the pattern on the recurrent edge, and the other is to put the pattern on the main forward line and to have an empty recurrent edge and an empty bypass edge. That is, for the latter, we can think of `thing*` as `(thing+)?`.
Doing it the latter way means an additional edge, but it buys us something big in return, which is that it keeps all the patterns going in the forward direction. Doing it the other way means the patterns have to be reversed so as to put them underneath on that recurrent edge, and it means that readers then have to read them right to left.
Reversing the elements also causes a bug in some diagrams where the lines end up running in opposing directions and so the trains crash into each other. See:
- rust-lang#1787 (comment)
Keeping things in the forward direction avoids this problem.
In this commit, we'll leave in place all the infrastructure for reversing the elements though it is no longer used. We can of course pull this out later.
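To make the two layouts concrete, here is a small sketch using a hypothetical `Node` type (not the `railroad` crate's API and not the code from this PR). It only models where the pattern sits, not how anything is drawn.

```rust
/// Hypothetical diagram node, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Empty,
    Terminal(String),
    /// `main` sits on the forward line, `back` sits on the recurrent edge.
    Repeat { main: Box<Node>, back: Box<Node> },
    /// Adds an empty bypass edge around the inner node.
    Optional(Box<Node>),
}

/// `thing*` rendered as `(thing+)?`: the pattern stays on the forward line,
/// the recurrent edge is empty, and a bypass edge allows zero occurrences.
fn zero_or_more_forward(thing: Node) -> Node {
    Node::Optional(Box::new(Node::Repeat {
        main: Box::new(thing),
        back: Box::new(Node::Empty),
    }))
}

/// The alternative: nothing on the forward line and the pattern on the
/// recurrent edge, where it has to be read right to left.
fn zero_or_more_backward(thing: Node) -> Node {
    Node::Repeat {
        main: Box::new(Node::Empty),
        back: Box::new(thing),
    }
}

fn main() {
    let thing = Node::Terminal("thing".to_string());
    println!("{:?}", zero_or_more_forward(thing.clone()));
    println!("{:?}", zero_or_more_backward(thing));
}
```

The forward form costs one extra edge (the bypass) but never places a pattern on a right-to-left edge.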
| I've pushed up a set of commits. I had originally planned to merge this first and do these separately, but they're somewhat intertwined with fixing issues that we probably should fix here, so perhaps it's best to look at these now. | 
We check that the list of grammar "roots" -- that is, productions that are not used in any other production -- is what we expect it to be. We had hard coded this list of roots in `mdbook-spec`. Let's instead add a way to specify this in our syntax for productions by prefixing the production with `@root`.
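As a rough sketch of how a `@root` prefix might be recognized, the following parses only the head of a production. `ProductionHead` and `parse_head` are hypothetical, and the `Name -> ...` layout of a production line is an assumption made for this example; the real syntax is whatever `docs/grammar.md` describes.

```rust
/// Hypothetical parsed head of a production line.
#[derive(Debug, PartialEq)]
struct ProductionHead<'a> {
    name: &'a str,
    is_root: bool,
}

/// Parses `[@root] Name -> ...` and reports whether the production is marked as a root.
/// Assumption: a production name is followed by `->`.
fn parse_head(line: &str) -> Option<ProductionHead<'_>> {
    let line = line.trim_start();
    let (is_root, rest) = match line.strip_prefix("@root") {
        // Require whitespace after the marker so that `@rootFoo` is rejected.
        Some(rest) if rest.starts_with(char::is_whitespace) => (true, rest.trim_start()),
        Some(_) => return None,
        None => (false, line),
    };
    let (name, _body) = rest.split_once("->")?;
    let name = name.trim();
    if name.is_empty() || !name.chars().all(|c| c.is_alphanumeric() || c == '_') {
        return None;
    }
    Some(ProductionHead { name, is_root })
}

fn main() {
    println!("{:?}", parse_head("@root Crate -> Item*"));
    println!("{:?}", parse_head("Item -> Function"));
}
```

With the marker stored on each production, the expected root list can be collected from the grammar itself rather than from a hard-coded list.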
When reviewing a production in the grammar, one often wants to quickly find the corresponding railroad diagram, and when reviewing a railroad diagram, one often wants to quickly find the corresponding production in the grammar. Let's make this easy by linking each production in the grammar to the corresponding railroad diagram, and from the name of each railroad diagram to the corresponding production in the grammar. When clicking on a production in the grammar, we'll automatically display the railroad diagrams if those are not already displayed.
We can save a line by replacing this `match` with a `let-else`, so let's do that.
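For readers unfamiliar with the pattern, here is the general shape of that rewrite on a made-up function (the names below are illustrative and are not the code touched by this commit). Both versions behave the same; the `let-else` form is one line shorter.

```rust
/// Before: a `match` whose only job is to bind on success and bail out otherwise.
fn rule_name_match(line: &str) -> &str {
    let (name, _) = match line.split_once("->") {
        Some(parts) => parts,
        None => return "",
    };
    name.trim()
}

/// After: the same logic with `let-else` (stable since Rust 1.65).
fn rule_name_let_else(line: &str) -> &str {
    let Some((name, _)) = line.split_once("->") else {
        return "";
    };
    name.trim()
}

fn main() {
    println!("{}", rule_name_match("Crate -> Item*"));
    println!("{}", rule_name_let_else("Crate -> Item*"));
}
```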
We no longer need to reverse the elements anywhere in our railroad diagrams, so let's remove the supporting infrastructure for doing this.
For `RepeatRange(e, a, b)`, we were rendering `e` on the main line
then rendering under it a message about how many times it may or must
repeat based on `a` and `b`.
The trouble is that if we say that something "repeats once" on the
recurrent edge -- after we've already consumed a thing -- that reads
reasonably as though we're saying that two things can be consumed when
that's not what we mean.
Similarly, it's a bit odd to say, on the recurrent edge, that
something must "repeat twice" when that edge (and presumably then that
rule) may not be taken at all.
Let's solve all this by doing the following:
- For `e{1..1}`, simply render the node.
- For `e{0..1}`, treat this as simply `e?`.
- For `e{0..}`, treat this as simply `e*`.
- For `e{1..}`, treat this as simply `e+`.
- For `e{a..0}`, render an empty node.
- For `e{0..b} b > 1`, treat this as `(e{1..b})?`.
- For `e{1..b} b > 1`, render the node on the main line, then on the
  recurrent line render "at most {b - 1} more times".
- For `e{a..b} a > 1`, make a sequence of length `a` where the final
  node repeats `{1..b - (a - 1)}` times (or `{1..}` times if `b` is
  unbounded).
(We'll also add a check in parsing to ensure that for the range to be
well formed `a <= b`.)
As it turns out, the most straightforward way to implement this isn't
by recursing.  Doing that means we end up needing to take special care
to handle the suffix and the footnote, we have to build up an extra
`Expression` we don't need, and we have to `unwrap` the call.
Instead, it works better to treat this lowering in the manner of a
transitioning state machine in the spirit of `loop match` as proposed
in RFC 3720.
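The lowering rules above can be sketched as a loop that rewrites a small state until it reaches a terminal case. This is an illustration under stated assumptions, not the code from this PR: the `Node` type and `lower_repeat_range` are hypothetical, `e{0..}` is drawn as `(e+)?`, and a plain `loop { match }` on stable Rust stands in for the `loop match` of RFC 3720.

```rust
/// Hypothetical diagram node, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Empty,
    Item(String),
    Optional(Box<Node>),
    /// One or more repetitions, with an optional note on the recurrent edge.
    Repeat(Box<Node>, Option<String>),
    Sequence(Vec<Node>),
}

/// Lowers `e{min..max}` (`max == None` means unbounded) without recursion.
fn lower_repeat_range(e: Node, min: u32, max: Option<u32>) -> Node {
    // The commit message performs this check at parse time; it is repeated here for safety.
    assert!(max.map_or(true, |b| min <= b), "range must satisfy a <= b");
    let (mut a, mut b) = (min, max);
    let mut optional = false; // wrap the result in a bypass edge at the end
    let mut prefix = 0usize; // plain copies of `e` placed before the final node
    let core = loop {
        match (a, b) {
            // e{a..0}: nothing can be consumed.
            (_, Some(0)) => break Node::Empty,
            // e{0..b}: lower e{1..b} and make the whole thing optional.
            (0, _) => {
                optional = true;
                a = 1;
            }
            // e{1..1}: just the node.
            (1, Some(1)) => break e.clone(),
            // e{1..}: e+.
            (1, None) => break Node::Repeat(Box::new(e.clone()), None),
            // e{1..b}, b > 1: node on the main line, note on the recurrent edge.
            (1, Some(n)) => {
                let note = format!("at most {} more times", n - 1);
                break Node::Repeat(Box::new(e.clone()), Some(note));
            }
            // e{a..b}, a > 1: a - 1 plain copies, then e{1..b - (a - 1)}.
            (n, _) => {
                prefix = (n - 1) as usize;
                b = b.map(|m| m - (n - 1));
                a = 1;
            }
        }
    };
    let node = if prefix == 0 {
        core
    } else {
        let mut nodes = vec![e; prefix];
        nodes.push(core);
        Node::Sequence(nodes)
    };
    if optional { Node::Optional(Box::new(node)) } else { node }
}

fn main() {
    let e = Node::Item("e".to_string());
    println!("{:?}", lower_repeat_range(e.clone(), 2, Some(4)));
    println!("{:?}", lower_repeat_range(e, 0, Some(3)));
}
```

Each non-terminal arm only adjusts the state and loops again, so there is no intermediate `Expression` to build and no recursive call to unwrap.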
Update books
## rust-lang/book
1 commits in 45f05367360f033f89235eacbbb54e8d73ce6b70..d33916341d480caede1d0ae57cbeae23aab23e88
2025-04-08 18:24:27 UTC to 2025-04-08 18:24:27 UTC
- Ch01+ch02 after tech review (rust-lang/book#4329)
## rust-lang/edition-guide
2 commits in 1e27e5e6d5133ae4612f5cc195c15fc8d51b1c9c..467f45637b73ec6aa70fb36bc3054bb50b8967ea
2025-04-15 19:49:59 UTC to 2025-04-11 15:27:31 UTC
- fix grammar errors (rust-lang/edition-guide#374)
- remove the unused and deprecated `multilingual` field from `book.toml` (rust-lang/edition-guide#375)
## rust-lang/nomicon
2 commits in b4448fa406a6dccde62d1e2f34f70fc51814cdcc..0c10c30cc54736c5c194ce98c50e2de84eeb6e79
2025-04-09 01:54:42 UTC to 2025-04-07 20:22:31 UTC
- Remove double wording in opaque type chapter (rust-lang/nomicon#487)
- remove `rust-intrinsic` ABI (rust-lang/nomicon#485)
## rust-lang/reference
6 commits in 46435cd4eba11b66acaa42c01da5c80ad88aee4b..3340922df189bddcbaad17dc3927d51a76bcd5ed
2025-04-15 19:03:24 UTC to 2025-04-10 01:56:25 UTC
- Add a new grammar renderer (rust-lang/reference#1787)
- Misc. spelling fixes (rust-lang/reference#1785)
- Fix std::ops links in range-expr (rust-lang/reference#1786)
- traits.md: remove unusual formatting (rust-lang/reference#1784)
- doc: add missing space (rust-lang/reference#1782)
- spelling fix, Discrimants -> Discriminants (rust-lang/reference#1783)
Rollup merge of rust-lang#139884 - rustbot:docs-update, r=ehuss: Update books
In rust-lang#1787 I missed linkify-ing references to grammar rules that weren't links. This makes sure that they are linked and validated.


This introduces a new grammar renderer. Instead of trying to write the grammar in a markdown/HTML hybrid, this introduces a new syntax that is parsed by the mdbook-spec plugin. This grammar is then converted into a markdown/HTML hybrid, and also into railroad diagrams.
There are a lot of changes here (and some can be split into separate PRs if desired). A general overview of what to see here:
- `docs/grammar.md` file for a complete description.
- `mdbook-spec/src/grammar/parser.rs` into an internal representation.
- `mdbook-spec/src/grammar/render_markdown.rs`, and railroad diagrams in `mdbook-spec/src/grammar/render_railroad.rs`.
- `mdbook-spec/src/grammar.rs`. There are several pieces here:
- `[FunctionParameters]`. Link definitions are automatically added to every page.
I'd like to thank @lukaslueg for creating the railroad library which made this possible.
Closes #221
Closes #398
Closes #596
Closes #1513
Closes #1677