Add a new grammar renderer #1787

Merged
merged 38 commits into from
Apr 15, 2025
8d1e7bd
Add some missing Syntax markers
ehuss Apr 10, 2025
acd95af
Add missing rules for syntax blocks
ehuss Apr 10, 2025
f208204
Fix some minor grammar formatting issues
ehuss Apr 10, 2025
b53a9ee
Fix CfgAttribute name
ehuss Apr 10, 2025
27e1ec9
Rename IsolatedCR to CR
ehuss Apr 10, 2025
c1faa76
Name common ascii control characters
ehuss Apr 10, 2025
94acc5e
Remove "followed by" in STRING_CONTINUE
ehuss Apr 10, 2025
86a49fc
Introduce a new "prose" terminal
ehuss Apr 10, 2025
a547e37
Normalize suffix capitalization
ehuss Apr 10, 2025
36c8d99
Remove parentheses around suffixes
ehuss Apr 10, 2025
35c098a
Fix nonterminals of ConstParam
ehuss Apr 10, 2025
556df6b
Rewrite how keywords are listed
ehuss Apr 10, 2025
963339e
Fix `dyn` edition presentation
ehuss Apr 10, 2025
1c65870
Add grammar rule for XID_Start and XID_Continue
ehuss Apr 10, 2025
420f4d3
Add a lexer rule for punctuation
ehuss Apr 10, 2025
a8e1afb
Add a grammar rule for reserved tokens
ehuss Apr 10, 2025
13996e6
Define the Token rule
ehuss Apr 10, 2025
65febd6
Remove escape rule
ehuss Apr 10, 2025
2baaa05
Introduce a new grammar renderer
ehuss Apr 10, 2025
216bd24
Add the javascript hooks for handling the new railroad grammar
ehuss Apr 10, 2025
ab8d215
Add styling for the new grammar and railroad diagrams
ehuss Apr 10, 2025
6c55e50
Add a summary chapter that shows all of the grammar productions on on…
ehuss Apr 10, 2025
ea629b4
Add some documentation for how to write grammar rules
ehuss Apr 10, 2025
a954c17
Fix rule reference links with multiple spaces
ehuss Apr 10, 2025
1f6941d
Add attributes base rule.
ehuss Apr 10, 2025
3a2ea7e
Convert all of the grammar to use the new grammar renderer
ehuss Apr 10, 2025
2056365
Update the introduction about grammar notation
ehuss Apr 10, 2025
cbf16e7
Update the notation chapter for grammar updates
ehuss Apr 10, 2025
07cb674
Don't index the grammar summary appendix
ehuss Apr 13, 2025
904b010
Reverse the Repeat expressions
ehuss Apr 13, 2025
1214d68
Use a new technique for rendering negative expressions
ehuss Apr 13, 2025
c8a3451
Fix check for new and unexpected roots
traviscross Apr 14, 2025
e6af4d3
Mark grammar roots inline in productions
traviscross Apr 14, 2025
4a69227
Cross link between grammar and railroad diagrams
traviscross Apr 14, 2025
4d69e5a
Replace a `match` with `let-else`
traviscross Apr 14, 2025
e323716
Render `*` repeats in forward direction
traviscross Apr 14, 2025
8a37649
Remove support for reversing railroad elements
traviscross Apr 14, 2025
dab73ea
Lower `RepeatRange` to railroads more elaborately
traviscross Apr 15, 2025
1 change: 1 addition & 0 deletions book.toml
@@ -12,6 +12,7 @@ smart-punctuation = true

[output.html.search.chapter]
"test-summary.md" = { enable = false }
"grammar.md" = { enable = false }

[output.html.redirect]
"/expressions/enum-variant-expr.html" = "struct-expr.html"
4 changes: 4 additions & 0 deletions docs/authoring.md
@@ -214,3 +214,7 @@ r[foo.bar.edition2021]
> [!EDITION-2021]
> Describe what changed in 2021.
```

## Grammar

See [Grammar](grammar.md) for details on how to write grammar rules.
122 changes: 122 additions & 0 deletions docs/grammar.md
@@ -0,0 +1,122 @@
# Grammar

The Reference grammar is written in markdown code blocks using a modified BNF-like syntax (with a blend of regex and other arbitrary things). The `mdbook-spec` extension parses these rules and converts them to a renderable format, including railroad diagrams.

The code block should have a lang string with the word "grammar", a comma, and the category of the grammar, like this:

~~~
```grammar,items
ProductionName -> SomeExpression
```
~~~

The category is used to group similar productions on the grammar summary page in the appendix.
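
As a rough illustration of the lang string convention above, here is a minimal sketch of splitting `grammar,items` into its marker word and category. The function name `grammar_category` is hypothetical and this is not taken from `mdbook-spec`; treating a missing category as "not a grammar block" is also an assumption.

```rust
/// Returns the category of a grammar code block, or `None` if the lang
/// string does not start with the word `grammar`.
///
/// Hypothetical helper for illustration only.
fn grammar_category(info: &str) -> Option<&str> {
    let mut parts = info.splitn(2, ',');
    // The first comma-separated word must be exactly `grammar`.
    if parts.next()?.trim() != "grammar" {
        return None;
    }
    // Assumption: a missing or empty category is rejected.
    let category = parts.next()?.trim();
    if category.is_empty() {
        None
    } else {
        Some(category)
    }
}
```

With this sketch, `grammar_category("grammar,items")` yields `Some("items")`, while a plain `rust` lang string yields `None`.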

## Grammar syntax

The syntax for the grammar itself is pretty close to what is described in the [Notation chapter](../src/notation.md), though there are some rendering differences.

A "root" production, marked with `@root`, is one that is not used in any other production.
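
The definition of a root can be sketched as a small set computation. This is only an illustration under stated assumptions: the function `find_roots` and the flattened map of production names to the names they reference are hypothetical, and ignoring self-references is one reading of "any other production".

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Finds the names of productions that no other production refers to.
///
/// `productions` maps each production name to the nonterminal names used
/// in its expression. This flattened shape is an assumption made for the
/// sketch; a real parser would walk an expression tree instead.
fn find_roots(productions: &BTreeMap<&str, Vec<&str>>) -> BTreeSet<String> {
    // Collect every name that is used by a production other than itself.
    let mut referenced = BTreeSet::new();
    for (name, uses) in productions {
        for used in uses {
            // Assumption: a recursive self-reference does not count as a use.
            if used != name {
                referenced.insert(*used);
            }
        }
    }
    // Whatever is never referenced is a candidate for `@root`.
    productions
        .keys()
        .filter(|name| !referenced.contains(*name))
        .map(|name| name.to_string())
        .collect()
}
```

A check like this could be compared against the productions actually marked with `@root` to spot new or unexpected roots.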

The syntax for the grammar itself (written in itself; hopefully that's not too confusing) is:

```
Grammar -> Production+

BACKTICK -> U+0060

LF -> U+000A

Production -> `@root`? Name ` ->` Expression

Name -> <Alphanumeric or `_`>+

Expression -> Sequence (` `* `|` ` `* Sequence)*

Sequence -> (` `* AdornedExpr)+

AdornedExpr -> ExprRepeat Suffix? Footnote?

Suffix -> ` _` <not underscore, unless in backtick>* `_`

Footnote -> `[^` ~[`]` LF]+ `]`

ExprRepeat ->
      Expr1 `?`
    | Expr1 `*?`
    | Expr1 `*`
    | Expr1 `+?`
    | Expr1 `+`
    | Expr1 `{` Range? `..` Range? `}`

Range -> [0-9]+

Expr1 ->
      Unicode
    | NonTerminal
    | Break
    | Terminal
    | Charset
    | Prose
    | Group
    | NegativeExpression

Unicode -> `U+` [`A`-`Z` `0`-`9`]{4..4}

NonTerminal -> Name

Break -> LF ` `+

Terminal -> BACKTICK ~[LF]+ BACKTICK

Charset -> `[` (` `* Characters)+ ` `* `]`

Characters ->
      CharacterRange
    | CharacterTerminal
    | CharacterName

CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK

CharacterTerminal -> Terminal

CharacterName -> Name

Prose -> `<` ~[`>` LF]+ `>`

Group -> `(` ` `* Expression ` `* `)`

NegativeExpression -> `~` ( Charset | Terminal | NonTerminal )
```

The general format is a series of productions separated by blank lines. The expressions are:

| Expression | Example | Description |
|------------|---------|-------------|
| Unicode | U+0060 | A single Unicode character. |
| NonTerminal | FunctionParameters | A reference to another production by name. |
| Break | | This is used internally by the renderer to detect line breaks and indentation. |
| Terminal | \`example\` | A sequence of exact characters, surrounded by backticks. |
| Charset | [ \`A\`-\`Z\` \`0\`-\`9\` \`_\` ] | A choice from a set of characters, space-separated. There are three different forms. |
| CharacterRange | [ \`A\`-\`Z\` ] | A range of characters. Each character should be in backticks. |
| CharacterTerminal | [ \`x\` ] | A single character, surrounded by backticks. |
| CharacterName | [ LF ] | A nonterminal, referring to another production. |
| Prose | \<any ASCII character except CR\> | This is an English description of what should be matched, surrounded in angle brackets. |
| Group | (\`,\` Parameter)+ | This groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression | ~[\` \` LF] | Matches anything except the given Charset, Terminal, or Nonterminal. |
| Sequence | \`fn\` Name Parameters | A sequence of expressions, where they must match in order. |
| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix | \_except \[LazyBooleanExpression\]\_ | This adds a suffix to the previous expression to provide an additional English description to it, rendered in subscript. This can have limited markdown, but try to avoid anything except basics like links. |
| Footnote | \[^extern-safe\] | This adds a footnote, which can supply some extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal markdown footnote. |
| Optional | Expr? | The preceding expression is optional. |
| Repeat | Expr* | The preceding expression is repeated 0 or more times. |
| Repeat (non-greedy) | Expr*? | The preceding expression is repeated 0 or more times without being greedy. |
| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. |
| RepeatPlus (non-greedy) | Expr+? | The preceding expression is repeated 1 or more times without being greedy. |
| RepeatRange | Expr{2..4} | The preceding expression is repeated a number of times within the specified range. Either bound can be omitted, which works just like Rust ranges. |
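
To make the optional bounds of RepeatRange concrete, here is a minimal sketch that parses the text between `{` and `}`. The function name `parse_repeat_range` is hypothetical and this is not the `mdbook-spec` parser. It only extracts the bounds and makes no claim about whether the upper bound is inclusive.

```rust
/// Parses the text between `{` and `}` of a repeat range, such as `2..4`,
/// into optional lower and upper bounds.
///
/// Hypothetical helper for illustration only.
fn parse_repeat_range(text: &str) -> Option<(Option<u32>, Option<u32>)> {
    // The `..` separator is required, even when both bounds are omitted.
    let (min, max) = text.split_once("..")?;
    let bound = |s: &str| -> Option<Option<u32>> {
        if s.is_empty() {
            // An empty side means that bound was left out.
            Some(None)
        } else if s.bytes().all(|b| b.is_ascii_digit()) {
            // Matches `Range -> [0-9]+`; overflow is reported as a failure.
            s.parse().ok().map(Some)
        } else {
            None
        }
    };
    Some((bound(min)?, bound(max)?))
}
```

For example, `2..4` gives both bounds, `..4` gives only an upper bound, and `2..` gives only a lower bound, mirroring how `2..4`, `..4`, and `2..` are written in Rust.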

## Automatic linking

The plugin automatically adds markdown link definitions for all the production names on every page. If you want to link directly to a production name, all you need to do is surround it in square brackets, like `[ArrayExpression]`.

In some cases there might be name collisions with the automatic linking of rule names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. You can also do that if you just feel like being more explicit.
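
The two link forms described above can be pictured as a pair of markdown link definitions per production. The sketch below is an assumption-laden illustration: the function `link_definitions` is hypothetical, and the destination URLs are passed in by the caller because the real destination format is not described here.

```rust
/// Builds markdown link reference definitions for production names.
///
/// Each entry pairs a production name with the URL it should link to.
/// Hypothetical helper for illustration only.
fn link_definitions(productions: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (name, url) in productions {
        // Short form, used as `[ArrayExpression]`.
        out.push_str(&format!("[{name}]: {url}\n"));
        // Explicit form, used as `[Type][grammar-Type]` when names collide.
        out.push_str(&format!("[grammar-{name}]: {url}\n"));
    }
    out
}
```

Emitting both labels is what would let `[Type]` and `[Type][grammar-Type]` resolve to the same production when nothing else on the page defines `Type`.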
16 changes: 16 additions & 0 deletions mdbook-spec/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions mdbook-spec/Cargo.toml
@@ -5,6 +5,7 @@ edition = "2024"
license = "MIT OR Apache-2.0"
description = "An mdBook preprocessor to help with the Rust specification."
repository = "https://github.com/rust-lang/spec/"
default-run = "mdbook-spec"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

@@ -15,6 +16,7 @@ once_cell = "1.19.0"
pathdiff = "0.2.1"
# Try to keep in sync with mdbook.
pulldown-cmark = { version = "0.10.3", default-features = false }
railroad = { version = "0.3.2", default-features = false }
regex = "1.9.4"
semver = "1.0.21"
serde_json = "1.0.113"