@@ -94,40 +94,96 @@ The descriptions of lexical analysis and syntax use a modified
 `Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
 notation. This uses the following style of definition:
 
-.. productionlist:: notation
-   name: `lc_letter` (`lc_letter` | "_")*
-   lc_letter: "a"..."z"
-
-The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
-of zero or more ``lc_letter``\ s and underscores. An ``lc_letter`` in turn is
-any of the single characters ``'a'`` through ``'z'``. (This rule is actually
-adhered to for the names defined in lexical and grammar rules in this document.)
-
-Each rule begins with a name (which is the name defined by the rule) and
-``::=``. A vertical bar (``|``) is used to separate alternatives; it is the
-least binding operator in this notation. A star (``*``) means zero or more
-repetitions of the preceding item; likewise, a plus (``+``) means one or more
-repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
-one occurrences (in other words, the enclosed phrase is optional). The ``*``
-and ``+`` operators bind as tightly as possible; parentheses are used for
-grouping. Literal strings are enclosed in quotes. White space is only
-meaningful to separate tokens. Rules are normally contained on a single line;
-rules with many alternatives may be formatted alternatively with each line after
-the first beginning with a vertical bar.
-
-.. index:: lexical definitions, ASCII
-
-In lexical definitions (as the example above), two more conventions are used:
-Two literal characters separated by three dots mean a choice of any single
-character in the given (inclusive) range of ASCII characters. A phrase between
-angular brackets (``<...>``) gives an informal description of the symbol
-defined; e.g., this could be used to describe the notion of 'control character'
-if needed.
-
-Even though the notation used is almost the same, there is a big difference
-between the meaning of lexical and syntactic definitions: a lexical definition
-operates on the individual characters of the input source, while a syntax
-definition operates on the stream of tokens generated by the lexical analysis.
-All uses of BNF in the next chapter ("Lexical Analysis") are lexical
-definitions; uses in subsequent chapters are syntactic definitions.
-
+.. grammar-snippet::
+   :group: notation
+
+   name: `letter` (`letter` | `digit` | "_")*
+   letter: "a"..."z" | "A"..."Z"
+   digit: "0"..."9"
+
+In this example, the first line says that a ``name`` is a ``letter`` followed
+by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
+A ``letter`` in turn is any of the single characters ``'a'`` through
+``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
+to ``9``.
+
+Each rule begins with a name (which identifies the rule that's being defined)
+followed by a colon, ``:``.
+The definition to the right of the colon uses the following syntax elements:
+
+* ``name``: A name refers to another rule.
+  Where possible, it is a link to the rule's definition.
+
+* ``TOKEN``: An uppercase name refers to a :term:`token`.
+  For the purposes of grammar definitions, tokens are the same as rules.
+
+* ``"text"``, ``'text'``: Text in single or double quotes must match literally
+  (without the quotes). The type of quote is chosen according to the meaning
+  of ``text``:
+
+  * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
+  * ``"case"``: A name in double quotes denotes a
+    :ref:`soft-keyword <soft-keywords>`.
+  * ``'@'``: A non-letter symbol in single quotes denotes an
+    :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
+    :ref:`operator <operators>`.
+
+* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
+  of any single character in the given (inclusive) range of ASCII characters.
+* ``<...>``: A phrase between angular brackets gives an informal description
+  of the matched symbol (for example, ``<any ASCII character except "\">``),
+  or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
+* ``e1 e2``: Items separated only by whitespace denote a sequence.
+  Here, ``e1`` must be followed by ``e2``.
+* ``e1 | e2``: A vertical bar is used to separate alternatives.
+  It is the least tightly binding operator in this notation.
+* ``e*``: A star means zero or more repetitions of the preceding item.
+* ``e+``: Likewise, a plus means one or more repetitions.
+* ``[e]``: A phrase enclosed in square brackets means zero or
+  one occurrences. In other words, the enclosed phrase is optional.
+* ``e?``: A question mark has exactly the same meaning as square brackets:
+  the preceding item is optional.
+* ``(e)``: Parentheses are used for grouping.
+
+The unary operators (``*``, ``+``, ``?``) bind as tightly as possible.
+
+White space is only meaningful to separate tokens.
+
+Rules are normally contained on a single line, but rules that are too long
+may be wrapped:
+
+.. grammar-snippet::
+   :group: notation
+
+   literal: `stringliteral` | `bytesliteral`
+          | `integer` | `floatnumber` | `imagnumber`
+
+Alternatively, rules may be formatted with the first line ending at the colon,
+and each alternative beginning with a vertical bar on a new line.
+For example:
+
+
+.. grammar-snippet::
+   :group: notation-alt
+
+   literal:
+      | `stringliteral`
+      | `bytesliteral`
+      | `integer`
+      | `floatnumber`
+      | `imagnumber`
+
+This does *not* mean that there is an empty first alternative.
+
+.. index:: lexical definitions
+
+.. note::
+
+   There is some difference between *lexical* and *syntactic* analysis:
+   the :term:`lexical analyzer` operates on the individual characters of the
+   input source, while the *parser* (syntactic analyzer) operates on the stream
+   of :term:`tokens <token>` generated by the lexical analysis.
+   However, in some cases the exact boundary between the two phases is a
+   CPython implementation detail.
+
+   This documentation uses the same BNF grammar for both.
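Because regular expressions work on individual characters, the lexical example rules in this patch (name, letter, digit) translate almost one-to-one into Python's standard re module. The sketch below is an illustration only, not part of the patch or of the documentation tooling: the names NAME_RE, SIGNED_RE and is_name, and the extra signed rule used to show the optional and one-or-more operators, are hypothetical. It assumes Python 3.6 or later (for f-strings).

```python
import re

# A hand translation of the example grammar rules into regular expressions.
# Illustrative sketch only; this is not how CPython's doc tooling works.

# letter: "a"..."z" | "A"..."Z"
#   Each "x"..."y" range becomes a character-class range; the "|" between
#   the two ranges is folded into a single class.
LETTER = r"[a-zA-Z]"

# digit: "0"..."9"
DIGIT = r"[0-9]"

# name: letter (letter | digit | "_")*
#   sequence "e1 e2"   -> concatenation
#   grouping "(e)"     -> non-capturing group (?:...)
#   alternatives "|"   -> regex alternation
#   repetition "e*"    -> the "*" quantifier, which (like the grammar's star)
#                         applies only to the item immediately before it
NAME_RE = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT}|_)*")

# Hypothetical rule, NOT taken from the documentation, showing the
# remaining operators:
#   signed: ["-"] digit+
#   optional "[e]" or "e?" -> the "?" quantifier
#   one or more "e+"       -> the "+" quantifier
SIGNED_RE = re.compile(rf"-?{DIGIT}+")


def is_name(text: str) -> bool:
    """Return True if all of *text* matches the example ``name`` rule."""
    # fullmatch() anchors at both ends, so the rule must describe the whole
    # string, not just a prefix or substring of it.
    return NAME_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["spam_2", "Spam", "2spam", "_spam", ""]:
        print(repr(candidate), is_name(candidate))
    print(SIGNED_RE.fullmatch("-42") is not None)  # True
```

Running the script shows that "spam_2" and "Spam" match name while "2spam", "_spam" and the empty string do not, because the rule requires a letter first. The grammar's `*`, `+` and `?` map to the identical regex quantifiers, and `[e]` maps to `?` as well. Since this is a character-level translation, it only models lexical definitions; syntactic rules operate on tokens and would need a parser rather than a regex.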