
capture more errors in fstring replacement field #67

Conversation

sunmy2019

This PR aims to capture all possible partial failures in fstring_replacement_field.

It breaks some tests, mostly due to changes in the error messages.

RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN was introduced to help users locate the exact failure location.

@sunmy2019
Author

This reopens #66.

It has 3 main features:

Correct Place

It reports the exact location of the first error, thanks to the newly introduced RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN. The old rule could not do that:

  File "<stdin>", line 1
    f"{123:{234 +}}"
                 ^
old: SyntaxError: f-string: expecting '}'

  File "<stdin>", line 1
    f"{123:{234 +}}"
                ^
new: SyntaxError: f-string: expecting '=', or '!', or ':', or '}'

This is because the parser needs to look ahead, so RAISE_SYNTAX_ERROR reports at the farthest token seen.
In the old example, it points at a } while claiming to expect a }, which is rather confusing.
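As a quick sanity check (my addition, not part of the PR), the example above is rejected on every CPython version; only the message text and caret position depend on the grammar in use:

```python
# Compile the f-string from the example above; "234 +" is an incomplete
# expression inside the nested replacement field, so this must fail.
src = 'f"{123:{234 +}}"'
msg = None
try:
    compile(src, "<stdin>", "eval")
except SyntaxError as exc:
    msg = exc.msg  # exact wording depends on the interpreter version
print("rejected:", msg is not None)
```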

Capture All Kinds of Errors

We can show that this rule captures nearly every possible kind of failure in fstring_replacement_field.

Friendlier Message

Sometimes more than '}' would be acceptable; the new rule lists all the expected tokens, thanks to the enriched grammar rules:

  File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
old: SyntaxError: f-string: expecting '}'

  File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
new: SyntaxError: f-string: expecting ':' or '}'

CC:
@pablogsal @lysnikolaou @isidentical

Collaborator

@lysnikolaou lysnikolaou left a comment


Generally looks great! Only one comment we could discuss.

Grammar/python.gram (outdated review comment, resolved)
@sunmy2019
Author

I will add a commit to fix the test cases once the grammar in this PR is finalized and approved.

@pablogsal
Owner

The grammar changes make sense to me, but for this PR to land we definitely need to fix the newly failing tests.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

During the testing with flag -ucpu, I discovered a new problem:

tokenize is a handwritten tokenizer in pure Python, and its definition of tokens differs from the one used in C.

I think these token changes call for at least a modification of tokenize._tokenize, possibly a large one.

I wonder if we can reuse what we have in the C module. Now that f-strings support recursion, cases like f'{f'{f'{f'{0}'}'}'}' are much harder for a handwritten tokenizer to keep track of, and maintaining two systems sounds error-prone.
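For illustration (my addition, not from the PR), whether same-quote nesting compiles at all depends on the tokenizer in use; on interpreters without the new C tokenizer (pre-3.12) it is simply a syntax error:

```python
import sys

# Same-quote nesting inside a replacement field only became legal with
# the PEP 701 tokenizer, first released in CPython 3.12.
src = "f'{f'{0}'}'"
try:
    compile(src, "<test>", "eval")
    accepted = True
except SyntaxError:
    accepted = False

print("accepted:", accepted, "on", sys.version_info[:2])
```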

MWE

import tokenize
from io import BytesIO

f1 = BytesIO(b"f'{0}'")
f2 = BytesIO(b"f'{0\n}'")

print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{0}'", start=(1, 0), end=(1, 6), line="f'{0}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 6), end=(1, 7), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
>> TokenInfo(type=1 (NAME), string='f', start=(1, 0), end=(1, 1), line="f'{0\n")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(1, 1), end=(1, 2), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line="f'{0\n")
>> TokenInfo(type=2 (NUMBER), string='0', start=(1, 3), end=(1, 4), line="f'{0\n")
>> TokenInfo(type=66 (NL), string='\n', start=(1, 4), end=(1, 5), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='}', start=(2, 0), end=(2, 1), line="}'")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(2, 1), end=(2, 2), line="}'")
TokenInfo(type=4 (NEWLINE), string='', start=(2, 2), end=(2, 3), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')

Also, a case

f1 = BytesIO(b"f'{f\"{0}\"}'")
f2 = BytesIO(b"f'{f'{0}'}'")

print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string='f\'{f"{0}"}\'', start=(1, 0), end=(1, 11), line='f\'{f"{0}"}\'')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{f'", start=(1, 0), end=(1, 5), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='{', start=(1, 5), end=(1, 6), line="f'{f'{0}'}'")
TokenInfo(type=2 (NUMBER), string='0', start=(1, 6), end=(1, 7), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='}', start=(1, 7), end=(1, 8), line="f'{f'{0}'}'")
TokenInfo(type=3 (STRING), string="'}'", start=(1, 8), end=(1, 11), line="f'{f'{0}'}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')

@sunmy2019
Author

We might need to reach out to the maintainers of tokenize for opinions.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

One possible solution is to use the c_tokenizer with python#27924

With c_tokenizer,

TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 0), end=(1, 2), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 2), end=(1, 3), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 3), end=(1, 5), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=62 (FSTRING_MIDDLE), string='{', start=(1, 5), end=(1, 6), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 7), end=(1, 8), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=11 (COLON), string=':', start=(1, 8), end=(1, 9), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=20 (LESS), string='<', start=(1, 10), end=(1, 11), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=2 (NUMBER), string='123', start=(1, 11), end=(1, 14), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 14), end=(1, 15), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 15), end=(1, 16), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 16), end=(1, 17), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 17), end=(1, 17), line="f'{f'{{':{<123}}'\n")
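On an interpreter that already ships the new tokenizer (3.12+), the Python-level tokenize module emits the same FSTRING_* tokens; a minimal probe (my addition, written to run on both old and new versions):

```python
import sys
import tokenize
from io import BytesIO

# On 3.12+, tokenize delegates to the C tokenizer, so f'{0}' is split
# into FSTRING_START / OP / NUMBER / OP / FSTRING_END tokens; on older
# versions it stays a single STRING token.
toks = list(tokenize.tokenize(BytesIO(b"f'{0}'").readline))
names = [tokenize.tok_name[t.type] for t in toks]
print(names)
```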

@lysnikolaou
Collaborator

@sunmy2019 Thanks for raising this! @mgmacias95 is already working on rewriting the tokenize module to incorporate the new changes. The latest update was that she'll have a first version by the end of the week.

Grammar/python.gram (outdated review comment, resolved)
Lib/test/test_fstring.py (outdated review comment, resolved)
Collaborator

@lysnikolaou lysnikolaou left a comment


One more typo nit (probably my fault, sorry).

| '{' a='!' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '!'") }
| '{' a=':' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before ':'") }
| '{' a='}' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '}'") }
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
Collaborator


Suggested change
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expressions are not allowed without parentheses") }

Author


Yeah, but one thing to note: this won't capture all forbidden usages of lambda, such as f"{1, lambda x:x}".

I have a new idea to capture all possible errors; could you check the following commit?

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

@lysnikolaou Hi, check my latest commit 91faa31, which handles lambda properly.

  File "<stdin>", line 1
    f"{lambda x:x}"
       ^^^^^^^^^
SyntaxError: f-string: lambda expressions are not allowed without parentheses
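A quick probe (my addition, not part of the PR): a bare lambda in a replacement field is rejected on every CPython version, while the parenthesized form compiles; only the error message and caret position differ across versions:

```python
# Bare lambda: the ":" is taken as the format-spec delimiter (old parser)
# or hits the explicit grammar rule (new parser) -- SyntaxError either way.
# Parenthesized lambda: the ":" is protected by the parens, so it compiles.
results = {}
for src in ('f"{lambda x:x}"', 'f"{(lambda x: x)}"'):
    try:
        compile(src, "<stdin>", "eval")
        results[src] = "compiles"
    except SyntaxError:
        results[src] = "SyntaxError"
print(results)
```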

Collaborator

@lysnikolaou lysnikolaou left a comment


Looks great!

@sunmy2019
Author

Oh I am gonna revert that, since I found some corner cases.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

Oh, no need to revert.

I discovered that this case:

f"{1,lambda x:{123}}"

compiles. It matches the lambdef rule, since {123} is indeed a valid expression that evaluates to a set, and there is no FSTRING_MIDDLE inside.

It is not relevant to my change, so I don't need to revert.

And with my change, we can inform users when to add parentheses.

To correct this behavior, we could insert an empty FSTRING_MIDDLE in front of the replacement field.
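For context (my addition): outside an f-string, the same text is an ordinary lambda whose body is a set display, which is exactly why lambdef matches here:

```python
# The body {123} is a set display, not a format spec, so this is a
# perfectly valid lambda that returns a one-element set.
f = lambda x: {123}
print(f(None))  # {123}
```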

@pablogsal pablogsal merged commit ef9c43a into pablogsal:fstring-grammar-rebased-after-sprint Apr 13, 2023
@pablogsal
Owner

Merging this for the time being, as this is a great improvement and is getting quite large already.

Thanks a lot for the fantastic work @sunmy2019 🚀

@lysnikolaou
Collaborator

Does this mean what I think it means? 🎉

@lysnikolaou
Collaborator

We should have a passing test suite now!

@pablogsal
Owner

pablogsal commented Apr 13, 2023

We should have a passing test suite now!

Almost! We still have a failure on the buildbots:

https://buildbot.python.org/all/#/builders/802/builds/760

I also pushed a fix for python#102856 (comment)

But we are super super super close!!!

@sunmy2019 sunmy2019 deleted the capture-more-errors-in-fstring-replacement-field branch May 10, 2023 14:00