
capture more errors in fstring replacement field #67

Conversation

sunmy2019

This PR aims to capture all possible partial failures in fstring_replacement_field.

It breaks some tests, mostly due to changes in the error messages.

RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN was introduced to help users locate the exact failure location.

@sunmy2019
Author

This reopens #66.

It has 3 main features:

Correct Place

It reports the exact location of the first error, thanks to the newly introduced RAISE_SYNTAX_ERROR_ON_NEXT_TOKEN. The old rule could not do that:

  File "<stdin>", line 1
    f"{123:{234 +}}"
                 ^
old: SyntaxError: f-string: expecting '}'

  File "<stdin>", line 1
    f"{123:{234 +}}"
                ^
new: SyntaxError: f-string: expecting '=', or '!', or ':', or '}'

This is because the parser needs to look ahead, so RAISE_SYNTAX_ERROR reports at the farthest token seen.
In the old example, it points at a } while claiming to expect a }, which is rather confusing.
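As a quick sanity check (my addition, not part of the PR), the example above is rejected on every CPython version; only the message text and caret position depend on the grammar in use:

```python
# Compile the f-string from the example above; "234 +" is an incomplete
# expression inside the nested replacement field, so this must fail.
src = 'f"{123:{234 +}}"'
msg = None
try:
    compile(src, "<stdin>", "eval")
except SyntaxError as exc:
    msg = exc.msg  # exact wording depends on the interpreter version
print("rejected:", msg is not None)
```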

Capture All Kinds of Errors

We can show that this rule captures nearly every possible kind of failure in fstring_replacement_field.

Friendlier Message

Sometimes more than '}' would be acceptable; the new rule lists all the expected tokens, thanks to the enriched grammar rules:

  File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
old: SyntaxError: f-string: expecting '}'

  File "<string>", line 1
    f'{"s"!r{":10"}}'
            ^
new: SyntaxError: f-string: expecting ':' or '}'

CC:
@pablogsal @lysnikolaou @isidentical

Collaborator

@lysnikolaou lysnikolaou left a comment


Generally looks great! Only one comment we could discuss.

Grammar/python.gram (outdated review comment, resolved)
@sunmy2019
Author

I will add a commit to fix the test cases once the grammar in this PR is finalized and approved.

@pablogsal
Owner

The grammar changes make sense to me, but for this PR to land we definitely need to fix the newly failing tests.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

During the testing with flag -ucpu, I discovered a new problem:

tokenize is a handwritten tokenizer in pure Python, and its definition of tokens differs from the one used in C.

I think these token changes call for at least a modification of tokenize._tokenize, possibly a large one.

I wonder if we can reuse what we have in the C module. Now that f-strings support recursion, cases like f'{f'{f'{f'{0}'}'}'}' are much harder for a handwritten tokenizer to keep track of, and maintaining two systems sounds error-prone.
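For illustration (my addition, not from the PR), whether same-quote nesting compiles at all depends on the tokenizer in use; on interpreters without the new C tokenizer (pre-3.12) it is simply a syntax error:

```python
import sys

# Same-quote nesting inside a replacement field only became legal with
# the PEP 701 tokenizer, first released in CPython 3.12.
src = "f'{f'{0}'}'"
try:
    compile(src, "<test>", "eval")
    accepted = True
except SyntaxError:
    accepted = False

print("accepted:", accepted, "on", sys.version_info[:2])
```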

MWE

import tokenize
from io import BytesIO

f1 = BytesIO(b"f'{0}'")
f2 = BytesIO(b"f'{0\n}'")

print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{0}'", start=(1, 0), end=(1, 6), line="f'{0}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 6), end=(1, 7), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
>> TokenInfo(type=1 (NAME), string='f', start=(1, 0), end=(1, 1), line="f'{0\n")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(1, 1), end=(1, 2), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='{', start=(1, 2), end=(1, 3), line="f'{0\n")
>> TokenInfo(type=2 (NUMBER), string='0', start=(1, 3), end=(1, 4), line="f'{0\n")
>> TokenInfo(type=66 (NL), string='\n', start=(1, 4), end=(1, 5), line="f'{0\n")
>> TokenInfo(type=55 (OP), string='}', start=(2, 0), end=(2, 1), line="}'")
>> TokenInfo(type=64 (ERRORTOKEN), string="'", start=(2, 1), end=(2, 2), line="}'")
TokenInfo(type=4 (NEWLINE), string='', start=(2, 2), end=(2, 3), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')

Also, a case

f1 = BytesIO(b"f'{f\"{0}\"}'")
f2 = BytesIO(b"f'{f'{0}'}'")

print(*tokenize.tokenize(f1.readline), sep="\n")
print("-" * 80)
print(*tokenize.tokenize(f2.readline), sep="\n")
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string='f\'{f"{0}"}\'', start=(1, 0), end=(1, 11), line='f\'{f"{0}"}\'')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
--------------------------------------------------------------------------------
TokenInfo(type=67 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=3 (STRING), string="f'{f'", start=(1, 0), end=(1, 5), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='{', start=(1, 5), end=(1, 6), line="f'{f'{0}'}'")
TokenInfo(type=2 (NUMBER), string='0', start=(1, 6), end=(1, 7), line="f'{f'{0}'}'")
TokenInfo(type=55 (OP), string='}', start=(1, 7), end=(1, 8), line="f'{f'{0}'}'")
TokenInfo(type=3 (STRING), string="'}'", start=(1, 8), end=(1, 11), line="f'{f'{0}'}'")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 11), end=(1, 12), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')

@sunmy2019
Author

We might need to reach out to the maintainers of tokenize for opinions.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

One possible solution is to use the c_tokenizer with python#27924

With c_tokenizer,

TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 0), end=(1, 2), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 2), end=(1, 3), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=61 (FSTRING_START), string="f'", start=(1, 3), end=(1, 5), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=62 (FSTRING_MIDDLE), string='{', start=(1, 5), end=(1, 6), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 7), end=(1, 8), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=11 (COLON), string=':', start=(1, 8), end=(1, 9), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=25 (LBRACE), string='{', start=(1, 9), end=(1, 10), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=20 (LESS), string='<', start=(1, 10), end=(1, 11), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=2 (NUMBER), string='123', start=(1, 11), end=(1, 14), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 14), end=(1, 15), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=26 (RBRACE), string='}', start=(1, 15), end=(1, 16), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=63 (FSTRING_END), string="'", start=(1, 16), end=(1, 17), line="f'{f'{{':{<123}}'\n")
TokenInfo(type=4 (NEWLINE), string='', start=(1, 17), end=(1, 17), line="f'{f'{{':{<123}}'\n")
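On an interpreter that already ships the new tokenizer (3.12+), the Python-level tokenize module emits the same FSTRING_* tokens; a minimal probe (my addition, written to run on both old and new versions):

```python
import sys
import tokenize
from io import BytesIO

# On 3.12+, tokenize delegates to the C tokenizer, so f'{0}' is split
# into FSTRING_START / OP / NUMBER / OP / FSTRING_END tokens; on older
# versions it stays a single STRING token.
toks = list(tokenize.tokenize(BytesIO(b"f'{0}'").readline))
names = [tokenize.tok_name[t.type] for t in toks]
print(names)
```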

@lysnikolaou
Collaborator

@sunmy2019 Thanks for raising this! @mgmacias95 is already working on rewriting the tokenize module to incorporate the new changes. The latest update was that she'll have a first version by the end of the week.

Grammar/python.gram (outdated review comment, resolved)
Lib/test/test_fstring.py (outdated review comment, resolved)
Collaborator

@lysnikolaou lysnikolaou left a comment


One more typo nit (probably my fault, sorry).

| '{' a='!' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '!'") }
| '{' a=':' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before ':'") }
| '{' a='}' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: valid expression required before '}'") }
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
Collaborator


Suggested change
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expression are not allowed without parentheses") }
| '{' a='lambda' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "f-string: lambda expressions are not allowed without parentheses") }

Author


Yeah, but one thing to note: this won't capture all forbidden usages of lambda, such as f"{1, lambda x:x}".

I have a new idea to capture all possible errors; could you check the following commit?

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

@lysnikolaou Hi, check my latest commit 91faa31, which handles lambda properly.

  File "<stdin>", line 1
    f"{lambda x:x}"
       ^^^^^^^^^
SyntaxError: f-string: lambda expressions are not allowed without parentheses
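A quick probe (my addition, not part of the PR): a bare lambda in a replacement field is rejected on every CPython version, while the parenthesized form compiles; only the error message and caret position differ across versions:

```python
# Bare lambda: the ":" is taken as the format-spec delimiter (old parser)
# or hits the explicit grammar rule (new parser) -- SyntaxError either way.
# Parenthesized lambda: the ":" is protected by the parens, so it compiles.
results = {}
for src in ('f"{lambda x:x}"', 'f"{(lambda x: x)}"'):
    try:
        compile(src, "<stdin>", "eval")
        results[src] = "compiles"
    except SyntaxError:
        results[src] = "SyntaxError"
print(results)
```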

Collaborator

@lysnikolaou lysnikolaou left a comment


Looks great!

@sunmy2019
Author

Oh I am gonna revert that, since I found some corner cases.

@sunmy2019
Author

sunmy2019 commented Apr 12, 2023

Oh, no need to revert.

I discovered that this case:

f"{1,lambda x:{123}}"

compiles. It matches the lambdef rule, since {123} is indeed a valid expression that evaluates to a set, and there is no FSTRING_MIDDLE inside.

It is not relevant to my change, so I don't need to revert.

And with my change, we can inform users when to add parentheses.

To correct this behavior, we could insert an empty FSTRING_MIDDLE in front of the replacement field.
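For context (my addition): outside an f-string, the same text is an ordinary lambda whose body is a set display, which is exactly why lambdef matches here:

```python
# The body {123} is a set display, not a format spec, so this is a
# perfectly valid lambda that returns a one-element set.
f = lambda x: {123}
print(f(None))  # {123}
```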

@pablogsal pablogsal merged commit ef9c43a into pablogsal:fstring-grammar-rebased-after-sprint Apr 13, 2023
@pablogsal
Owner

Merging this for the time being, as this is a great improvement and is getting quite large already.

Thanks a lot for the fantastic work @sunmy2019 🚀

@lysnikolaou
Collaborator

Does this mean what I think it means? 🎉

@lysnikolaou
Collaborator

We should have a passing test suite now!

@pablogsal
Owner

pablogsal commented Apr 13, 2023

We should have a passing test suite now!

Almost! We still have a failure on the buildbots:

https://buildbot.python.org/all/#/builders/802/builds/760

I also pushed a fix for python#102856 (comment)

But we are super super super close!!!

@sunmy2019 sunmy2019 deleted the capture-more-errors-in-fstring-replacement-field branch May 10, 2023 14:00