[pyupgrade] Fix parsing named Unicode escape sequences (UP032) #21901
ntBre merged 9 commits into astral-sh:main
ntBre
left a comment
Thanks! This makes sense to me, I just had a couple of small suggestions about the tests.
I took a closer look at this code today, and I'm feeling much less wary than in #19774 (review). Our parser will have already flagged weird invalid cases like these that I brought up last time:
"\N{{angle}}".format(angle="angle")
"\N{LATIN {SMALL} LETTER A}"
So we only need to handle valid cases, which this PR seems to do in a nice way. I also ran the fuzzer for a little while, just in case.
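The two invalid literals quoted in the review comment above can be checked quickly in Python. This is only a sketch: it uses CPython's built-in `compile()` as a stand-in, on the assumption that Ruff's parser rejects the same inputs, as the review states. The helper name `is_rejected` is made up for this example.

```python
# Stand-in check: this exercises CPython's compiler, not Ruff's parser.
INVALID_SNIPPETS = [
    r'"\N{{angle}}".format(angle="angle")',
    r'"\N{LATIN {SMALL} LETTER A}"',
]


def is_rejected(source: str) -> bool:
    """Return True if CPython refuses to compile `source` as an expression."""
    try:
        compile(source, "<snippet>", "eval")
    except SyntaxError:
        # The escape name ends at the first `}`, so the name is unknown.
        return True
    return False


results = {source: is_rejected(source) for source in INVALID_SNIPPETS}
for source, rejected in results.items():
    print(f"{source!r}: rejected={rejected}")
```

Both snippets fail before any `.format()` call could run, which is why only well-formed `\N{...}` escapes need handling in the fix.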
Diagnostic diff on typing conformance tests
No changes detected when running ty on typing conformance tests ✅
Hey @ntBre 👋 just in case you missed the updates on this PR. I've addressed your comments about the tests. Please take a look when you have a moment and let me know if there's anything else I should adjust 🙏
ntBre
left a comment
Thank you!
This looks good to go, just one small test comment update, and I think I preferred your old tests (to avoid adding a dev-dependency to this crate). Sorry for the flip-flop and the delay.
Title changed from "[pyupgrade] Fix Unicode named escape squence parsing in FormatString when fixing UP032" to "[pyupgrade] Fix parsing named Unicode escape sequences (UP032)"
Summary
Fixes #19771
Fixes incorrect parsing of Unicode named escape sequences like Hey \N{snowman} in FormatString, which were being incorrectly split into separate literal and field parts instead of being treated as a single literal unit.
Problem
The FormatString parser incorrectly handles Unicode named escape sequences:
Hey \N{snowman} is parsed into 2 parts: Literal("Hey \N") & Field("snowman")
Hey \N{snowman} should be parsed into 1 part: Literal("Hey \N{snowman}")
This affects f-string conversion rules that rely on proper format string parsing when fixing UP032.
Solution
I modified parse_literal to detect and handle Unicode named escape sequences before parsing single characters:
When the flag tracking a preceding backslash is true and the text starts with N{, try to parse the complete Unicode escape sequence as one unit, and set the flag to false after parsing successfully.
The flag is set to false when the backslash is already consumed.
Manual Verification
"\N{angle}AOB = {angle}°".format(angle=180)
Result
"\N{snowman} {snowman}".format(snowman=1)
Result
"\\N{snowman} {snowman}".format(snowman=1)
Result
Test Plan
Tests for FormatString when parsing Unicode escape sequences.
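The parsing behavior described under Solution can be sketched in Python. This is a simplified illustration, not Ruff's Rust implementation. The names `Literal`, `Field`, `parse_format_string`, and the flag `after_backslash` are hypothetical. The sketch assumes the parser sees the raw source text of the string (escapes not yet decoded), and it ignores `{{`/`}}` escapes, conversions, and format specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Field:
    name: str


def parse_format_string(source: str) -> list:
    """Split raw format-string source into Literal and Field parts (simplified)."""
    parts = []
    literal = []
    after_backslash = False  # hypothetical flag: True right after an unconsumed backslash
    i = 0
    while i < len(source):
        # Named Unicode escape: keep `N{...}` inside the literal as one unit.
        if after_backslash and source.startswith("N{", i):
            end = source.find("}", i)
            if end != -1:
                literal.append(source[i : end + 1])
                i = end + 1
                after_backslash = False
                continue
        ch = source[i]
        if ch == "{":
            end = source.find("}", i)
            if end == -1:
                raise ValueError("unterminated replacement field")
            if literal:
                parts.append(Literal("".join(literal)))
                literal = []
            parts.append(Field(source[i + 1 : end]))
            i = end + 1
            after_backslash = False
            continue
        # A backslash following an unconsumed backslash is an escaped backslash,
        # so it consumes the pair and resets the flag.
        after_backslash = ch == "\\" and not after_backslash
        literal.append(ch)
        i += 1
    if literal:
        parts.append(Literal("".join(literal)))
    return parts


# The three manual-verification inputs, written as raw source text.
for raw in (r"\N{angle}AOB = {angle}°", r"\N{snowman} {snowman}", r"\\N{snowman} {snowman}"):
    print(raw, "->", parse_format_string(raw))
```

In this sketch `\N{snowman}` stays inside a single literal, while `\\N{snowman}` yields a literal ending in `N` followed by a `snowman` field, because the second backslash has already consumed the first.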