Skip to content

Comments

[pyupgrade] Fix parsing named Unicode escape sequences (UP032)#21901

Merged
ntBre merged 9 commits intoastral-sh:mainfrom
phongddo:phongddo/fstring-escape-n
Dec 16, 2025
Merged

[pyupgrade] Fix parsing named Unicode escape sequences (UP032)#21901
ntBre merged 9 commits intoastral-sh:mainfrom
phongddo:phongddo/fstring-escape-n

Conversation

@phongddo
Copy link
Contributor

@phongddo phongddo commented Dec 10, 2025

Summary

Fixes #19771

Fixes incorrect parsing of Unicode named escape sequences like Hey \N{snowman} in FormatString, which were being incorrectly split into separate literal and field parts instead of being treated as a single literal unit.

Problem

The FormatString parser incorrectly handles Unicode named escape sequences:

  • Current: Hey \N{snowman} is parsed into 2 parts Literal("Hey \N") & Field("snowman")
  • Expected: Hey \N{snowman} should be parsed into 1 part Literal("Hey \N{snowman}")

This affects f-string conversion rules when fixing UP032 that rely on proper format string parsing.

Solution

I modified parse_literal to detect and handle Unicode named escape sequences before parsing single characters:

  • Introduced a flag to track when a backslash is "available" to escape something.
  • When the flag is true, and the text starts with N{, try to parse the complete Unicode escape sequence as one unit, and set the flag to false after parsing successfully.
  • Set the flag to false when the backslash is already consumed.

Manual Verification

"\N{angle}AOB = {angle}°".format(angle=180)

Result

 def foo():
-    "\N{angle}AOB = {angle}°".format(angle=180)
+    f"\N{angle}AOB = {180}°"

Would fix 1 error.

"\N{snowman} {snowman}".format(snowman=1)

Result

 def foo():
-    "\N{snowman} {snowman}".format(snowman=1)
+    f"\N{snowman} {1}"

Would fix 1 error.

"\\N{snowman} {snowman}".format(snowman=1)

Result

 def foo():
-    "\\N{snowman} {snowman}".format(snowman=1)
+    f"\\N{1} {1}"

Would fix 1 error.

Test Plan

  • Added test cases (happy case, invalid case, edge case) for FormatString when parsing Unicode escape sequence.
  • Updated snapshots.

@astral-sh-bot
Copy link

astral-sh-bot bot commented Dec 10, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@phongddo phongddo marked this pull request as ready for review December 10, 2025 18:41
@ntBre ntBre self-requested a review December 10, 2025 20:20
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This makes sense to me, I just had a couple of small suggestions about the tests.

I took a closer look at this code today, and I'm feeling much less wary than in #19774 (review). Our parser will have already flagged weird invalid cases like these that I brought up last time:

"\N{{angle}}".format(angle="angle")
"\N{LATIN {SMALL} LETTER A}"

So we only need to handle valid cases, which this PR seems to do in a nice way. I also ran the fuzzer for a little while, just in case.

@ntBre ntBre added bug Something isn't working fixes Related to suggested fixes for violations labels Dec 11, 2025
@astral-sh-bot
Copy link

astral-sh-bot bot commented Dec 11, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@phongddo phongddo requested a review from ntBre December 11, 2025 00:52
@astral-sh-bot
Copy link

astral-sh-bot bot commented Dec 11, 2025

mypy_primer results

Changes were detected when running on open source projects
scikit-build-core (https://github.com/scikit-build/scikit-build-core)
+ src/scikit_build_core/build/wheel.py:98:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 41 diagnostics
+ Found 42 diagnostics

Memory usage changes were detected when running on open source projects
sphinx (https://github.com/sphinx-doc/sphinx)
- WARN expected `heap_size` to be provided by Salsa query `class_based_items`
- WARN expected `heap_size` to be provided by Salsa query `class_based_items`
- WARN expected `heap_size` to be provided by Salsa query `class_based_items`
- WARN expected `heap_size` to be provided by Salsa query `class_based_items`

prefect (https://github.com/PrefectHQ/prefect)
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`
+ WARN expected `heap_size` to be provided by Salsa query `class_based_items`

@phongddo phongddo changed the title Fix Unicode named escape squence parsing in FormatString when fixing UP032 [pyupgrade] Fix Unicode named escape squence parsing in FormatString when fixing UP032 Dec 12, 2025
@phongddo
Copy link
Contributor Author

Hey @ntBre 👋 just in case you missed the updates on this PR. I've addressed your comments about the tests. Please take a look when you have a moment and let me know if there's anything else I should adjust 🙏

Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

This looks good to go, just one small test comment update, and I think I preferred your old tests (to avoid adding a dev-dependency to this crate). Sorry for the flip-flop and the delay.

Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@ntBre ntBre changed the title [pyupgrade] Fix Unicode named escape squence parsing in FormatString when fixing UP032 [pyupgrade] Fix parsing named Unicode escape sequences (UP032) Dec 16, 2025
@ntBre ntBre merged commit b0bc990 into astral-sh:main Dec 16, 2025
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working fixes Related to suggested fixes for violations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UP032 fix misinterprets \N escape sequence as interpolation

2 participants