Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid EncodingWarning. Fixes #3695. #3696

Merged
merged 1 commit into from
May 19, 2023

Conversation

jaraco
Copy link
Contributor

@jaraco jaraco commented May 19, 2023

Fixes #3695

@JelleZijlstra JelleZijlstra added the skip news Pull requests that don't need a changelog entry. label May 19, 2023
@github-actions
Copy link

diff-shades reports zero changes comparing this PR (1c6d8c7) to main (2fd9d8b).


What is this? | Workflow run | diff-shades documentation

Copy link
Collaborator

@cooperlees cooperlees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, as we assume source we parse is UTF8 compatible ...

@JelleZijlstra
Copy link
Collaborator

I don't think we can assume Python source is UTF-8. However, seems like this is parsing the grammar files for blib2to3, which should be in UTF-8. I want to check the code some more, though.

@jaraco
Copy link
Contributor Author

jaraco commented May 19, 2023

According to PEP 597 (motivation), UTF-8 is the default encoding on non-Windows systems, so by specifying it, you provide clarity and consistent behavior across platforms. You could specify encoding="locale", but I'd only recommend that if compatibility is essential and the variance is desirable. In my experience, it's almost always the case that using UTF-8 as the specified encoding provides the best experience and rarely leads to surprise. You should ask yourselves, do you expect that users have specifically put non-ascii non-UTF-8 characters in the grammar files and expect those to work only in that locale. If not, you probably want UTF-8. If so, you may want to migrate users away from that assumption.

@JelleZijlstra
Copy link
Collaborator

Users shouldn't put anything in the grammar files, we control all the grammar files and we don't support users providing their own. My hesitation is only about verifying that this codepath is purely about opening the internal grammar files, not user code.

Copy link
Collaborator

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only call this with our own files Grammar.txt and PatternGrammar.txt, which are ASCII.

@JelleZijlstra JelleZijlstra merged commit eedfc38 into psf:main May 19, 2023
@JelleZijlstra
Copy link
Collaborator

Thanks @jaraco!

jsh9 added a commit to jsh9/cercis that referenced this pull request May 21, 2023
commit eedfc38
Author: Jason R. Coombs <[email protected]>
Date:   Fri May 19 13:00:29 2023 -0400

    Avoid EncodingWarning in blib2to3 (psf#3696)

commit 2fd9d8b
Author: Jonathan Berthias <[email protected]>
Date:   Fri May 19 01:57:17 2023 +0200

    Remove blank lines before class docstring (psf#3692)

commit db3668a
Author: Ray Bell <[email protected]>
Date:   Tue May 16 22:47:45 2023 -0400

    Sort DEFAULT_EXCLUDES and add .vscode, .pytest_cache and .ruff_cache (psf#3691)

    Co-authored-by: Ray Bell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news Pull requests that don't need a changelog entry.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EncodingWarning in pgen
3 participants