Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to assume binary is UTF-8 #3172

Merged
merged 1 commit into from
Oct 9, 2024
Merged

Attempt to assume binary is UTF-8 #3172

merged 1 commit into from
Oct 9, 2024

Conversation

kddnewton
Copy link
Collaborator

No description provided.

Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I'm a bit worried there might be some confusion of which encoding should be used in the encoding: binary case with valid UTF-8 bytes.
For example StringNode#value should still give a BINARY string, not a UTF-8 String, and the C parser should not consider anything to be UTF-8.
Basically in that case, I think UTF-8 should only be used for code_unit & codepoint offsets, and otherwise BINARY remains used as before.

@eregon
Copy link
Member

eregon commented Oct 9, 2024

(IMO CRuby should raise for encoding: binary files with non-US-ASCII characters in it, TruffleRuby already does that since there is no way to convert such sources to java.lang.String meaningfully, but it would be a breaking change)

@kddnewton
Copy link
Collaborator Author

I think that's basically what we're going to have. This is only impacting the encoding of the actual Source string itself, not the encoding of the individual strings in the result. They will still match their behavior.

Agreed though that this is very weird and niche.

@kddnewton kddnewton merged commit d6e9b8d into main Oct 9, 2024
54 checks passed
@kddnewton kddnewton deleted the follow-up branch October 9, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants