Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[text-format] Fix parsing of string literals #730

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cmyr
Copy link
Contributor

@cmyr cmyr commented Jun 17, 2024

This renames next_byte_value to next_str_lit_bytes and changes the signature so that it returns between 1..=4 bytes per call, representing the variable-length nature of the UTF-8 encoding.

note: I'm not sure how best to add tests for this, and it needs it; in particular there should be a test case of a text-format input that contains a non-ascii string literal. There should probably also be more tests for the weird byte escapes? But definitely a case with non-ascii text.

@stepancheg
Copy link
Owner

Can you please add some test that would fail without this PR?

/// The raw bytes for a single char or escape sequence in a string literal
///
/// The raw bytes are available via an `into_iter` implementation.
pub struct DecodedBytes {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be not used outside of the crate, so it should not be public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the return type of a public method, so it needs to be pub. We could modify that signature to return impl Iterator, if that is preferable?

This renames `next_byte_value` to `next_str_lit_bytes` and may return
between 1..=4 bytes per call, representing the variable-length nature of
the UTF-8 encoding.
@cmyr cmyr force-pushed the parse-unicode-strings branch from 0eaddf2 to 59d6e61 Compare June 26, 2024 16:53
@cmyr
Copy link
Contributor Author

cmyr commented Jun 26, 2024

I've added a test case that fails without this patch but passes with it.

stepancheg added a commit that referenced this pull request Sep 30, 2024
@stepancheg
Copy link
Owner

Merged in bdc1428.

@stepancheg
Copy link
Owner

It should be bumped tomorrow or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

text_format parsing does not correctly handle non-ascii chars?
2 participants