Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configurable LineEnding #454

Merged
merged 11 commits into from
Jul 9, 2022
Merged

Add configurable LineEnding #454

merged 11 commits into from
Jul 9, 2022

Conversation

koiuo
Copy link

@koiuo koiuo commented Jun 4, 2022

This is a quick demonstration of how configurable line ending support would look like #453 .

Immediately I see an issue with unfill which now requires LineEnding parameter, degrading experience for the clients who do not wish to deal with configurable line endings.

Perhaps, we could have two fns: unfill_with_endings(&str, LineEnding) and unfill(&str) { unfill_with_endings(str, LineEnding::LF) }

Alternatives to consider

  • using lines() in unfill (trimming trailing \r\n could be implemented as discarding trailing empty lines)
  • autodetecting the end of the line by examining the input and matching against a set of known patterns

My gut feeling is that explicit control over the behavior is better (unicode defines multiple other options over the well-known CR and LF), but I do not think this is a strong argument considering lines() only recognizes \n and \r\n.

If the overall approach (or any of the alternatives) seems reasonable, I'll work towards finalizing this by providing tests, documentation, etc.. Let me know please.

@koiuo
Copy link
Author

koiuo commented Jun 4, 2022

@mgeisler , please let me know what do you think about the overall approach.
(and just in case, I'm totally ok if after looking at the code you decide that this approach does not fit)
Thanks for your time.

src/lib.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@mgeisler
Copy link
Owner

mgeisler commented Jun 5, 2022

Thanks so much for the PR, it looks really good! I've left a few comments, but overall I think this should work just fine.

@mgeisler
Copy link
Owner

mgeisler commented Jun 5, 2022

Perhaps, we could have two fns: unfill_with_endings(&str, LineEnding) and unfill(&str) { unfill_with_endings(str, LineEnding::LF) }

Alternatives to consider

  • using lines() in unfill (trimming trailing \r\n could be implemented as discarding trailing empty lines)
  • autodetecting the end of the line by examining the input and matching against a set of known patterns

Yeah, thanks for listing this out. My preference would be for auto-detecting the input line ending in unfill: the job of the function is to make sense of a piece of text without anyone telling it how wide the lines are, what the prefix is etc. So detecting the line ending used would consistent with that.

Of course, it could happen that there are no newlines in the input text: if so, then I think it's acceptable that the caller gets an Options back with LineEnding::LF. Basically, such an input string is ambiguous and Textwrap can default to LineEnding::LF.

@mgeisler
Copy link
Owner

mgeisler commented Jun 6, 2022

Hey @edio, in #453 (comment) you mentioned that we could have a LineEnding::Auto variant. I guess that would work like this: if "\r\n" in text { LineEnding::CRLF } else { LineEnding::LF }? That is, it detects if the input has \r\n or \n line endings already and use this to split/join the lines.

That could perhaps be nice! Normally, I would suggest that the callers should be explicit about this: I would expect editors and whatnot to know the encoding of the text they try to wrap. If the encoding is unknown, callers could always detect it using the code above.

Because of this, I think we could defer adding this variant and just do the simple two-variant enum first.

@koiuo
Copy link
Author

koiuo commented Jun 6, 2022

@mgeisler , got you, thanks!

Not arguing, just want to explain my thinking: Rust's lines() is smart and efficient about the CRLF support.
It always splits by \n and immediately after the split it checks whether the last byte of the line is \r, and removes it, if that's the case. However \r\n in text would require full traversal over the input (and, btw, without a guarantee that it'll actually find it, so it is indeed a full scan), and then another full traversal to actually do splits.

(also, ACK to all your comments, will get to the fixes ASAP, but next weekend may be busy for me, so I may get to it only next week. Sorry about the delay 🙏 )

@koiuo koiuo force-pushed the master branch 2 times, most recently from 227b0bf to cbc6f1e Compare June 8, 2022 23:44
src/lib.rs Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
@mgeisler
Copy link
Owner

mgeisler commented Jun 9, 2022

Not arguing, just want to explain my thinking:

Thanks for explaining!

Rust's lines() is smart and efficient about the CRLF support. It always splits by \n and immediately after the split it checks whether the last byte of the line is \r, and removes it, if that's the case. However \r\n in text would require full traversal over the input (and, btw, without a guarantee that it'll actually find it, so it is indeed a full scan), and then another full traversal to actually do splits.

Right, that's very clever! I see you implemented equal cleverness now in a very elegant iterator, which is awesome!

(also, ACK to all your comments, will get to the fixes ASAP, but next weekend may be busy for me, so I may get to it only next week. Sorry about the delay 🙏 )

This is open source, there are no deadlines! We all work on this when we have time and energy 😃 Thank you for the great PR!

@koiuo koiuo changed the title Add configurable LineEnding (wip) Add configurable LineEnding Jun 18, 2022
@koiuo koiuo marked this pull request as ready for review June 18, 2022 17:33
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
src/line_ending.rs Outdated Show resolved Hide resolved
@mgeisler
Copy link
Owner

Hey @edio, thanks for all the great updates on the PR.

The build error on Rust 1.56 looks the same as what I had in another project: mgeisler/lipsum#84. I think you just need to be explicit with the type of the slice (for some reason which I don't understand).

@koiuo
Copy link
Author

koiuo commented Jun 20, 2022

@mgeisler , thank you for the fix for the Rust 1.56 compatibility issue. It'd took me ages to figure that out.

I believe I adressed all your requests for changes, but please do not hesitate to point to other problems if you can think of any.

And thanks again for your time and help on this PR.

@koiuo koiuo requested a review from mgeisler June 26, 2022 19:58
Copy link
Owner

@mgeisler mgeisler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the updates!

src/lib.rs Show resolved Hide resolved
@mgeisler
Copy link
Owner

thank you for the fix for the Rust 1.56 compatibility issue. It'd took me ages to figure that out.

Yeah, that was a weird case for me too 😄

src/line_ending.rs Outdated Show resolved Hide resolved
fn refill_convert_crlf() {
assert_eq!(
refill("foo\nbar\n", Options::new(5).line_ending(LineEnding::CRLF)),
"foo\r\nbar\n",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's possible to make this return "foo\r\nbar\r\n" in an easy way? I'm thinking that the problem might be in

    let line_ending_pat: &[_] = &['\r', '\n'];
    let trimmed = text.trim_end_matches(line_ending_pat);

which makes us eat the line ending (only to append it later without consideration for the detected line neding).

If it's not easy, then don't worry: what we have now is still great. We can fix this corner case later.

src/lib.rs Show resolved Hide resolved
This also improves performance by ~20% in the unfill benchmark
@mgeisler mgeisler merged commit 79bf25b into mgeisler:master Jul 9, 2022
@mgeisler
Copy link
Owner

mgeisler commented Jul 9, 2022

Thanks for all the hard work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants