Unicode whitespaces #306
Comments
Hi @truchi, thanks for the links! I've been playing with integrating the unicode-linebreak crate, which should take care of all this. It even does fun things such as preventing line breaks in a text like It's not completely done yet since there is some weirdness around soft hyphens: the crate finds break points at them, but if a break point is not used, the soft hyphen should be removed. We don't currently support this with the
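To make the soft hyphen rule concrete, here is a minimal sketch of a post-processing step. It is not textwrap's or unicode-linebreak's API: `render_soft_hyphens` is a hypothetical helper, and turning a used break into a visible `-` is an assumption based on the usual convention rather than something stated in this thread.

```rust
const SOFT_HYPHEN: char = '\u{00AD}';

/// Hypothetical helper (not part of textwrap): cleans up one wrapped line.
/// Assumption: a soft hyphen at the very end of the line marks a break that
/// was used, so it is shown as a visible '-'. Every other soft hyphen marks
/// an unused break point and is removed.
fn render_soft_hyphens(line: &str) -> String {
    // Check whether the line was broken at a soft hyphen.
    let (body, broke_at_soft_hyphen) = match line.strip_suffix(SOFT_HYPHEN) {
        Some(rest) => (rest, true),
        None => (line, false),
    };
    // Drop the soft hyphens whose break points were not used.
    let mut out: String = body.chars().filter(|&c| c != SOFT_HYPHEN).collect();
    if broke_at_soft_hyphen {
        out.push('-');
    }
    out
}
```

Calling `render_soft_hyphens` on each line after wrapping would remove the unused soft hyphens while keeping the wrapping logic itself untouched.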
Good! I just want to point out that it will not solve the issue of NBSPs at the start of a line when breaking words. (Are you French Swiss?)
What kind of issue do you mean? Currently, an NBSP character is not treated in a special way, so it's effectively treated like a However, you're right that a line might be broken at an NBSP if you enable the
No, I'm actually from Denmark, but I moved to Switzerland about 10 years ago 😄
The following:

    let wrap = wrap("Hello\u{00A0}world", Options::new(5).break_words(true));
    for line in wrap {
        println!("{}", line);
    }

outputs:
Hey @truchi, ah yeah, that's a good example! I guess you would expect the breaks to become
so that the no-break space disappears because it happens to fall at the end (or beginning) of the line?
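One way to get that behavior without touching the wrapping algorithm is to trim no-break spaces from the edges of each line after wrapping. The sketch below assumes that approach. `trim_no_break_spaces` and `clean_lines` are hypothetical helpers, not textwrap functions, and the decision to treat U+202F (narrow no-break space) the same way is also an assumption.

```rust
/// Hypothetical helper (not part of textwrap): strips no-break spaces
/// (U+00A0 and U+202F) from both ends of an already wrapped line.
/// No-break spaces in the middle of the line are left alone.
fn trim_no_break_spaces(line: &str) -> &str {
    line.trim_matches(|c: char| c == '\u{00A0}' || c == '\u{202F}')
}

/// Applies the trimming to every line of a wrapped paragraph.
fn clean_lines<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    lines
        .iter()
        .copied()
        .map(|line| trim_no_break_spaces(line))
        .collect()
}
```

For example, passing the made-up lines `["Hello", "\u{00A0}worl", "d"]` (chosen for illustration, not taken from actual textwrap output) to `clean_lines` gives back `["Hello", "worl", "d"]`.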
Hi again! :)
I believe you're pointing out that a non-breaking space remains non-breaking when using the Unicode line breaking algorithm? Indeed, you're completely correct. I implemented support for the Unicode line breaking algorithm in #313, and testing on https://mgeisler.github.io/textwrap/ shows that it makes no difference which kind of word separator I select. However, is this not working as intended?
Hello! The project I was working on with textwrap is totally dead now, so I'm unsure how to test that... Glad to see your lib being worked on, you have good motivation!
Thanks! Let me close this issue now since I hope the Unicode line breaking algorithm from #313 fixes this.
Hello!
I just want to share my findings with you. You may already know all this... Google gives me this, which could be of interest to you.
Unicode has a few whitespace characters that just render as a space in my (GNOME) terminal (and the ideographic one as a double-width space). These, when using break_words, break the "no space at beginning of line" behavior that you have.
Interestingly, there is also "zero width space" to allow breaks (easy to support, you have open issues about this), and "zero width no-break space" to disallow breaks (less easy, I guess), in addition to the famous "no-break space" and the less famous "narrow no-break space", which do have width.
char::is_whitespace links to a database listing other whitespace characters and claims to report them as whitespace, but not really...
Good luck with that! :)
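A quick way to check what `char::is_whitespace` reports for the characters mentioned above is to tabulate the results. This is only a sketch using the standard library. The `SPACES` table and the `whitespace_report` function are names made up for this example.

```rust
/// Code points mentioned in this issue, paired with their Unicode names.
const SPACES: [(char, &str); 6] = [
    ('\u{00A0}', "NO-BREAK SPACE"),
    ('\u{202F}', "NARROW NO-BREAK SPACE"),
    ('\u{3000}', "IDEOGRAPHIC SPACE"),
    ('\u{200B}', "ZERO WIDTH SPACE"),
    ('\u{FEFF}', "ZERO WIDTH NO-BREAK SPACE"),
    ('\u{0020}', "SPACE"),
];

/// Returns one report line per code point, ending with the result of
/// `char::is_whitespace` for that code point.
fn whitespace_report() -> Vec<String> {
    SPACES
        .iter()
        .map(|(c, name)| format!("U+{:04X} {}: {}", *c as u32, name, c.is_whitespace()))
        .collect()
}
```

`char::is_whitespace` follows the Unicode `White_Space` property, which covers the spaces that have width (including both no-break spaces and the ideographic space) but not the two zero-width characters, so those two come back as `false`.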