-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v0.1.13 breaks semver #55
Comments
semver breaking change in 0.1.13 <unicode-rs/unicode-width#55>
This is a behavior change and not considered breaking by this crate. |
That's literally what semver suggests. You broke the implied contract that 0.1.12 should be compatible with 0.1.13 and in doing so broke our tests, which implies that downstream user's apps will be rendered differently. |
"should be compatible with" has an extremely broad range of meanings. I'm careful to make sure that code will compile; but changing the code to have more accurate answers is not a breakage. The Rust stdlib's semver policy (which is roughly what the ecosystem follows as well) talks about behavioral changes as a notion of contract. It really depends on the situation, and I am thoroughly unconvinced that this crate already has or needs such a strong stability contract. This crate says it attempts to approximate the displayed width of characters. That's not a guarantee of stability; and we've (well, @Jules-Bertholet) been trying to improve the accuracy of these heuristics. You are free to rely on such behavior in tests, but you do have to expect that tests will break some time due to this kind of thing. If you want to pin the behavior, pin the dependency. There is an argument to be made for an more stable (not 100% stable: the unicode standard and data this relies on itself changes!) API that uses the east asian width properties only (@Jules-Bertholet: would this be easy to add again? I was somewhat uncomfortable about expanding the scope of the crate, but UAX 11 is pretty vague about this stuff anyway) |
from the link:
A changelog is the exact form of documentation that would have helped notify this change in behavior, but you've rejected adding one in #56 the behavior change here is that the width of `"\u{1}" is now 1 instead of 0. It is however rendered as taking up no space as evidenced by the test output. Working out what the change is caused by and why is next to impossible with the current approach (I have to look at every single commit that was made to work out relevance, and then backtrack to find a related PR). This is not ideal and there are easy ways that help consumers of this library work understand the changes more easily. |
This is highly highly dependent on the exact software stack used and is why this crate is explicitly documented as not necessarily matching actual rendered output. Yes, the documentation could be clearer about how it approximates "displayed width", and why it is an ultimately futile effort to try and accurately predict display width from just codepoints. You've been lucky so far that your tests were dealing with cases that matched up. If you want such an operation, this is not the crate for you, and furthermore, that operation does not exist. (I've been meaning to write documentation on why that is the case for a while, but haven't had the time)
I don't think that's what the RFC means by "documentation". But seriously, this is absolutely not my experience with semver in the Rust community: this kind of behavior change is almost always fine. Of course making a function do something entirely different would be breaking, but changing functions to still continue to do what they are documented as doing. This function still does what it is documented as doing.
That's ... what you have to do with the changelog too? I rejected the changelog because it produces more or less the same output as scrolling through commits. |
semver breaking change in 0.1.13 <unicode-rs/unicode-width#55> <!-- Please read CONTRIBUTING.md before submitting any pull request. -->
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1. Our tests assumed that \u{1} had a width of 0, so this change replaces the \u{1} character with \u{200B} (zero width space) in the tests. Upstream issue (closed as won't fix): unicode-rs/unicode-width#55
SemVer disagrees with the Rust community on what SemVer means.. |
To be clear - I'm using the Rust definition of semver, which treats 0.x.y the same way semver would treat x.y.0 https://doc.rust-lang.org/cargo/reference/resolver.html#semver-compatibility
...
This is the path that we are taking in ratatui/ratatui#1171, but it's a poor one given the blast radius of this sort of breaking change. The changes here are:
The right thing to do is yank and publish 0.2, and document breaking changes like this. But as this issue is closed and we've worked around it for our particular use case, I'll leave pushing this issue forward to any of the 584 other direct dependencies of unicode-width that have actual breaks cause by this. Our reliance on |
Yeah, it changed because the Unicode standard recommends they be rendered unless they are interpretable in that context (which these days is almost never). This crate implements Unicode standards. |
An API that uses the use unicode_width::UnicodeWidthChar;
pub fn dumb_width(s: &str) -> usize {
s.chars().map(|c| c.width().unwrap_or(0)).sum()
}
This is the core of the issue. Terminal emulators assign semantics to the control characters that go beyond what Unicode specifies. This is totally correct and fine (extra semantics introduced by higher-level protocols is what the control characters are for). But it means that if your application interprets control characters according to such a higher-level protocol, you need to do that interpretation and processing before you pass your data to a Unicode library like |
@Jules-Bertholet since it's taken me so long to get around to writing this, would you be interested in adding documentation to the crate and readme that explain how and why this crate can only ever be an approximation, and then reference that in the individual APIs as a caveat when we mention "displayed width". |
Given a lot of people interpret |
We have a pretty long issue that doesn't really have a resolution at ratatui/ratatui#75 about what exactly to do with working out the character widths. It seems at least where the terminal is concerned there's no real solution that solves all the possible issues (and incompatible terminal implementations). As such until someone invests enough time in working through those issues, we're sticking with unicode-width for the purpose of working out where to draw cells (even if it's the wrong thing sometimes). I suspect this will be probably be forever the case. I note there's also https://crates.io/crates/unicode-display-width, but even that has some problems that would take time to investigate to work out whether it's "better" |
yeah, people are working on this upstream at Unicode but progress is slow (https://www.unicode.org/L2/L2023/23194-text-terminal-wg-report.pdf), and it would still not result in an algorithm that solves all the problems since terminals are inconsistent (it would however give terminals better guidance on how to be consistnt).
Sure, but please recognize that if this crate isn't precisely what you're looking for, this crate may do things that diverge from your expectations, and your tests may occasionally break. It is definitely not the intent of this crate to provide some stable displayed width algorithm because such a thing does not exist. |
Also - despite my comment about the solution being a poor one overall, it's definitely the right solution for Ratatui. Our tests should have used u+200B when dealing with zero-width character handling instead of \0 and u+0001, so despite the breakage, it was a breakage that caught an assumption of ours that was incorrect. I can definitely see how that sort of thing makes sense to just release as a point release. In Ratatui's case have done something similar for a rendering change, but in doing so, we communicated the change in our changelog and breaking changes docs, and these flow through to the release notes of the release on GitHub, and from there also to any PRs raised by dependabot in downstream repos. These wouldn't be hidden away in git commits and PR comments like they are here. |
The README already has such a warning, which #53 modifies, but we should probably add something to the Rustdoc yes. And also some notes about terminal emulators |
Here are the important differences:
|
Thanks for documenting those. I think this probably makes unicode-width the right choice at least for now. |
It looks like the changes in v0.1.13 are breaking a number of crates, not only ratatui. The CI we have in Fedora Linux started showing red for some of our Rust packages after we updated unicode-width to version 0.1.13:
There might be more, the CI hasn't yet checked all 3000 Rust packages we have, and quite a large number of them depend on unicode-width (~300). This probably won't change your mind about "this is not a breaking change", but it should at least give some food for thought that the change here might have affected more projects than initially assumed. |
good to know, thanks. In general yes, I maintain that changing behavior in such a way is not a breaking change, and I absolutely expect such tests to break; that's what tests are for. A fair number of packages use this crate for estimating displayed terminal width and if they test this behavior they can expect certain tests to not work as the crate is improved. |
tui is definitely EOL - it's why ratatui exists. That's a pretty big blast radius as it effects every app built with the library (341 reverse deps on crates.io, 480 packages / 11,416 repos on github). Regardless of whether this is a "breaking change", it's a change that breaks things. I'd urge you to please consider relaxing your focus on technical correctness here and focus on the actual impact. |
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1. Our tests assumed that \u{1} had a width of 0, so this change replaces the \u{1} character with \u{200B} (zero width space) in the tests. Upstream issue (closed as won't fix): unicode-rs/unicode-width#55
I found an example where this change definitely results in a bug - the width of the |
To be clear, this is based on the spec - see #45 and read the parts of the unicode spec that are relevant (Those bits are not as dense as some). Unicode 0.1.12 used a canonical display definition of what a control character's width was, unicode 0.1.13 uses the unicode spec display fallback definition and states that this is a compatible non-breaking change with 0.1.12. Regardless of whether this is correct or not, it's a change that significantly breaks code that relies on the previous behavior. More succinctly stated:
Crates should not generally be expected to test the behavior of a dependency like this. The broken tests you're seeing are likely just a small portion of broken behavior. In our case, our tests were wrong(-ish), and have since been fixed. But this means now we don't have any tests that would have picked up the behavior change and so the change is felt by the users of our library. The changes will affect the behavior of every single package that transitively relies on this package. With 300k daily downloads are you really sure that breaking this behavior is a net positive without notifying any of the packages that this changed? |
This misunderstands my position. I think the "impact"; "some people's tests are breaking", is quite acceptable, and even to some extent desired. If crates are testing for this kind of behavior, I want these tests to break, I want people to notice. A broken test does not automatically mean it is the upstream crate's problem, in this case the existence of the test may imply a certain model about this crate that is incorrect, and even if it isn't incorrect it implies the crates care about this in a way that is incompatible with what the crate does. Ironically, when @Jules-Bertholet started making these changes I was somewhat against them because I wanted to stick to the original model that this crate only did East Asian Width. However, it was pretty clear to me over years issues filed on this repo by people buiding terminal stuff, that most users of this crate wanted this to handle more than just East Asian Width. It's clear from ratatui's usage that that's the way y'all use this crate too! So honestly it's baffling to me that such a big deal is being made about what boils down to tests breaking. I am interested in knowing about it, and understanding the problems involved, but I so far have not been convinced of the severity of the issue, and feel that in general these changes serve the terminal crate community better. As I've said before we are definitely lacking documentation on this kind of thing and I've wanted to improve it for a while, a lot of this is good signal for that. But I'm not convinced this is a large or undesirable impact.
I mean, no, I think testing this kind of behavior is typically incorrect, but if you choose to do so, then the tests being broken seems like a fine signal. (typically incorrect, with the caveat that I have definitely in the past found value for tests for behavior like this with the foreknowledge that such behavior may change, where I want to make sure the test broke for a "good reason" not a "bad reason")
yes. and I'm going to state this clearly: at this point you have been quite patronizing, preferring to strongly advocate for specific solutions rather than talk about problems, and repeatedly come off as using shame as a motivator. I have very little desire to deal with this and will not engage if this continues, potentially blocking you. |
So, this is kinda nuanced. The standard says to treat ones that do not have a different representation as width 1. However, I think we should definitely make a deliberate choice as to behavior here for all newlines, and be consistent (and document it). My guess is that they should go back to being 0 but I would like to see some discussion of that first. I'll file an issue. (edit: filed #60) Footnotes
|
If I have offended in any way, please accept my apologies. That was not my intent. In the interest of finding a resolution on this problem, I'm going to bow out and leave it to other interested parties to continue. Thank you for investing your time and energy on this. |
semver breaking change in 0.1.13 <unicode-rs/unicode-width#55> <!-- Please read CONTRIBUTING.md before submitting any pull request. -->
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1. Our tests assumed that \u{1} had a width of 0, so this change replaces the \u{1} character with \u{200B} (zero width space) in the tests. Upstream issue (closed as won't fix): unicode-rs/unicode-width#55
unicode-width's change "Control characters have width 1" between versions 0.1.12 and 0.1.13 changed the logic so that control characters are asumed to have width of 1 rather than 0 width [1]. This breaks skim's rendering pretty badly unfortunately, mainly by not erasing cells that should be erased, so the output looks garbled. Apparently this change broke other parts of the ecosystem too, such as the (actually-maintained, unlike tuikit) ratatui crate. There's some good discussion in [2] about how fundamentally unicode-width is not intended to be "how wide does this draw in a terminal emulator", despite many people using it that way. Other options should be explored longer term (if I decide to meaningfully pick up maintenance of skim), such as the unicode-display-width crated which is mentioned in [2] but still might not be the right algorithm. I think really the proper fix is to strip control characters within skim (or the TUI library) before doing anything to determine the unicode width. But for now, take the easy way out and just pin the unicode-width version to something below 0.1.13 and forget about it until I have more time to put into caring about improving skim. P.S. Change .width_cjk() to .width() in a couple places for consistency. [1] unicode-rs/unicode-width@3063422 [2] unicode-rs/unicode-width#55
v0.1.13 calculates the width of some characters differently, which causes delta to panic See unicode-rs/unicode-width#55 and unicode-rs/unicode-width#66
v0.1.13 calculates the width of some characters differently, which causes delta to panic See unicode-rs/unicode-width#55 and unicode-rs/unicode-width#66
semver breaking change in 0.1.13 <unicode-rs/unicode-width#55> <!-- Please read CONTRIBUTING.md before submitting any pull request. -->
unicode-width 0.1.13 changed the width of \u{1} from 0 to 1. Our tests assumed that \u{1} had a width of 0, so this change replaces the \u{1} character with \u{200B} (zero width space) in the tests. Upstream issue (closed as won't fix): unicode-rs/unicode-width#55
In ratatui, running cargo update from v0.1.12 to v0.1.13 results in the latter failing (something has changed in the way that unicode-width handles zero width strings
Error from our tests
--- STDOUT: ratatui buffer::buffer::tests::set_string_zero_width ---
running 1 test
test buffer::buffer::tests::set_string_zero_width ... FAILED
failures:
failures:
buffer::buffer::tests::set_string_zero_width
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1557 filtered out; finished in 0.00s
--- STDERR: ratatui buffer::buffer::tests::set_string_zero_width ---
thread 'buffer::buffer::tests::set_string_zero_width' panicked at src/buffer/buffer.rs:618:9:
assertion
left == right
failedleft: Buffer {
area: Rect { x: 0, y: 0, width: 1, height: 1 },
content: [
"",
],
styles: [
x: 0, y: 0, fg: Reset, bg: Reset, modifier: NONE,
]
}
right: Buffer {
area: Rect { x: 0, y: 0, width: 1, height: 1 },
content: [
"a",
],
styles: [
x: 0, y: 0, fg: Reset, bg: Reset, modifier: NONE,
]
}
I'd suggest yanking 0.1.13 and publishing the latest release as 0.2.0 instead.
The text was updated successfully, but these errors were encountered: