Display bugs with emacs and neovim #4094
Comments
There is no bug here. These issues will happen when the program running kitty uses a wcswidth() implementation auto-generated from the latest
replace foo above with the string you want to test. |
@kovidgoyal Can you keep the issue open? I have no expertise on this, I just provided the reproduction guide, but the emacs people also thought it's not 'an emacs bug,' so it seems the bug is a bit complicated, and maybe someone who knows more about both programs will come along and contribute the root of the problem. (I am not saying this is a Kitty bug, but that it being open can attract the needed attention.) PS: Is the second bug also related to getting the wrong width? The corruption there is very extensive, so naively I thought it might be something else. Thanks. |
I'm afraid I don't keep bugs open for things I don't think are bugs, but you or anyone else is welcome to comment further on this bug and I will respond. As for the second issue, that is RTL text, which is totally broken on terminals and terminal applications in general, see #2109 |
@kovidgoyal The second bug is not related to the RTL text. Here is a reproduction using LTR text: See how there are whole lines skipped, and how nvim's modeline is all wrong. |
@kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug. BTW, I am noticing some other nondeterministic bug after playing with these files for a bit, where the whole terminal session will go faulty (any TUI command I run will behave weirdly), and doing |
On Mon, Oct 04, 2021 at 03:00:14AM -0700, batbone wrote:
@kovidgoyal Exactly! But emacs has this exact same bug, and the bug only appears on Kitty, and not Terminal.app, Alacritty, or iTerm. So there is an interaction between something in nvim/emacs and kitty that is causing this bug.
Yes, that file contains a variation selector U+FE0F. The bullets in your
list are U+25AB followed by U+FE0F. This combination has width two and is
correctly being rendered in kitty as width 2; you can see that by moving
your cursor over it, it becomes a fat square. So, as I said, there is no
bug in kitty. What emacs' problem is, only emacs developers can tell
you.
|
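To see which invisible code points a file such as weird.txt or bug.txt actually contains, one option is to list every format character and variation selector together with its position. The following is a minimal Python sketch, not a tool from kitty or emacs: `find_invisibles` is a hypothetical helper name, and flagging only general category Cf plus the range U+FE00..U+FE0F is an assumption about what is worth reporting.

```python
import unicodedata


def find_invisibles(text):
    """Return (index, 'U+XXXX', name) for format chars and variation selectors.

    Hypothetical helper: it flags general category Cf (format characters,
    which includes SOFT HYPHEN) and the variation selectors U+FE00..U+FE0F.
    """
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        is_variation_selector = 0xFE00 <= cp <= 0xFE0F
        is_format_char = unicodedata.category(ch) == "Cf"
        if is_variation_selector or is_format_char:
            hits.append((i, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# A made-up sample line, not the contents of the reporter's files.
sample = "\u25ab\ufe0f note\u00adtaking"
for hit in find_invisibles(sample):
    print(hit)
```

Running the same function over the text of a real file would show whether the problem characters are limited to the ones named in this thread.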
Sorry, I should have made two different bug reports, to avoid the confusion here. (If you want, I can open a separate bug report for the second bug now?) I think your comment pertains to the bug with The second bug ( Here is a screenshot of the correct vim: Here is a screenshot of the incorrect neovim: Oh. Trying to take the screenshot of emacs, emacs actually displays But it has the same bug with the old, RTL version All the incorrect behavior reported in this comment is exclusive to Kitty. Here is a screenshot of Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.) |
On Mon, Oct 04, 2021 at 03:30:59AM -0700, batbone wrote:
Do you think this is a problem with getting the width of characters wrong? (Considering it seems more of a height issue.)
I have no idea, I am not a vim or emacs developer. Once width
calculations go wrong, everything can go wrong, since the UI is
strings of text of ostensibly known width.
The suspicious thing is the variation selector, and I know lots of
terminals don't support variation selectors. So it's not surprising that
those two bugs cancel out. The terminal not supporting it and the editor
not supporting it. Indeed recently there was a whole conversation with
some nutcase arguing that because many terminals currently don't support
it, it should never be supported and I should drop support for it from
kitty.
And as I showed in my screenshot, I get no missing lines, and no UI
corruption in vim (version 8.2.3441) in kitty.
|
@kovidgoyal Can Kitty add an option to disable support for these variation selectors? It's great that you're modernizing terminals, and I agree that the support should be on by default, but having a compatibility mode would help us get through the transitionary period. It can also help us pinpoint the bug, as it might not be the variation selectors after all. |
On Mon, Oct 04, 2021 at 04:07:12AM -0700, batbone wrote:
@kovidgoyal Can Kitty add an option to disable support for these variation selectors? It's great that you're modernizing terminals, and I agree that the support should be on by default, but having a compatibility mode would help us get through the transitionary period.
It can also help us pinpoint the bug, as it might not be the variation selectors after all.
It's not worth the effort to me, sorry. As for pinpointing the bug just
strip those chars from the file and see if the bug goes away.
|
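The suggestion above (strip the suspect characters and see whether the corruption goes away) can be tried with a few lines of Python. This is a sketch under assumptions: the two code points are the ones named in this thread, and `strip_suspects` and `strip_file` are hypothetical helper names.

```python
# Code points named in this thread; mapping to None deletes them in translate().
SUSPECTS = {
    0xFE0F: None,  # VARIATION SELECTOR-16
    0x00AD: None,  # SOFT HYPHEN
}


def strip_suspects(text, table=SUSPECTS):
    """Return text with the suspect code points removed."""
    return text.translate(table)


def strip_file(src, dst):
    """Write a cleaned copy of src to dst (both paths chosen by the caller)."""
    with open(src, encoding="utf-8") as f:
        cleaned = strip_suspects(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(cleaned)
```

Opening the cleaned copy in the same editor and terminal then tells you whether those characters are the trigger, without changing anything else in the setup.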
Regarding the issue with character width: Emacs uses character width tables computed from the latest Unicode Standard version 14.0.0, using the data in the file EastAsianWidth.txt. In that text, the U+00AD SOFT HYPHEN character, which caused the problems in your file, has the East Asian Width property value of A, which stands for "Ambiguous". The definition of this value in the Unicode Standard Annex 11 (UAX#11) is as follows:
And since the file you show didn't have any East Asian legacy characters, treating SOFT HYPHEN as narrow is IMO correct. |
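The properties cited here can be checked from Python's standard `unicodedata` module. This is only an illustration of the lookup; the module reflects whichever Unicode version ships with the interpreter, which is not necessarily 14.0.0, and `describe` is a hypothetical helper name.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def describe(ch):
    """Collect the properties relevant to the width discussion."""
    return {
        "name": unicodedata.name(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),  # UAX #11 value
        "category": unicodedata.category(ch),  # general category
    }


info = describe(SOFT_HYPHEN)
print(info)
```

The East Asian Width value is the "A" (Ambiguous) mentioned above, while the general category is Cf (format), which is the property the zero-width argument rests on.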
On Mon, Oct 04, 2021 at 05:39:27AM -0700, Eli-Zaretskii wrote:
Regarding the issue with character width: Emacs uses character width tables computed from the latest Unicode Standard version 14.0.0, using the data in the file EastAsianWidth.txt. In that text, the U+00AD SOFT HYPHEN character, which caused the problems in your file, has the East Asian Width property value of A, which stands for "Ambiguous". The definition of this value in the Unicode Standard Annex 11 (UAX#11) is as follows:
East Asian Ambiguous (A): All characters that can be sometimes wide and sometimes narrow. Ambiguous characters require additional information not contained in the character code to further resolve their width. Ambiguous characters occur in East Asian legacy character sets as wide characters, but as narrow (i.e., normal-width) characters in non-East Asian usage.
And since the file you show didn't have any East Asian legacy characters, treating SOFT HYPHEN as narrow is IMO correct.
A soft hyphen is not rendered at all, unless at a line break
(optionally). So the correct width value for it is zero. Otherwise
you would need to take screen geometry into account when computing
widths, which is undesirable for many reasons. Not to mention that
editors can have margins that the terminal emulator knows nothing about.
So the editor and terminal emulator may not even agree about the line
break locations (think for instance of multiple panes in vim or emacs or even popup windows).
Therefore, the only correct value for soft hyphen width is zero.
And note that EastAsianWidth.txt alone is not sufficient for wcswidth().
It does not cover emoji, variation selectors, zero width joiners, etc.
|
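To illustrate the point that East Asian Width alone is not enough, here is a rough Python sketch of a width function that first treats format and combining characters as zero width and only then consults East Asian Width. It is not kitty's or emacs' implementation. `sketch_width` is a hypothetical name, and the category rule is a simplification: a few Cf characters are visible, control characters are not handled, and emoji and variation-selector widening are ignored.

```python
import unicodedata

# Assumption: treat format (Cf) and combining (Mn, Me) characters as zero width.
ZERO_WIDTH_CATEGORIES = {"Cf", "Mn", "Me"}


def sketch_width(text):
    """Approximate the number of terminal cells text occupies."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
            continue  # soft hyphen, ZWJ, variation selectors, accents, ...
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2  # wide and fullwidth characters
        else:
            total += 1  # narrow, neutral, and ambiguous treated as one cell
    return total


print(sketch_width("note\u00adtaking"))
```

A function that consulted only East Asian Width would count the soft hyphen as one cell, which is the disagreement being argued over in this thread.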
So this is the root cause of the problem in this case, AFAIU: Kitty assumes that the SOFT HYPHEN will not be output in the middle of a line, but Emacs does output it. It has nothing to do with the width tables.
Which is one reason why Emacs doesn't use |
On Mon, Oct 04, 2021 at 06:21:31AM -0700, Eli-Zaretskii wrote:
> A soft hyphen is not rendered at all, unless at a line break
So this is the root cause of the problem in this case, AFAIU: Kitty assumes that the SOFT HYPHEN will not be output in the middle of a line, but Emacs does output it. It has nothing to do with the width tables.
No, kitty does not care about the presence of soft hyphens. It counts
them as zero width and does not render them. Which is the correct
behavior. The problem comes from emacs incorrectly counting them as width 1.
|
I think I found a workaround for the SOFT HYPHEN issue (i.e.,

    (set-char-table-range glyphless-char-display
                          (char-from-name "SOFT HYPHEN") 'zero-width)

We can detect Kitty by checking for the env var

    (defun kitty-p ()
      (let ((kitty-window-id (getenv "KITTY_WINDOW_ID")))
        (and kitty-window-id
             (not (string= kitty-window-id "")))))

    (when (kitty-p)
      (set-char-table-range glyphless-char-display
                            (char-from-name "SOFT HYPHEN") 'zero-width))

So only the issue with the unicode variation selectors remains. |
OK, but the result is the same: Kitty assumes something that is not shared by the editor.
I respectfully disagree. Whether to display zero-width characters is up to the "higher-level protocols", and Emacs traditionally doesn't remove anything from the display. For example, ZWNJ, if it doesn't combine with surrounding text, is displayed as a single-pixel space on GUI displays, and as a regular space on text-mode displays, such as Kitty. So even if SOFT HYPHEN were a zero-width character (which it isn't, see the citation from the Unicode Annex), a terminal should not assume that an editor will share its ideas about text layout. Does Kitty have a setting that can change this behavior? If so, the OP could try using it. If there's no such setting, and the Kitty developers think Kitty behaves correctly I can only conclude that Emacs and Kitty currently cannot work together, and there's no reason to continue this discussion. |
If you don't mind not seeing that character on display and having trouble deleting it, then fine, you can use this solution for your customizations. |
Well, "ugly" is better than "messed-up", I'd say. Don't you agree?
Let's say the issue is different assumptions in Kitty and in Emacs. Specifically, Emacs by default relies on the text-mode terminal to perform the necessary character shaping required by sequences such as this one. |
On Mon, Oct 04, 2021 at 06:43:41AM -0700, Eli-Zaretskii wrote:
> No, kitty does not care about the presence of soft hyphens. It counts them as zero width and does not render them.
OK, but the result is the same: Kitty assumes something that is not shared by the editor.
Indeed, but kitty's assumption is correct, the editor's is not.
> Which is the correct behavior. The problem comes from emacs incorrectly counting them as width 1.
I respectfully disagree. Whether to display zero-width characters is up to the "higher-level protocols", and Emacs traditionally doesn't remove anything from the display. For example, ZWNJ, if it doesn't combine with surrounding text, is displayed as a single-pixel space on GUI displays, and as a regular space on text-mode displays, such as Kitty.
So even if SOFT HYPHEN were a zero-width character (which it isn't, see the citation from the Unicode Annex), a terminal should not assume that an editor will share its ideas about text layout.
Does Kitty have a setting that can change this behavior? If so, the OP could try using it. If there's no such setting, and the Kitty developers think Kitty behaves correctly I can only conclude that Emacs and Kitty currently cannot work together, and there's no reason to continue this discussion.
Thumbs up from me. Fix your incorrect assumptions about soft-hyphen.
Until then there is nothing to be done here.
|
No, emacs relies on text mode terminals not doing the necessary character shaping. |
You misunderstood. |
On Mon, Oct 04, 2021 at 07:53:54AM -0700, Eli-Zaretskii wrote:
> No, emacs relies on text mode terminals not doing the necessary character shaping.
You misunderstood.
If you say so.
|
Please, this is just a software issue. There is no need to get adversarial over it. There will always be broken assumptions around complex pieces of software interfacing with each other. Mr. Goyal is a bit infamous for their, ahem, inflammatory behavior, but they are just one person maintaining and developing a lot of important, complex pieces of FOSS software. And they respond promptly and to every issue, and even help users on sites like mobileread. If they try to give every issue the consideration that, say, emacs can afford to give its filed issues, I fear it might not be sustainable for them. Being opinionated is somewhat of a necessary evil when the needed manpower is not present. Anyhow, sorry @kovidgoyal if I sound patronizing or anything. Thank you for the work you have done and continue to do. We have found the cause of the first bug, but I think it's still not clear what is causing the unicode variation selector bug. What is different about Kitty and Terminal.app that is tripping up emacs? Why is this not breaking After the breaking point is identified, workarounds can be discussed. |
On Mon, Oct 04, 2021 at 08:27:54AM -0700, batbone wrote:
Please, this is just a software issue. There is no need to get adversarial over it. There will always be broken assumptions around complex pieces of software interfacing with each other.
Mr. Goyal is a bit infamous for their, ahem, inflammatory behavior, but they are just one person maintaining and developing a lot of important, complex pieces of FOSS software. And they respond promptly and to every issue, and even help users on sites like mobileread. If they try to give every issue the consideration that, say, emacs can afford to give its filed issues, I fear it might not be sustainable for them. Being opinionated is somewhat of a necessary evil when the needed manpower is not present.
Anyhow, sorry @kovidgoyal if I sound patronizing or anything. Thank you for the work you have done and continue to do.
No worries.
---
We have found the cause of the first bug, but I think it's still not clear what is causing the unicode variation selector bug. What is different about Kitty and Terminal.app that is tripping up emacs?
The character pair U+25AB and U+FE0F must be rendered in two cells, as
U+FE0F converts U+25AB from *text presentation* to *emoji presentation*. And
emoji in terminals are rendered at width two.
kitty does this, Terminal.app does not. emacs assumes it must be
rendered in one cell.
|
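The rule described in this reply (a base character followed by U+FE0F occupies two cells) can be written down as a small Python sketch. `cells_with_vs16` is a hypothetical name, the default base width of one cell per character is an assumption made to keep the example short, and the sketch widens any base character even though only some code points have a defined emoji variation sequence.

```python
VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation


def cells_with_vs16(text, base_width=lambda ch: 1):
    """Count cells, widening the preceding character to 2 when VS16 follows."""
    widths = []
    for ch in text:
        if ch == VS16:
            if widths:
                widths[-1] = 2  # the selector itself takes no cell
            continue
        widths.append(base_width(ch))
    return sum(widths)


print(cells_with_vs16("\u25ab"), cells_with_vs16("\u25ab\ufe0f"))
```

An application that ignores the selector would count one cell for the pair, so every column after the bullet ends up off by one relative to a terminal that follows this rule.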
Here's a potentially better workaround:
This will display SOFT HYPHEN as the ASCII dash character |
@Eli-Zaretskii Thanks, that was exactly what I was thinking would be best, but I did not know how to do it. Does emacs 27 do this? I see dashes with emacs 27 without running this code. I think Mr. Goyal is right about emacs being in the wrong on Compare with an emoji that emacs recognizes: |
Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints. |
Sorry, I don't understand: you see dashes for what text? |
On Mon, Oct 04, 2021 at 10:40:14AM -0700, Eli-Zaretskii wrote:
> Emojis indeed usually take a width of two, but emacs somehow thinks the bullet emoji is taking a width of one
Most terminals produce a single-column glyph for these sequences. If Emacs would go with Kitty, it would fail to work on all the other terminals, since there's no way for it to know which terminal does what with each composable sequence of codepoints.
At least AFAIK; if someone knows how to ask the terminal about its behavior in those cases, I'm all ears.
You home the cursor, print out your characters of interest, query the
terminal for its cursor position. That will tell you what width the
terminal thinks the string should be.
And if needed, I am happy to implement a dedicated escape code to query
character widths. My goal with kitty is to move this ecosystem forward.
Supporting Unicode as well as can be done in the paradigm of fixed size
cells we have, is an important part of that goal.
While it is true that many legacy terminals don't support variation
selectors correctly, that is not a reason to do the wrong thing forever.
The fact is that in Unicode, a variation selector can change the nature
of the preceding code point. All other text processing software supports
this, there is no reason terminals should not. Well, technically, VS15
(U+FE0E) in particular is problematic, because it can reduce the width of
the preceding codepoint, which can cause side effects if the preceding code
point is at a screen boundary, but this is an issue that needs to be
addressed separately by terminal developers to arrive at some standard
for how to behave in this case, and is irrelevant to VS16 (U+FE0F), which
is under discussion here.
|
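The procedure described earlier in this reply (home the cursor, print the string, ask the terminal where the cursor is) might look roughly like this in Python on a POSIX system. It is a sketch under assumptions: it uses the standard cursor position report request `ESC [ 6 n`, it must run on a real terminal, it overwrites whatever is at the top-left of the screen, and `parse_cpr` and `measure_width` are hypothetical names.

```python
import os
import re
import sys
import termios
import tty
from typing import Tuple

# A cursor position report has the form ESC [ <row> ; <col> R, both 1-based.
CPR_RE = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: bytes) -> Tuple[int, int]:
    """Extract (row, col) from a cursor position report."""
    match = CPR_RE.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def measure_width(text: str) -> int:
    """Ask the terminal how many cells text occupies (needs a real tty)."""
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # no echo, no line buffering, so the reply is readable
        sys.stdout.write("\x1b[H" + text + "\x1b[6n")  # home, print, query
        sys.stdout.flush()
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(fd, 1)
            if not chunk:  # EOF: give up instead of looping forever
                break
            reply += chunk
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    _row, col = parse_cpr(reply)
    return col - 1  # the cursor started in column 1
```

Calling `measure_width("\u25ab\ufe0f")` in different terminals is one way to observe the disagreement discussed in this thread directly, since the answer comes from the terminal rather than from a width table.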
Yes, we changed the behavior in Emacs 28, to avoid interfering with line-wrapping under |
That'd significantly slow down text-mode display, especially if it goes via the network (which is a large portion of use cases where Emacs is used on text terminals). Currently, we just `fwrite` the encoded text to the device, we don't write it one character at a time. So I'd rather we didn't do that.
Emacs wants total control on the text layout, leaving the terminal as dumb as possible. For example, if you want correct RTL display in Emacs, you need to disable bidirectional reordering by the terminal. So what you suggest is against the design of Emacs in so many ways I cannot even begin explaining how major a change that would be, even if someone will be willing to pay the price of slower redisplay.
We could perhaps allow per-terminal customization of the character-width data, if we want to support terminals that deviate from the Unicode East-Asian width (as in the case of SOFT HYPHEN). But someone will have to provide the data, although for FOSS that just means to look in the sources. And even then, if terminals start having their own ideas of text layout, width data will not be enough... |
On Tue, Oct 05, 2021 at 06:24:56AM -0700, Eli-Zaretskii wrote:
> You home the cursor, print out your characters of interest, query the terminal for its cursor position.
That'd significantly slow down text-mode display, especially if it goes via the network (which is a large portion of use cases where Emacs is used on text terminals). Currently, we just `fwrite` the encoded text to the device, we don't write it one character at a time. So I'd rather we didn't do that.
You do it once, at startup, along with all the other escape codes you use
for detection. So it adds nothing to startup time. Of course it could be
that emacs is doing no terminal feature detection at all, in which case,
I encourage you to start. Pick a set of characters you think will be
different over different terminals and get their widths.
Emacs wants total control on the text layout, leaving the terminal as dumb as possible. For example, if you want correct RTL display in Emacs, you need to disable bidirectional reordering by the terminal. So what you suggest is against the design of Emacs in so many ways I cannot even begin explaining how major a change that would be, even if someone will be willing to pay the price of slower redisplay.
What the width of a character should be is neither dumb nor smart, it
just is. You need to decide what width to use, one value is correct,
another is not. There certainly are some characters where it is not
obvious what the correct answer is, VS16 and the soft hyphen are not
in that set.
We could perhaps allow per-terminal customization of the character-width data, if we want to support terminals that deviate from the Unicode East-Asian width (as in the case of SOFT HYPHEN). But someone will have to provide the data, although for FOSS that just means to look in the sources. And even then, if terminals start having their own ideas of text layout, width data will not be enough...
Unicode East Asian Width does not determine the width of a soft hyphen.
Soft hyphens have nothing to do with east asian text. Again, you can
choose to use either a wrong width or a correct width or a queried
width. Two of those options are a lot better than the third.
It's fairly insane that we are even discussing the right way to display
a soft hyphen. The answer is obvious. You don't display it. It has zero
width. I've already explained why. In any case I have spent enough time
on this, it's your editor, you do what you like with it. I just hope you
make the sensible choice. Good luck.
|
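The "do it once, at startup" idea from this reply could be sketched as follows in Python: probe a handful of strings, compare the terminal's answer with the application's own assumption, and keep only the differences as a per-terminal override table. Everything here is hypothetical: `build_width_overrides` is an invented name, and `measure` stands for whatever mechanism the application uses to ask the terminal, which is passed in so the sketch stays independent of it.

```python
from typing import Callable, Dict, Iterable


def build_width_overrides(
    samples: Iterable[str],
    measure: Callable[[str], int],
    assumed: Callable[[str], int],
) -> Dict[str, int]:
    """Return {sample: measured width} for samples where the terminal disagrees."""
    overrides = {}
    for sample in samples:
        measured = measure(sample)
        if measured != assumed(sample):
            overrides[sample] = measured
    return overrides


# Stand-in for a terminal that behaves as kitty is described in this thread.
fake_terminal = {"\u00ad": 0, "\u25ab\ufe0f": 2, "a": 1}
overrides = build_width_overrides(
    samples=fake_terminal,
    measure=fake_terminal.__getitem__,
    assumed=lambda s: 1,  # an application that counts every sample as one cell
)
print(overrides)
```

Because the table is built once and then consulted locally, the per-redisplay cost is a dictionary lookup rather than a round trip to the terminal.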
Oh and just for completeness: here is a FAQ entry from the Unicode Consortium that elucidates how soft hyphens must be rendered: https://www.unicode.org/faq/unsup_char.html
All default-ignorable characters should be rendered as completely invisible (and non advancing, i.e. "zero width"), if not explicitly supported in rendering. These include: cursive joiners (U+200C ZWNJ, U+200D ZWJ); bidirectional format controls (e.g. U+200E LEFT-TO-RIGHT MARK); the soft hyphen (U+00AD SOFT HYPHEN); word joiners (U+2060 WORD JOINER, also U+FEFF ZWNBSP); the zero width space (U+200B ZERO WIDTH SPACE); invisible math operators (e.g., U+2061 FUNCTION APPLICATION); Jamo filler characters (e.g., U+115F HANGUL CHOSEONG FILLER); variation selectors.
More technically, all characters with the "Default Ignorable Code Point (DI)" property must be rendered as zero width, non-advancing. |
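Python's `unicodedata` module does not expose the Default_Ignorable_Code_Point property, so a check for it has to carry its own table. The sketch below hard-codes only a hand-copied subset of ranges covering the kinds of characters listed above; it is an assumption-laden illustration, not the full property, and `is_default_ignorable` is a hypothetical name.

```python
# Partial, hand-copied subset of Default_Ignorable_Code_Point ranges (inclusive).
DI_SUBSET = (
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x115F, 0x1160),  # Hangul Jamo fillers
    (0x200B, 0x200F),  # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x202A, 0x202E),  # bidirectional embedding and override controls
    (0x2060, 0x206F),  # WORD JOINER, invisible math operators, and others
    (0xFE00, 0xFE0F),  # variation selectors 1..16
    (0xFEFF, 0xFEFF),  # ZWNBSP / byte order mark
)


def is_default_ignorable(ch):
    """True if ch falls in the hard-coded subset above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DI_SUBSET)


print(is_default_ignorable("\u00ad"), is_default_ignorable("a"))
```

A complete implementation would generate the table from the Unicode data files instead of listing ranges by hand.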
Describe the bug
Some presumably uncommon characters cause display bugs in emacs and neovim, and this does not happen in most other terminal emulators I have tested.
First bug
Steps to reproduce the behavior:
1. `command kitty --config=/dev/null`
2. `curl https://files.lilf.ir/tmp/weird.txt > weird.txt`
3. `emacs -Q -nw weird.txt`
4. Trying to edit the text in the middle will immediately show you the corruption, but to be precise, go on the visible char `e` in `note-taking`, and press `C-x =` to report what char we are on. Instead of getting back `e`, we get `SPC`!
5. `C-x C-c`
6. `nvim weird.txt`
7. `e` and type `A`. The corruption is obvious:
At first, I thought this was an emacs bug, as vim, and previous versions of emacs did not exhibit this behavior. But after extensive discussion on the emacs bug tracker, we think this is probably a terminal emulator issue. I have tested this with Terminal.app, Alacritty, and iTerm, and only iTerm also exhibits this buggy behavior.
Second bug
1. `command kitty --config=/dev/null`
2. `curl https://files.lilf.ir/tmp/bug.txt > bug.txt`
3. `cat bug.txt` and note the output:
4. `emacs -Q -nw bug.txt`. The line `#+TITLE: sharif/contact info` is not displayed at all.
5. `nvim bug.txt`
This bug reproduces with emacs 27, emacs 28, and nvim, on Kitty, and not on iTerm, Alacritty, or Terminal.app.
`vim` still works correctly though:
Environment details
I have also reproduced the bugs on Ubuntu 20 via SSH, so it happens on both macOS and Linux, at least when Kitty runs from macOS.
I have attached the reproduction files here as well: