Emoji skin modifiers don't work #942

Tyriar · 2017-09-04T17:50:58Z

Pasting "👍🏻 👈🏿"

It looks like the skin modifier is being put into a new cell, when it should actually be added to the single emoji.

Tyriar · 2017-09-04T18:25:23Z

I got it working with the following code:

diff --git a/src/Parser.ts b/src/Parser.ts
index b047f91..0a4aeb9 100644
--- a/src/Parser.ts
+++ b/src/Parser.ts
@@ -187,7 +187,7 @@ export class Parser {
     let cs;
     let ch;
     let code;
-    let low;
+let skipChars = 0;
 
     const cursorStartX = this._terminal.buffer.x;
     const cursorStartY = this._terminal.buffer.y;
@@ -204,6 +204,10 @@ export class Parser {
     }
 
     for (; this._position < l; this._position++) {
+      if (skipChars > 0) {
+        skipChars--;
+        continue;
+      }
       ch = data[this._position];
 
       // FIXME: higher chars than 0xa0 are not allowed in escape sequences
@@ -212,7 +216,7 @@ export class Parser {
       if (0xD800 <= code && code <= 0xDBFF) {
         // we got a surrogate high
         // get surrogate low (next 2 bytes)
-        low = data.charCodeAt(this._position + 1);
+        let low = data.charCodeAt(this._position + 1);
         if (isNaN(low)) {
           // end of data stream, save surrogate high
           this._terminal.surrogate_high = ch;
@@ -220,10 +224,19 @@ export class Parser {
         }
         code = ((code - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
         ch += data.charAt(this._position + 1);
+        let i = 2;
+        skipChars = 1;
+        while (0xD800 <= data.charCodeAt(this._position + i) && data.charCodeAt(this._position + i) <= 0xDBFF) {
+          ch += data.charAt(this._position + i);
+          ch += data.charAt(this._position + i + 1);
+          i += 2;
+          skipChars++;
+        }
       }
       // surrogate low - already handled above
-      if (0xDC00 <= code && code <= 0xDFFF)
+      if (0xDC00 <= code && code <= 0xDFFF) {
         continue;
+      }
 
       switch (this._state) {
         case ParserState.NORMAL:

However, doing this screws up the prompt as 2 characters (according to the shell) become one, which means you can go left into the $PS1.

We should fix that before going forward with supporting modifiers and the zero-width joiner. Not sure how best to do this, perhaps characters need some more metadata indicating how many characters are joined into the cell, then left/right arrow repeats the action that number of times?

jerch · 2017-09-04T23:59:42Z

@Tyriar Imho this needs a more advanced character evaluation than xterm.js has atm - the current approach can only handle leading combining chars, while this one follows the char to be modified (http://unicode.org/reports/tr51/#Emoji_Modifiers_Table). The info about the correct rendering is in the sequence of chars (+ unicode spec), it is just a nightmare to cover all those edge cases (emoji modifiers were not part of the official spec until 2015).

jerch · 2017-09-05T14:48:00Z

Oh I was wrong with my last statement, the input does not support leading but only following combining characters atm (combining chars with zero width will end up in the last active cell). So the emoji problem should be fixable by adding the modifiers to the wcwidth combining char table COMBINING_HIGH.

jerch · 2017-09-05T15:18:41Z

Ok the problem gets quite nasty - the correct handling of an emoji + modifier in the browser depends on the system font renderer. There is no platform independent way to solve this, seems this feature is to new.

testing results:

Ubuntu 14 prints 2 block symbols with unicode number [][] (no visual glyphs)
Ubuntu 16 prints the glyphs separated (thumbup + colored field behind), it does not combine the characters
OSX 10.12 correctly does the trick

Thats the point where the wcwidth implementation has to reassemble the system wcwidth to get correct widths. Only way I can think of to solve this is to prerender the emoji+modifier offline and measure the text width to see if it actually combines. If it does not combine the Unicode spec suggests to print them both or only the first one. The latter seems reasonable in a terminal environment to preserve the row width.
Not sure, if emojis with modifiers are worth that trouble, what you think?

Tyriar · 2017-09-05T18:24:16Z

Well it would be good to support it, I'm not exactly to keen on jumping on it atm though.

The offscreen measuring thing looks like it would solve the problem. I'm not sure we would want to touch wcwidth though? Isn't that used to determine the width that the character consumes, not the number of characters in a single "cell"?

jerch · 2017-09-05T20:00:35Z

Well wcwidth is part of the deal: currently combining chars end up in the last cell. This is because wcwidth determined it as a char with zero width. With systems treating emoji modifiers differently it does not work that easy anymore:

OSX: modifier combines --> wcwidth should report width of zero
Ubuntu: modifier does not combine --> char has its own visual representation and needs its space, wcwidth should report whatever space it takes to draw it (alternatively drop that char at a higher level before rendering)

If you have a look at the Unicode spec above it gets even trickier with the ZERO WIDTH JOINER (http://unicode.org/reports/tr51/#Emoji_ZWJ_Sequences).

Maybe its time to get the hands dirty with splitting up the stream into graphemes (see https://github.com/devongovett/grapheme-breaker) and feed those to wcwidth. wcwidth then needs a way to dynamically determine the width of unkown graphemes, which could be some offscreen thingy.

NOTE - any implementation in that direction will have a big negative performance impact, since every char has to be inspected at least once (like now with wcwidth) and multiple times in case combining chars are following.

mandel59 · 2018-05-11T10:41:32Z

Relating #1059

mandel59 · 2018-05-11T11:12:40Z

To align to the behavior of remote wcwidth implementations rather than Unicode specification is one of solutions: for instance, mintty renders a keycap emoji in 1 cell and a rainbow flag emoji in 4 cells. Although it is ugly, it solves the probrem about cursor location.

https://github.com/mintty/mintty/wiki/Tips#installing-emoji-resources

jerch · 2018-05-11T15:58:10Z

@mandel59 Well yes and no. Problem behind is that xterm.js can run on a total different system than the backend due to the browser/network stack in between, and both systems involved can differ about the wcwidth and the final render output width (which is even worse because font capabilities dependent). Currently xterm.js's wcwidth is not frontend aware, it simply applies the same rules for all systems. But even being frontend system aware does not solve the problem:

Imagine the following for char XY:

backend: wcwidth 1
frontend: wcwidth 2

Which one to apply now? If we take the 1 of the backend the rendered char is still drawn as wide char due to the way the frontend wants to render it - the cursor ends up on top of or "in the middle" of the glyph. If we take the 2 of the frontend we violate the grid layout of the terminal and will run one cell short at the end of the line, because the pty on the backend side just saw the 1 there.

Imho there is no real solution for this problem. In a perfect world wcwidth is the same for backend and frontend and we dont face this problem at all. For systems where the backend runs on the same system as the frontend (like a local terminal) it can be fixed this way, yes. For remote systems, it is a matter of priority - go with the backend for correct cell alignment with possible over/under draws or go with the frontend for correct draw but broken cell alignment?

Note that most of the unicode chars did not change the wcwidth spec for years and do not create these hassles. The emojies were changed recently from a width of 1 to 2, therefore the problem.

jerch · 2019-11-15T00:52:39Z

Background on this: Those composed characters are part of the grapheme parsing rules in Unicode, same as country flags and others.

We will not see grapheme support anytime soon in xterm.js, as all cmdline programs get them wrong currently. Also note that they are specced to produce different outputs, technically the hand symbol + the box is correct. If the font renderer has a merged symbol for the codepoints it is free to use that prolly shorter symbol. That ambiguity is a nightmare for monospaced grids, but Unicode does not care much in this regard.

@Tyriar Should we close this issue as wont fix?

Tyriar added the type/bug Something is misbehaving label Sep 4, 2017

Tyriar mentioned this issue Sep 4, 2017

Render using canvas #938

Merged

22 tasks

Tyriar added the help wanted label Sep 4, 2017

Tyriar mentioned this issue Sep 11, 2017

Terminal crops wide character microsoft/vscode#34138

Closed

Tyriar mentioned this issue Sep 26, 2017

Emojis can get cut off in the terminal microsoft/vscode#35102

Closed

16patsle mentioned this issue Apr 25, 2018

Emojis using zero-width joiners shown as multiple emojis vercel/hyper#2625

Closed

2 tasks

jerch mentioned this issue May 14, 2018

Support options to treat ambiguous width characters as double width #1453

Closed

jerch mentioned this issue May 23, 2018

revamp wcwidth #1467

Closed

Tyriar added type/enhancement Features or improvements to existing features and removed type/bug Something is misbehaving labels Sep 17, 2018

Tyriar added the area/parser label Oct 7, 2019

Tyriar mentioned this issue Oct 27, 2019

Support ligatures in terminal microsoft/vscode#34103

Closed

Tyriar closed this as completed Nov 23, 2019

Tyriar removed the help wanted label Nov 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emoji skin modifiers don't work #942

Emoji skin modifiers don't work #942

Tyriar commented Sep 4, 2017 •

edited

Loading

Tyriar commented Sep 4, 2017 •

edited

Loading

jerch commented Sep 4, 2017 •

edited

Loading

jerch commented Sep 5, 2017 •

edited

Loading

jerch commented Sep 5, 2017 •

edited

Loading

Tyriar commented Sep 5, 2017

jerch commented Sep 5, 2017 •

edited

Loading

mandel59 commented May 11, 2018

mandel59 commented May 11, 2018

jerch commented May 11, 2018 •

edited

Loading

jerch commented Nov 15, 2019 •

edited

Loading

Emoji skin modifiers don't work #942

Emoji skin modifiers don't work #942

Comments

Tyriar commented Sep 4, 2017 • edited Loading

Tyriar commented Sep 4, 2017 • edited Loading

jerch commented Sep 4, 2017 • edited Loading

jerch commented Sep 5, 2017 • edited Loading

jerch commented Sep 5, 2017 • edited Loading

Tyriar commented Sep 5, 2017

jerch commented Sep 5, 2017 • edited Loading

mandel59 commented May 11, 2018

mandel59 commented May 11, 2018

jerch commented May 11, 2018 • edited Loading

jerch commented Nov 15, 2019 • edited Loading

Tyriar commented Sep 4, 2017 •

edited

Loading

Tyriar commented Sep 4, 2017 •

edited

Loading

jerch commented Sep 4, 2017 •

edited

Loading

jerch commented Sep 5, 2017 •

edited

Loading

jerch commented Sep 5, 2017 •

edited

Loading

jerch commented Sep 5, 2017 •

edited

Loading

jerch commented May 11, 2018 •

edited

Loading

jerch commented Nov 15, 2019 •

edited

Loading