Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combining characters not supported #6

Closed
jepler opened this issue May 9, 2017 · 4 comments
Closed

Combining characters not supported #6

jepler opened this issue May 9, 2017 · 4 comments

Comments

@jepler
Copy link

jepler commented May 9, 2017

For instance, the unicode sequence "x\u0300" renders in a standard Unicode terminal using a single character cell, but string-width counts it as 2. (tested in xfce4-terminal and rxvt-unicode on Debian Jessie)

This could be represented as a (non-passing) test:

t.is(m('x\u0300'), 1);
@sindresorhus
Copy link
Owner

sindresorhus commented May 9, 2017

Can you open this on https://github.com/sindresorhus/is-fullwidth-code-point instead? It's the module actually doing the guessing. It uses a list of codepoints from the Unicode spec. So I'm surprised it doesn't work.

@jepler
Copy link
Author

jepler commented May 9, 2017

With respect, I don't think the bug is in is-fullwidth-code-point. "x" is not a full-width code point, and "\u0300" is not a full-width code point, so is-fullwidth-code-point is working as designed when it returns false for those inputs.

Rather, the algorithm in this module is incomplete because it does not have correct (any) treatment for Unicode characters with the category "Mn" (or Mc or Me).

You might want to study https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c for inspiration.

As the author of both the github repos / npm modules in question, I'll leave the decision to you about how to handle it, rather than opening a new issue right now. Thanks for your time.

@sindresorhus sindresorhus reopened this May 9, 2017
sindresorhus added a commit that referenced this issue May 9, 2017
@sindresorhus
Copy link
Owner

Sorry, you're right. I was a bit quick with this one... I've added a failing test.

@jepler
Copy link
Author

jepler commented Jul 18, 2017

@gucong3000 thanks for your work to improve this. @sindresorhus thanks for accepting #12.

While the range you name in your patch is a range of combining characters, I think it does not cover all combining characters in Unicode. For example, consider the sequence of code points “◌᷀” (U+25cc, dotted circle + U+1dc0, combinning dotted grave accent)

This implementation of wcwidth for Unicode has a list of sorted, non-overlapping ranges of combining characters (search for interval combining), as well as a Unix commandline to build such a table from uniset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants