Exploring upstreaming "fast-string-width" #57
I would be happy to improve performance here, but keep in mind that correctness is just as important. I also want to make it clear though that …

What Node.js version did you use to measure the performance improvements?
That's basically what I've said to Jarred: it's probably going to be fast enough in most cases, and for truncation there are better algorithms anyway. However I didn't know that …

I can reproduce the slowness of `string-width` with the following:

```js
import stringWidth from 'string-width';
import fastStringWidth from 'fast-string-width';

const input = 'a'.repeat ( 25_000 );

console.time ( 'string-width' );
stringWidth ( input );
console.timeEnd ( 'string-width' );

console.time ( 'fast-string-width' );
fastStringWidth ( input );
console.timeEnd ( 'fast-string-width' );
```

The output confirms `string-width` is dramatically slower.
Using a 25k string seems kind of an edge case, but it's to illustrate that there is in fact a problem in V8's `Intl.Segmenter`.
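For context on what `Intl.Segmenter` is being used for here, a minimal sketch of grapheme-cluster counting (illustrative only, not the actual `string-width` internals); it's this per-grapheme iteration whose cost in V8 reportedly grows badly with input length:

```js
// Illustrative only: count grapheme clusters with Intl.Segmenter.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

const countGraphemes = (input) => {
  let count = 0;
  for (const _ of segmenter.segment(input)) count += 1;
  return count;
};

console.log(countGraphemes('a'.repeat(25_000))); // 25000
```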
There was an issue already. And as of 3 hours ago it looks like the problem that makes `Intl.Segmenter` slow in V8 has been fixed.
Sure, do you know of a case where the approach I'm using breaks because I'm not using `Intl.Segmenter`?
Sure 👍
I'm not sure I'm following. In the example in the readme I'm not even using the returned number for anything, so where are you seeing the flaw?

The idea is that, for example, if you look at this picture, CJK characters don't take 2 spaces each in CodeMirror but roughly 1.666 spaces each, so if word wrapping were enabled in CM, the editor could potentially use this function with custom width values to understand whether a line has to be wrapped or not. If these numbers are not customizable, this function wouldn't be able to tell the editor a number that makes sense for it. Basically, if one is in a situation where emojis/CJK/whatever are rendered at something other than 2 spaces, then that number must be customizable, or the function can't give the editor a useful answer.
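To make the use case concrete, here is a rough sketch of how an editor could plug its own per-category widths into a width function. The option names below are hypothetical, made up for illustration rather than taken from `fast-string-width`'s actual API:

```js
// Hypothetical API: pass per-category widths so the result matches how the
// editor actually renders glyphs (e.g. CJK at ~1.666 columns in CodeMirror).
import fastStringWidth from 'fast-string-width';

// Hypothetical option names, for illustration only.
const editorWidths = { fullWidthWidth: 1.666, emojiWidth: 2, regularWidth: 1 };

const shouldWrap = (line, wrapColumn) =>
  fastStringWidth(line, editorWidths) > wrapColumn;

console.log(shouldWrap('你好世界', 5)); // 4 × 1.666 ≈ 6.66 > 5 → true
```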
👍 By the way, it's unclear what that difference will look like in real applications though.

To summarize, as I understand it: you prefer the way the current implementation works. I'm ok with whatever path you think is best; I'm just trying to understand if there's something you'd want me to do here. It seems like there may not actually be any particularly interesting change that you'd like me to make in `string-width`.
The Node.js team sometimes backports bug fixes.
No. I'm using …
Correct
Yeah. It's a good exploration, but it seems like the most valuable improvements will come from Node.js itself.
Closing as there seems to be nothing left to do.
By the way I'd like to point out how, for the string in this issue, the …
@fabiospampinato Now that …
@joyeecheung I believe that improving the performance of …
Hey 👋 Following Bun's announcement of a supposedly 100_000x faster built-in `stringWidth` function, I tried to rewrite this function in JS with performance in mind, and I think I got some good results.
I would be interested in trying to upstream the optimizations, so that they can actually reach people. Would this be something that you could be interested in as well?
If so, I think there are currently the following differences between the implementations that would need to be handled one way or the other:
- `Intl.Segmenter` is borderline buggy in V8, in the sense that it's way slower than the JSC equivalent, and it gets massively slower the longer the input string is, so I'm not using it. Instead I'm stripping out all the combining marks with a regex, assuming that's what `\p{M}` actually matches; this stuff seems poorly documented. There may be other subtle differences here, I don't understand exactly what `Intl.Segmenter` does. Also `Intl.Segmenter` seems unavailable in Firefox at the moment, and by default I prefer if my code can run ~everywhere, so that's another reason for temporarily dropping `Intl.Segmenter`, I guess.
- `ansi-regex`: I'm matching ANSI escapes with a simpler regex instead (a minimal sketch of this kind of regex follows the list); the difference in performance wasn't huge anyway.
- `emoji-regex`: the regex I'm using instead is potentially slightly problematic, because it can consider emojis joined by a ZWJ as a single emoji even if they are not supposed to collapse into a single emoji like some family emojis are.
- `get-east-asian-width`: I'm not depending on it, since the internals I'd need don't seem to be exported from a standalone package. I could have potentially called `getEastAsianWidth` directly, but mainly it just doesn't expose the options that I want.

Basically my reimplementation is not immediately mergeable, but maybe it could be merged after some changes?
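As a minimal illustration of what a "simpler regex" for ANSI escapes could look like, here is an assumed example; it is not necessarily the exact regex `fast-string-width` uses, and it only covers common CSI sequences rather than everything `ansi-regex` matches:

```js
// Hypothetical simplified matcher: CSI escape sequences like "\x1b[31m" or "\x1b[2K".
// ansi-regex covers many more escape forms; this narrower pattern is cheaper to run.
const SIMPLE_ANSI_RE = /\x1b\[[0-9;]*[A-Za-z]/g;

const stripAnsi = (input) => input.replace(SIMPLE_ANSI_RE, '');

console.log(stripAnsi('\x1b[31mred\x1b[0m')); // "red"
```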
Either way pretty much the entire speedup comes from not using `Intl.Segmenter` and from trying to do as little as possible for common latin/ansi characters, by just matching them with some simple regexes, so maybe it's more practical to just port those two changes.
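A rough sketch of those two ideas, assuming this is merely illustrative and not the actual `fast-string-width` code: strip combining marks with a `\p{M}` regex instead of segmenting graphemes, and count runs of plain printable ASCII in one step, leaving everything else to a slower path (here crudely treated as width 2):

```js
// Illustrative sketch only — not the real implementation.
const COMBINING_RE = /\p{M}+/gu;        // combining marks contribute no width
const ASCII_RUN_RE = /[\x20-\x7E]+/y;   // printable ASCII, width 1 per char

const sketchStringWidth = (input) => {
  const stripped = input.replace(COMBINING_RE, '');
  let width = 0;
  let i = 0;
  while (i < stripped.length) {
    ASCII_RUN_RE.lastIndex = i;
    const run = ASCII_RUN_RE.exec(stripped);
    if (run) {                          // fast path: count a whole ASCII run at once
      width += run[0].length;
      i += run[0].length;
      continue;
    }
    // Slow path placeholder: real code would check ANSI escapes, emoji, and
    // east-asian width here; width 2 is assumed purely for illustration.
    const char = String.fromCodePoint(stripped.codePointAt(i));
    width += 2;
    i += char.length;
  }
  return width;
};

console.log(sketchStringWidth('e\u0301'));   // 1 — combining acute stripped
console.log(sketchStringWidth('abc你好'));   // 7 — 3 + 2 + 2
```

In a full implementation the slow path would of course consult east-asian-width data and the emoji handling discussed in the list above, rather than assuming width 2.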