Add fast path skipping UTF8 length counting #2819
This seems reasonable, though I should probably re-read it more carefully and maybe cook up more corner cases. I kind of suspect that it won't be as much of a win as the earlier grapheme cluster and UTF-8 caching patch, though? I guess UTF-16 to UTF-8 conversion does cost something, and this probably does help with the happy path, and we do a lot of these, hrm.
devinivy
left a comment
Good thinkin! Re: the factor of 3 in here, I am quite sure that checks out.
9337c2b to d7cc7d2
Alright, had to rebase to account for stylistic changes in #2817, but let's land this. Re: perf impact, in React Native calling into the encoder/decoder goes between JS and native, so it's not guaranteed to be super cheap, and it would just be nice to not worry about it for the common case.
What
Similar to #2817, I'm trying to avoid calling into `TextEncoder().encode(str).byteLength` for every string. After this change, I basically don't hit it in the app at all; the fast path always lets me out early.

The fast path itself is pretty general. The idea is that `.length` counts UTF-16 code units, and each UTF-16 code unit corresponds to at most 3 bytes in UTF-8 encoding. So we can safely use `value.length * 3` as an upper bound on what `utf8Len(value)` could possibly be. If this upper bound is below the `minLength`, the same is true for `utf8Len`. If this upper bound is within `maxLength`, the same is true for `utf8Len`.

Why `* 3`?
A code point in the Basic Multilingual Plane is a single UTF-16 code unit and takes at most 3 bytes in UTF-8. A code point outside it is a surrogate pair (2 code units) and takes 4 bytes in UTF-8, which is 2 bytes per code unit.

So `.length * 3` should always give us a valid upper bound. But this needs a look from an expert.

I've added some test cases.