fastCodeAt that's actually fast #9458
Comments
Would this also lead to |
I don't think I would deprecate it, just update the documentation.
I agree we need |
I think for real "fast", we would need some kind of optimized StringIterator that gives direct access to the string data. That would work well with UTF-8, for instance.
Wait, we have StringIterator already :) but is it a good enough replacement for fastCodeAt?
Oh, StringIterator is not Unicode-compatible? :'(
StringIteratorUnicode can be optimized for UTF-8 targets by carrying some state, similar to the cursors eval strings have. That is a separate problem though, and it needs a fast character access function as a basis.
Actually it's a separate problem entirely because it's more of a character-offset to byte-offset mapping. Such an iterator would likely be based on |
The specification for `StringTools.fastCodeAt` is pretty silly:
These two statements contradict each other: If the result is unspecified then we cannot make any guarantees about what you can do with the returned value.
As a consequence, some targets have to branch here, e.g. to avoid throwing exceptions. For instance, Java does this:
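(What follows is not the Java code itself, just an illustration in Haxe.) A rough sketch of where the two checks end up in a typical consumer loop, using only `StringTools.fastCodeAt` and `StringTools.isEof`. The `DoubleBranch` class and `countCodes` are made-up names, and whether the first check exists at all depends on the target:

```haxe
class DoubleBranch {
	// Counts code units by reading until the end-of-file indicator shows up.
	public static function countCodes(s:String):Int {
		var i = 0;
		while (true) {
			// Check 1 (hidden): on targets that must not throw, fastCodeAt
			// compares the index against the length before reading.
			var c = StringTools.fastCodeAt(s, i);
			// Check 2 (visible): the caller tests the returned value again.
			if (StringTools.isEof(c)) break;
			i++;
		}
		return i;
	}

	static function main() {
		trace(countCodes("haxe"));
	}
}
```

Both checks answer the same question, namely whether `i` has reached `s.length`.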
This leads to an unnecessary double-branching on implementations that then use `isEof`.
At this point we obviously cannot break `fastCodeAt`, but I would like to propose the introduction of an `unsafeCodeAt` which really is just the fastest implementation possible, with no out-of-bounds guarantees whatsoever. Consumers can check the bounds themselves by comparing whatever indices they use against the string length.
This could then also be utilized by `StringIterator`, because `Iterator.next` also says this:
And no, I don't want to "use Bytes instead" because I don't want to deal with unicode myself.
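To make the proposal concrete, here is a minimal sketch of how a consumer and an iterator might use such a function. Everything in it is an assumption for illustration: `UnsafeSketch.unsafeCodeAt` is a local stand-in for the proposed function (its body only exists so the sketch compiles; a real version would be target-specific raw access), and `UnsafeCodeIterator` is a hypothetical class, not the standard library's `StringIterator`.

```haxe
// Hypothetical iterator: the bounds check lives in hasNext() only.
class UnsafeCodeIterator {
	var s:String;
	var offset:Int = 0;

	public inline function new(s:String) {
		this.s = s;
	}

	public inline function hasNext():Bool {
		return offset < s.length;
	}

	// No check here; calling next() past the end is the caller's problem.
	public inline function next():Int {
		return UnsafeSketch.unsafeCodeAt(s, offset++);
	}
}

class UnsafeSketch {
	// Stand-in for the proposed function. Placeholder body, assumed for this sketch.
	public static inline function unsafeCodeAt(s:String, index:Int):Int {
		return cast s.charCodeAt(index);
	}

	// Manual loop: the loop condition is the only bounds check.
	public static function sumCodes(s:String):Int {
		var sum = 0;
		var i = 0;
		var len = s.length;
		while (i < len) {
			sum += unsafeCodeAt(s, i);
			i++;
		}
		return sum;
	}

	// Same result through the iterator; the for loop calls hasNext() before next().
	public static function sumCodesIter(s:String):Int {
		var sum = 0;
		for (c in new UnsafeCodeIterator(s)) sum += c;
		return sum;
	}

	static function main() {
		trace(sumCodes("ab"));
		trace(sumCodesIter("ab"));
	}
}
```

In both variants there is exactly one comparison against the length per character, and it sits in code the consumer controls.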