Skip to content

Optimize String#index(Char) and #rindex(Char) for invalid UTF-8#14461

Merged
straight-shoota merged 5 commits intocrystal-lang:masterfrom
HertzDevil:perf/string-rindex-char
Apr 17, 2024
Merged

Optimize String#index(Char) and #rindex(Char) for invalid UTF-8#14461
straight-shoota merged 5 commits intocrystal-lang:masterfrom
HertzDevil:perf/string-rindex-char

Conversation

@HertzDevil
Copy link
Contributor

If str.single_byte_optimizable? is true, then str contains only ASCII characters and invalid UTF-8 byte sequences, therefore we can immediately reject any non-ASCII codepoint.

HertzDevil and others added 2 commits April 9, 2024 15:40
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
@straight-shoota straight-shoota added this to the 1.13.0 milestone Apr 9, 2024
@straight-shoota straight-shoota merged commit 4dd5814 into crystal-lang:master Apr 17, 2024
@HertzDevil HertzDevil deleted the perf/string-rindex-char branch April 18, 2024 06:21
straight-shoota pushed a commit that referenced this pull request Aug 25, 2024
If the string consists only of ASCII characters and invalid UTF-8 byte sequences, all the latter should correspond to `Char::REPLACEMENT`, and so `#index` and `#rindex` should detect them, but this was broken since #14461.
straight-shoota pushed a commit that referenced this pull request Sep 17, 2024
If the string consists only of ASCII characters and invalid UTF-8 byte sequences, all the latter should correspond to `Char::REPLACEMENT`, and so `#index` and `#rindex` should detect them, but this was broken since #14461.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants