-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
str::starts_with('x')
(literal char) is slower than str::starts_with("x")
(literal string).
#41993
Comments
Single char cannot be possibly faster than a string, because it will have to encode the character to utf-8 (or decode the string bytes into utf-32, which is what seems to be done here), in general case.
I guess the ascii_only optimisation could be implemented, actually. The infrastructure is there, only need to adjust the impl of the matcher. |
@nagisa The performance of starts_with constant char and constant string should at least be the same. I'm pretty sure the LLVM optimizer is able to encode the character at compile time to reduce the whole The problem is not really about the unused function Yes there wouldn't be improvement for a runtime char, but it should be a pretty rare case. |
So, I’ve “optimised” the
The benchmark for 1024 iterations of such call is Here’s the interesting parts of the code in question: struct CharSearcherInner<'a> {
haystack: &'a str,
needle: [u8; 4],
needle_len: usize,
range: ::ops::Range<usize>
}
unsafe impl<'a> Searcher<'a> for CharSearcherInner<'a> {
fn next(&mut self) -> SearchStep {
unsafe {
let haystack = self.haystack.get_unchecked(self.range.clone()).as_bytes();
let needle = self.needle.get_unchecked(..self.needle_len);
if haystack.is_empty() {
SearchStep::Done
} else {
let start = self.range.start;
if haystack.starts_with(needle) {
self.range.start += self.needle_len;
SearchStep::Match(start, self.range.start)
} else {
let leading_ones = (!haystack[0]).leading_zeros();
if leading_ones == 0 {
self.range.start += 1;
} else {
self.range.start += leading_ones as usize;
}
SearchStep::Reject(start, self.range.start)
}
}
}
}
} Feels like it was complete waste of my time to look into this. |
@nagisa Thanks for the test! Could you share the benchmarking code? I'll take a look later. My benchmark looks like this https://gist.github.com/kennytm/2b6264fdf67651db78cb37095b037fae, with a 100% speed up (4.5 ns → 2.2 ns) rather than the bland 1.1 ns → 1.0 ns change. (For sure this really doesn't matter unless you do run $ rustc --test -Copt-level=3 -Ctarget-cpu=native 1.rs
$ ./1 --bench
running 5 tests
test bench_baseline ... bench: 65,459 ns/iter (+/- 118,871)
test bench_start_with_ascii_as_bytes ... bench: 224,761 ns/iter (+/- 39,764)
test bench_start_with_ascii_char ... bench: 450,245 ns/iter (+/- 66,865)
test bench_start_with_ascii_single_str ... bench: 452,953 ns/iter (+/- 61,812)
test bench_start_with_literal_char ... bench: 221,022 ns/iter (+/- 33,137)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
$ rustc -vV
rustc 1.19.0-nightly (386b0b9d3 2017-05-14)
binary: rustc
commit-hash: 386b0b9d39274701f30d31ee6ce31c363c6036ea
commit-date: 2017-05-14
host: x86_64-apple-darwin
release: 1.19.0-nightly
LLVM version: 4.0
(Note: OK this shows |
#[bench]
fn starts_with(b: &mut Bencher) {
let text = black_box("kdjsfhlakfhlsghlkvcnljknfqiunvcijqenwodind\0");
b.iter(|| {
for i in 0..1024 {
black_box(text.starts_with('k'));
}
})
}
#[bench]
fn contains(b: &mut Bencher) {
let text = black_box("kdjsfhlakfhlsghlkvcnljknfqiunvcijqenwodind\0");
b.iter(|| {
for i in 0..1024 {
black_box(text.contains('\0'));
}
})
} is quite literally what I used. Assembly was generated by compiling pub fn foo(text: &str) -> bool {
text.contains('k') // and starts_with, correspondingly
} |
master...nagisa:charpat is the diff against libcore |
So I tried multiple implementation strategies, and none of them really yielded any substantial benefit for In the table below, All of these do end up in I will submit a patch for the other improvements, but I don’t think there’s too much to be done for This is my benchmark results table:
|
Minor update: |
Triage: I tried to reproduce the microbenchmark, but:
Both of these are not insurmountable, but I don't have the time right now to do that work. For simplicity's sake, https://gist.github.com/kennytm/2b6264fdf67651db78cb37095b037fae is the benchmark referenced above. |
Assembly of literal char version still looks worse: https://godbolt.org/z/oZb-_I |
Yes, that assembly is worse -- it looks like that's mostly down to the pub fn checkchar(a: &str) -> bool {
let mut buf = [0; 4];
a.starts_with(&*'/'.encode_utf8(&mut buf[..]))
} I think the problem comes down to trying to re-encode the UTF-8 str input to a char isn't something that we can constant fold away, as it's from the unknown input, and LLVM isn't good enough to see that for the I wonder if we could get some wins by replacing the Pattern impl for char to not use the chars iterator but rather the proposal I gave above (re-encoding the char as UTF-8 and then checking if it's a prefix). I don't have time to run benchmarks right now but thought I'd leave it here at least. |
I think the actual issue is that nothing is telling LLVM that Overlong encodings are forbidden. With them there are 4 different ways to encode |
Since UTF8 has good slicing properties, we could actually compare the I will try something along these lines in the core library ASAP |
…-char, r=BurntSushi Improve code generated for `starts_with(<literal char>)` This PR includes two minor improvements to the code generated when checking for string prefix/suffix. The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation. The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method. The resulting code should be equivalent in the generic case (one char is being encoded versus one char being decoded), but it becomes easy to optimize in the case of a literal char, which in most cases a developer might expect to be at least as simple as that of a literal string. This PR should fix rust-lang#41993
…-char, r=BurntSushi Improve code generated for `starts_with(<literal char>)` This PR includes two minor improvements to the code generated when checking for string prefix/suffix. The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation. The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method. The resulting code should be equivalent in the generic case (one char is being encoded versus one char being decoded), but it becomes easy to optimize in the case of a literal char, which in most cases a developer might expect to be at least as simple as that of a literal string. This PR should fix rust-lang#41993
This enables constant folding when matching a literal char. Fixes rust-lang#41993.
(Discovered when reviewing #41957)
This piece of code:
generates an extremely complicated assembly. The generated function will try to decode the input string
a
, obtain the first code point, and then compare with'/'
(47).Compare with
which basically just checks if the first byte of the string is
'/'
(after an unnecessary (?)is_char_boundary
check introduced in #26771 and another unnecessary pointer-equality check)Most of the time the character as a pattern is a compile-time constant, so it should be more efficient by encoding the char into UTF-8, and then
memcmp
with the input string.(Rust uses the "decoding" approach so that
char
share the same implementation with&[char]
andFnMut(char) -> bool
via theCharEq
trait.)This problem is particularly bad for Clippy users, since there is a
single_char_pattern
lint which suggested changing all.starts_with("x")
by.starts_with('x')
with the assumption that the latter is faster, which as of Rust 1.19 it is the reverse.x86_64 ASM
Build with:
First function (
.starts_with('/')
):Second function (
.starts_with("/")
):The text was updated successfully, but these errors were encountered: