str::starts_with('x') (literal char) is slower than str::starts_with("x") (literal string). #41993

Closed
kennytm opened this issue May 14, 2017 · 13 comments · Fixed by #67249
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@kennytm
Member

kennytm commented May 14, 2017

(Discovered when reviewing #41957)

This piece of code:

pub fn check(a: &str) -> bool {
    a.starts_with('/')
}

generates extremely complicated assembly. The generated function tries to decode the input string a, obtain the first code point, and then compare it with '/' (47).

Compare with

pub fn check(a: &str) -> bool {
    a.starts_with("/")
}

which basically just checks whether the first byte of the string is '/' (after an unnecessary (?) is_char_boundary check introduced in #26771, and another unnecessary pointer-equality check).


Most of the time the char pattern is a compile-time constant, so it would be more efficient to encode the char into UTF-8 and then memcmp it against the start of the input string.
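A minimal sketch of the encode-then-compare idea, assuming a standalone helper (the name `starts_with_char` is hypothetical, not part of std): the pattern char is encoded into a 4-byte stack buffer with `char::encode_utf8`, and the test becomes a plain byte-prefix comparison, so nothing is decoded from the haystack.

```rust
/// Hypothetical helper: does `haystack` begin with `c`?
/// Encodes `c` once instead of decoding the haystack's first code point.
fn starts_with_char(haystack: &str, c: char) -> bool {
    // Any char fits in at most 4 UTF-8 bytes.
    let mut buf = [0u8; 4];
    let needle: &str = c.encode_utf8(&mut buf);
    // Plain byte-prefix comparison; when `c` is a constant, `needle` is
    // constant too, which gives the optimizer a chance to fold it.
    haystack.as_bytes().starts_with(needle.as_bytes())
}
```

For example, `starts_with_char("/usr/bin", '/')` returns `true`, and `starts_with_char("", '/')` returns `false` because an empty slice has no 1-byte prefix.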

(Rust uses the "decoding" approach so that char shares the same implementation with &[char] and FnMut(char) -> bool via the CharEq trait.)
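To illustrate why decoding makes that sharing easy, here is a sketch with a hypothetical `CharMatcher` trait standing in for `CharEq` (the real definition is not reproduced here): once the haystack's first code point has been decoded, every kind of matcher (a single char, a `&[char]` set, or a closure) only has to answer "does this char match?". The price is that the decode always runs on the runtime haystack, whatever the pattern is.

```rust
/// Hypothetical stand-in for a `CharEq`-style abstraction.
trait CharMatcher {
    fn matches(&mut self, c: char) -> bool;
}

impl CharMatcher for char {
    fn matches(&mut self, c: char) -> bool {
        *self == c
    }
}

impl CharMatcher for &[char] {
    fn matches(&mut self, c: char) -> bool {
        self.contains(&c)
    }
}

/// Hypothetical newtype so closures can be used as matchers.
struct Pred<F>(F);

impl<F: FnMut(char) -> bool> CharMatcher for Pred<F> {
    fn matches(&mut self, c: char) -> bool {
        (self.0)(c)
    }
}

/// Decoding strategy: decode the first code point of the haystack at
/// runtime, then ask the matcher. One generic function serves every
/// matcher, but the decode cannot be constant-folded away.
fn starts_with_decoded<M: CharMatcher>(haystack: &str, mut m: M) -> bool {
    match haystack.chars().next() {
        Some(first) => m.matches(first),
        None => false,
    }
}
```

Closures go through the `Pred` newtype only to keep the sketch's impls obviously non-overlapping.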

This problem is particularly bad for Clippy users, since the single_char_pattern lint suggests changing every .starts_with("x") to .starts_with('x') on the assumption that the latter is faster, whereas as of Rust 1.19 the reverse is true.


$ rustc -vV
rustc 1.19.0-nightly (826d8f385 2017-05-13)
binary: rustc
commit-hash: 826d8f3850b37a23481dfcf4a899b5dfc82d22e3
commit-date: 2017-05-13
host: x86_64-apple-darwin
release: 1.19.0-nightly
LLVM version: 4.0

x86_64 ASM

Build with:

$ rustc --crate-type=dylib -Copt-level=3 --emit=asm 2.rs
$ cat 2.s

First function (.starts_with('/')):

	.section	__TEXT,__text,regular,pure_instructions
	.globl	__ZN2_25check17hfe0a879e8d0ee56bE
	.p2align	4, 0x90
__ZN2_25check17hfe0a879e8d0ee56bE:
	.cfi_startproc
	pushq	%rbp
Lcfi0:
	.cfi_def_cfa_offset 16
Lcfi1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi2:
	.cfi_def_cfa_register %rbp
	testq	%rsi, %rsi
	je	LBB0_1
	movzbl	(%rdi), %edx
	testb	%dl, %dl
	jns	LBB0_17
	leaq	(%rdi,%rsi), %r8
	xorl	%eax, %eax
	cmpq	$1, %rsi
	movq	%r8, %rsi
	je	LBB0_5
	movzbl	1(%rdi), %eax
	addq	$2, %rdi
	andl	$63, %eax
	movq	%rdi, %rsi
LBB0_5:
	movl	%edx, %ecx
	andl	$31, %ecx
	cmpb	$-32, %dl
	jb	LBB0_6
	cmpq	%r8, %rsi
	je	LBB0_8
	movzbl	(%rsi), %edi
	incq	%rsi
	andl	$63, %edi
	jmp	LBB0_10
LBB0_1:
	xorl	%eax, %eax
	jmp	LBB0_18
LBB0_6:
	shll	$6, %ecx
	jmp	LBB0_16
LBB0_8:
	xorl	%edi, %edi
	movq	%r8, %rsi
LBB0_10:
	shll	$6, %eax
	orl	%edi, %eax
	cmpb	$-16, %dl
	jb	LBB0_11
	cmpq	%r8, %rsi
	je	LBB0_13
	movzbl	(%rsi), %edx
	andl	$63, %edx
	jmp	LBB0_15
LBB0_11:
	shll	$12, %ecx
	jmp	LBB0_16
LBB0_13:
	xorl	%edx, %edx
LBB0_15:
	andl	$7, %ecx
	shll	$18, %ecx
	shll	$6, %eax
	orl	%edx, %eax
LBB0_16:
	orl	%ecx, %eax
	movl	%eax, %edx
LBB0_17:
	cmpl	$47, %edx
	sete	%al
LBB0_18:
	popq	%rbp
	retq
	.cfi_endproc


.subsections_via_symbols

Second function (.starts_with("/")):

	.section	__TEXT,__text,regular,pure_instructions
	.globl	__ZN2_25check17hfe0a879e8d0ee56bE
	.p2align	4, 0x90
__ZN2_25check17hfe0a879e8d0ee56bE:
	.cfi_startproc
	pushq	%rbp
Lcfi0:
	.cfi_def_cfa_offset 16
Lcfi1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi2:
	.cfi_def_cfa_register %rbp
	testq	%rsi, %rsi
	je	LBB0_3
	cmpq	$1, %rsi
	je	LBB0_4
	cmpb	$-65, 1(%rdi)
	jle	LBB0_3
LBB0_4:
	movb	$1, %al
	leaq	_str.0(%rip), %rcx
	cmpq	%rcx, %rdi
	je	LBB0_6
	cmpb	$47, (%rdi)
	je	LBB0_6
LBB0_3:
	xorl	%eax, %eax
LBB0_6:
	popq	%rbp
	retq
	.cfi_endproc

	.section	__TEXT,__const
_str.0:
	.byte	47

.subsections_via_symbols
@nagisa
Member

nagisa commented May 15, 2017

A single char cannot possibly be faster than a string in the general case, because it has to encode the character to UTF-8 (or decode the string bytes into UTF-32, which is what seems to be done here).

It does not seem to me that this issue is actionable in any way. We could change our implementation strategy, but it would only end up redistributing the costs.

I guess the ascii_only optimisation could be implemented, actually. The infrastructure is there, only need to adjust the impl of the matcher.

@nagisa nagisa added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 15, 2017
@kennytm
Member Author

kennytm commented May 15, 2017

@nagisa The performance of starts_with with a constant char and with a constant string should at least be the same. I'm pretty sure the LLVM optimizer is able to encode the character at compile time and reduce the whole is_prefix_of to haystack.as_bytes()[0] == b'/' plus a length check.

The problem is not really the unused function only_ascii, but that the expensive operation (decoding) is done on the haystack, which is only known at runtime, rather than on the pattern (encoding), which can be evaluated at compile time.

Yes, there wouldn't be any improvement for a runtime char, but that should be a pretty rare case.
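For an ASCII pattern known at compile time, the reduction described above can be written out by hand. This is a sketch with a hypothetical helper name, valid only for ASCII chars. It agrees with `starts_with` because in UTF-8 every byte of a multi-byte sequence is at least 0x80, so a first byte equal to an ASCII value can only be that ASCII char.

```rust
/// Hypothetical reduced form of `haystack.starts_with(c)` for an ASCII `c`:
/// an emptiness check plus a single byte comparison, with no decoding.
fn starts_with_ascii(haystack: &str, c: char) -> bool {
    debug_assert!(c.is_ascii(), "sketch is only valid for ASCII patterns");
    // For ASCII, the UTF-8 encoding of `c` is exactly the byte `c as u8`.
    // `first()` is None for an empty haystack, which covers the length check.
    haystack.as_bytes().first() == Some(&(c as u8))
}
```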

@nagisa
Member

nagisa commented May 15, 2017

So, I’ve “optimised” the char Pattern to the point where doing text.starts_with('k') is as simple as:

_ZN5bench5swith17hb7875e35eb44fb11E:
	.cfi_startproc
	testq	%rsi, %rsi
	je	.LBB0_1
	cmpb	$107, (%rdi)
	sete	%al
	retq
.LBB0_1:
	xorl	%eax, %eax
	retq

The benchmark for 1024 iterations of such a call is 1000ns vs 1140ns originally. It also regresses patterns such as text.contains('k') by a factor of 4 (or I’m benchmarking it wrong): from 44µs to 175µs, despite not doing the UTF-8 decoding.

Here are the interesting parts of the code in question:

struct CharSearcherInner<'a> {
    haystack: &'a str,
    needle: [u8; 4],
    needle_len: usize,
    range: ::ops::Range<usize>
}

unsafe impl<'a> Searcher<'a> for CharSearcherInner<'a> {
    fn next(&mut self) -> SearchStep {
        unsafe {
            let haystack = self.haystack.get_unchecked(self.range.clone()).as_bytes();
            let needle = self.needle.get_unchecked(..self.needle_len);
            if haystack.is_empty() {
                SearchStep::Done
            } else {
                let start = self.range.start;
                if haystack.starts_with(needle) {
                    self.range.start += self.needle_len;
                    SearchStep::Match(start, self.range.start)
                } else {
                    let leading_ones = (!haystack[0]).leading_zeros();
                    if leading_ones == 0 {
                        self.range.start += 1;
                    } else {
                        self.range.start += leading_ones as usize;
                    }
                    SearchStep::Reject(start, self.range.start)
                }
            }
        }
    }
}

Feels like it was a complete waste of my time to look into this.
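The `leading_ones` step in the searcher above decides how far to skip on a mismatch: the number of leading one bits in a UTF-8 lead byte equals the length of its sequence, except that ASCII bytes have no leading ones and occupy one byte. Below is a standalone sketch of that skip and a forward scan built on it. It uses `u8::leading_ones` (available in newer Rust than this thread) in place of `(!b).leading_zeros()`; both function names are hypothetical, and the scan assumes it starts on a char boundary, which stays true because it only advances by whole sequences.

```rust
/// Length of the UTF-8 sequence whose lead byte is `lead`:
/// 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
fn utf8_seq_len(lead: u8) -> usize {
    match lead.leading_ones() {
        0 => 1,
        n => n as usize,
    }
}

/// Hypothetical scan for an encoded char: compare bytes at each char
/// boundary and, on a mismatch, skip the whole code point.
fn find_char_bytes(haystack: &str, c: char) -> Option<usize> {
    let mut buf = [0u8; 4];
    let needle: &str = c.encode_utf8(&mut buf);
    let needle = needle.as_bytes();
    let bytes = haystack.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i..].starts_with(needle) {
            return Some(i);
        }
        i += utf8_seq_len(bytes[i]);
    }
    None
}
```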

@kennytm
Member Author

kennytm commented May 15, 2017

@nagisa Thanks for the test! Could you share the benchmarking code? I'll take a look later.


My benchmark looks like this: https://gist.github.com/kennytm/2b6264fdf67651db78cb37095b037fae, with a 100% speedup (4.5 ns → 2.2 ns) rather than the bland 1.1 ns → 1.0 ns change. (Of course, this really doesn't matter unless you run .starts_with() millions of times.)

$ rustc --test -Copt-level=3 -Ctarget-cpu=native 1.rs

$ ./1 --bench

running 5 tests
test bench_baseline                    ... bench:      65,459 ns/iter (+/- 118,871)
test bench_start_with_ascii_as_bytes   ... bench:     224,761 ns/iter (+/- 39,764)
test bench_start_with_ascii_char       ... bench:     450,245 ns/iter (+/- 66,865)
test bench_start_with_ascii_single_str ... bench:     452,953 ns/iter (+/- 61,812)
test bench_start_with_literal_char     ... bench:     221,022 ns/iter (+/- 33,137)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured


$ rustc -vV
rustc 1.19.0-nightly (386b0b9d3 2017-05-14)
binary: rustc
commit-hash: 386b0b9d39274701f30d31ee6ce31c363c6036ea
commit-date: 2017-05-14
host: x86_64-apple-darwin
release: 1.19.0-nightly
LLVM version: 4.0

(Note: OK, this shows that str and char may actually have similar performance, although it also shows that both can be improved.)

@nagisa
Member

nagisa commented May 15, 2017

#[bench]
fn starts_with(b: &mut Bencher) {
    let text = black_box("kdjsfhlakfhlsghlkvcnljknfqiunvcijqenwodind\0");
    b.iter(|| {
        for i in 0..1024 {
            black_box(text.starts_with('k'));
        }
    })
}

#[bench]
fn contains(b: &mut Bencher) {
    let text = black_box("kdjsfhlakfhlsghlkvcnljknfqiunvcijqenwodind\0");
    b.iter(|| {
        for i in 0..1024 {
            black_box(text.contains('\0'));
        }
    })
}

is quite literally what I used.

Assembly was generated by compiling

pub fn foo(text: &str) -> bool {
    text.contains('k') // and starts_with, correspondingly
}

@nagisa
Member

nagisa commented May 15, 2017

master...nagisa:charpat is the diff against libcore

@nagisa
Member

nagisa commented May 16, 2017

So I tried multiple implementation strategies, and none of them really yielded any substantial benefit for starts_with overall.

In the table below, "Two Way" is a Searcher based on the same algorithm as the string searcher; "Smart Jumping" is a greatly improved variant of the idea above; and "This Patch" is a simple patch that does not change the searcher at all and only applies a few small tweaks around it, mostly adding inline hints.

All of these do end up optimising text.starts_with(constant) to the desired short assembly, even for the non-ASCII case. Despite all that, some of the benchmarks run around 2 times as slow compared to the original code.

I will submit a patch for the other improvements, but I don’t think there’s too much to be done for starts_with(char) case specifically.

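This is my benchmark results table. The first three numeric columns give each variant's time relative to the original code (below 100% is better); the last four give absolute times in ns/iter.
Columns: benchmark · Two Way % · Smart Jumping % · This Patch % · Original ns/iter · Two Way ns/iter · Smart Jumping ns/iter · This Patch ns/iter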
slice::sort_large_strings 96.16% 94.96% 94.88% 9698174 9325580 9209306 9201359
slice::sort_unstable_large_strings 96.04% 93.07% 91.77% 10236086 9830807 9526951 9393316
str::bench_contains_bad_naive 112.48% 113.03% 102.17% 737 829 833 753
str::bench_contains_equal 75.85% 97.21% 92.88% 323 245 314 300
str::bench_contains_short_long 110.50% 109.92% 102.00% 2399 2651 2637 2447
str::bench_contains_short_short 119.72% 121.13% 102.82% 71 85 86 73
str::bench_join 96.48% 96.09% 95.69% 511 493 491 489
str::char_indicesator 93.44% 104.92% 98.36% 61 57 64 60
str::char_indicesator_rev 72.31% 90.77% 107.69% 65 47 59 70
str::char_iterator 86.44% 103.39% 100.00% 59 51 61 59
str::char_iterator_ascii 72.55% 89.22% 88.24% 204 148 182 180
str::char_iterator_for 80.00% 81.54% 80.00% 65 52 53 52
str::char_iterator_rev 131.37% 119.61% 98.04% 51 67 61 50
str::char_iterator_rev_for 101.89% 98.11% 113.21% 53 54 52 60
str::chars_count::long_lorem_ipsum 69.95% 90.33% 93.88% 2090 1462 1888 1962
str::chars_count::short_ascii 85.94% 100.00% 101.56% 64 55 64 65
str::chars_count::short_mixed 85.92% 90.14% 100.00% 71 61 64 71
str::chars_count::short_pile_of_poo 86.15% 103.08% 100.00% 65 56 67 65
str::contains_bang_char::long_lorem_ipsum 25.00% 25.00% 24.98% 5565 1391 1391 1390
str::contains_bang_char::short_ascii 27.78% 27.78% 27.78% 144 40 40 40
str::contains_bang_char::short_mixed 33.33% 34.17% 34.17% 120 40 41 41
str::contains_bang_char::short_pile_of_poo 57.97% 60.87% 57.97% 69 40 42 40
str::contains_bang_str::long_lorem_ipsum 97.00% 98.04% 99.36% 4836 4691 4741 4805
str::contains_bang_str::short_ascii 103.54% 96.46% 98.99% 198 205 191 196
str::contains_bang_str::short_mixed 104.32% 98.15% 100.62% 162 169 159 163
str::contains_bang_str::short_pile_of_poo 112.09% 116.48% 109.89% 91 102 106 100
str::ends_with_ascii_char::long_lorem_ipsum 106.74% 106.73% 101.67% 5457 5825 5824 5548
str::ends_with_ascii_char::short_ascii 101.72% 101.76% 101.76% 5453 5547 5549 5549
str::ends_with_ascii_char::short_mixed 103.66% 108.82% 103.70% 5352 5548 5824 5550
str::ends_with_ascii_char::short_pile_of_poo 111.86% 101.72% 101.74% 5454 6101 5548 5549
str::ends_with_str::long_lorem_ipsum 88.00% 92.00% 84.02% 6935 6103 6380 5827
str::ends_with_str::short_ascii 88.81% 89.06% 81.79% 7599 6749 6768 6215
str::ends_with_str::short_mixed 40.98% 64.40% 62.75% 2062 845 1328 1294
str::ends_with_str::short_pile_of_poo 40.76% 61.04% 60.51% 2056 838 1255 1244
str::ends_with_unichar::long_lorem_ipsum 91.73% 101.93% 91.73% 5444 4994 5549 4994
str::ends_with_unichar::short_ascii 91.57% 101.72% 91.55% 5455 4995 5549 4994
str::ends_with_unichar::short_mixed 91.35% 101.48% 91.33% 5468 4995 5549 4994
str::ends_with_unichar::short_pile_of_poo 91.53% 101.72% 91.53% 5456 4994 5550 4994
str::find_underscore_char::long_lorem_ipsum 38.06% 31.47% 58.02% 5567 2119 1752 3230
str::find_underscore_char::short_ascii 67.81% 39.73% 63.01% 146 99 58 92
str::find_underscore_char::short_mixed 81.97% 48.36% 68.85% 122 100 59 84
str::find_underscore_char::short_pile_of_poo 372.22% 81.94% 91.67% 72 268 59 66
str::find_underscore_str::long_lorem_ipsum 109.79% 110.74% 99.68% 1899 2085 2103 1893
str::find_underscore_str::short_ascii 113.92% 108.86% 100.00% 79 90 86 79
str::find_underscore_str::short_mixed 117.50% 107.50% 100.00% 80 94 86 80
str::find_underscore_str::short_pile_of_poo 102.78% 103.97% 99.60% 252 259 262 251
str::find_zzz_char::long_lorem_ipsum 17.50% 31.54% 58.02% 5567 974 1756 3230
str::find_zzz_char::short_ascii 48.97% 40.69% 62.76% 145 71 59 91
str::find_zzz_char::short_mixed 51.64% 49.18% 68.03% 122 63 60 83
str::find_zzz_char::short_pile_of_poo 83.10% 281.69% 85.92% 71 59 200 61
str::find_zzz_str::long_lorem_ipsum 99.02% 100.11% 100.76% 921 912 922 928
str::find_zzz_str::short_ascii 115.38% 119.23% 100.00% 52 60 62 52
str::find_zzz_str::short_mixed 113.95% 106.98% 93.02% 43 49 46 40
str::find_zzz_str::short_pile_of_poo 145.00% 115.00% 100.00% 40 58 46 40
str::match_indices_a_str::long_lorem_ipsum 99.64% 100.08% 98.66% 5229 5210 5233 5159
str::match_indices_a_str::short_ascii 92.94% 102.60% 98.51% 269 250 276 265
str::match_indices_a_str::short_mixed 98.03% 108.37% 100.99% 203 199 220 205
str::match_indices_a_str::short_pile_of_poo 103.81% 117.14% 94.29% 105 109 123 99
str::rfind_underscore_char::long_lorem_ipsum 20.16% 86.09% 29.20% 7124 1436 6133 2080
str::rfind_underscore_char::short_ascii 43.59% 98.46% 31.79% 195 85 192 62
str::rfind_underscore_char::short_mixed 51.50% 116.77% 37.72% 167 86 195 63
str::rfind_underscore_char::short_pile_of_poo 256.57% 196.97% 45.45% 99 254 195 45
str::rfind_zzz_char::long_lorem_ipsum 13.35% 86.03% 29.18% 7129 952 6133 2080
str::rfind_zzz_char::short_ascii 28.35% 98.97% 31.96% 194 55 192 62
str::rfind_zzz_char::short_mixed 43.20% 114.79% 36.69% 169 73 194 62
str::rfind_zzz_char::short_pile_of_poo 541.41% 716.16% 45.45% 99 536 709 45
str::rsplitn_space_char::long_lorem_ipsum 216.55% 362.07% 101.38% 145 314 525 147
str::rsplitn_space_char::short_ascii 240.00% 342.40% 101.60% 125 300 428 127
str::rsplitn_space_char::short_mixed 266.13% 384.68% 102.42% 124 330 477 127
str::rsplitn_space_char::short_pile_of_poo 155.07% 315.94% 101.45% 69 107 218 70
str::split_a_str::long_lorem_ipsum 97.46% 100.19% 100.53% 5275 5141 5285 5303
str::split_a_str::short_ascii 91.79% 102.86% 90.00% 280 257 288 252
str::split_a_str::short_mixed 95.37% 103.70% 92.59% 216 206 224 200
str::split_a_str::short_pile_of_poo 103.48% 102.61% 93.04% 115 119 118 107
str::split_ad_str::long_lorem_ipsum 95.38% 99.18% 99.35% 2925 2790 2901 2906
str::split_ad_str::short_ascii 92.96% 100.94% 90.14% 213 198 215 192
str::split_ad_str::short_mixed 100.00% 102.44% 93.50% 123 123 126 115
str::split_ad_str::short_pile_of_poo 101.33% 105.33% 88.00% 75 76 79 66
str::split_ascii 174.80% 143.31% 98.43% 127 222 182 125
str::split_closure 95.97% 97.58% 98.39% 124 119 121 122
str::split_extern_fn 85.65% 95.22% 100.96% 209 179 199 211
str::split_slice 92.26% 101.61% 98.39% 310 286 315 305
str::split_space_char::long_lorem_ipsum 136.49% 155.89% 101.95% 6974 9519 10872 7110
str::split_space_char::short_ascii 198.83% 186.55% 98.83% 171 340 319 169
str::split_space_char::short_mixed 226.12% 173.88% 102.99% 134 303 233 138
str::split_space_char::short_pile_of_poo 140.24% 96.34% 112.20% 82 115 79 92
str::split_space_str::long_lorem_ipsum 94.18% 99.72% 100.74% 8901 8383 8876 8967
str::split_space_str::short_ascii 91.25% 100.94% 91.88% 320 292 323 294
str::split_space_str::short_mixed 92.04% 102.77% 90.31% 289 266 297 261
str::split_space_str::short_pile_of_poo 103.57% 100.00% 97.32% 112 116 112 109
str::split_terminator_space_char::long_lorem_ipsum 135.04% 153.38% 109.07% 7037 9503 10793 7675
str::split_terminator_space_char::short_ascii 196.51% 183.72% 98.26% 172 338 316 169
str::split_terminator_space_char::short_mixed 207.59% 156.55% 94.48% 145 301 227 137
str::split_terminator_space_char::short_pile_of_poo 136.90% 92.86% 108.33% 84 115 78 91
str::split_unicode_ascii 127.21% 89.71% 104.41% 136 173 122 142
str::splitn_space_char::long_lorem_ipsum 141.24% 134.02% 97.94% 194 274 260 190
str::splitn_space_char::short_ascii 188.41% 165.94% 94.93% 138 260 229 131
str::splitn_space_char::short_mixed 220.71% 169.29% 101.43% 140 309 237 142
str::splitn_space_char::short_pile_of_poo 144.05% 98.81% 105.95% 84 121 83 89
str::starts_with_ascii_char::long_lorem_ipsum 235.84% 235.88% 235.88% 2352 5547 5548 5548
str::starts_with_ascii_char::short_ascii 236.04% 236.09% 236.09% 2350 5547 5548 5548
str::starts_with_ascii_char::short_mixed 182.80% 174.11% 174.14% 3186 5824 5547 5548
str::starts_with_ascii_char::short_pile_of_poo 148.91% 148.94% 148.94% 3725 5547 5548 5548
str::starts_with_str::long_lorem_ipsum 77.90% 77.20% 74.37% 7834 6103 6048 5826
str::starts_with_str::short_ascii 78.52% 72.29% 75.51% 8595 6749 6213 6490
str::starts_with_str::short_mixed 80.30% 76.67% 76.67% 7600 6103 5827 5827
str::starts_with_str::short_pile_of_poo 80.04% 90.00% 80.04% 5550 4442 4995 4442
str::starts_with_unichar::long_lorem_ipsum 211.75% 235.33% 211.87% 2358 4993 5549 4996
str::starts_with_unichar::short_ascii 212.87% 236.53% 212.83% 2346 4994 5549 4993
str::starts_with_unichar::short_mixed 164.48% 181.80% 155.87% 3204 5270 5825 4994
str::starts_with_unichar::short_pile_of_poo 134.07% 148.94% 141.48% 3725 4994 5548 5270
str::trim_ascii_char::long_lorem_ipsum 95.30% 100.11% 114.31% 3635 3464 3639 4155
str::trim_ascii_char::short_ascii 90.38% 114.42% 112.50% 104 94 119 117
str::trim_ascii_char::short_mixed 79.10% 100.00% 98.51% 67 53 67 66
str::trim_ascii_char::short_pile_of_poo 116.67% 116.67% 116.67% 12 14 14 14
str::trim_left_ascii_char::long_lorem_ipsum 99.91% 100.00% 100.03% 3464 3461 3464 3465
str::trim_left_ascii_char::short_ascii 97.98% 101.01% 104.04% 99 97 100 103
str::trim_left_ascii_char::short_mixed 100.00% 100.00% 100.00% 7 7 7 7
str::trim_left_ascii_char::short_pile_of_poo 100.00% 100.00% 100.00% 8 8 8 8
str::trim_right_ascii_char::long_lorem_ipsum 133.01% 133.11% 133.06% 2084 2772 2774 2773
str::trim_right_ascii_char::short_ascii 114.49% 118.84% 117.39% 69 79 82 81
str::trim_right_ascii_char::short_mixed 122.92% 129.17% 125.00% 48 59 62 60
str::trim_right_ascii_char::short_pile_of_poo 100.00% 125.00% 125.00% 8 8 10 10
string::bench_exact_size_shrink_to_fit 92.98% 91.23% 91.23% 57 53 52 52
string::bench_from 93.55% 96.77% 91.94% 62 58 60 57
string::bench_from_str 100.00% 95.16% 91.94% 62 62 59 57
string::bench_push_char_one_byte 92.28% 84.85% 92.30% 36152 33360 30674 33368
string::bench_push_char_two_bytes 94.61% 94.14% 92.41% 152449 144227 143511 140883
string::bench_push_str 90.74% 94.44% 88.89% 54 49 51 48
string::bench_push_str_one_byte 79.88% 91.96% 96.21% 149779 119650 137733 144102
string::bench_to_string 95.08% 95.08% 93.44% 61 58 58 57
string::bench_with_capacity 97.14% 100.00% 97.14% 35 34 35 34
string::from_utf8_lossy_100_ascii 93.33% 106.67% 106.67% 15 14 16 16
string::from_utf8_lossy_100_invalid 87.56% 122.76% 103.01% 1560 1366 1915 1607
string::from_utf8_lossy_100_multibyte 97.10% 124.64% 111.59% 69 67 86 77
string::from_utf8_lossy_invalid 105.38% 109.87% 103.14% 223 235 245 230
Average of deltas 112.88% 120.48% 96.33%



@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@kennytm
Member Author

kennytm commented Jul 23, 2018

Minor update: .starts_with('x') is still slower than .starts_with("x"), even with LLVM 7 + RFC 2500. See #52646 (comment) for the microbenchmark.

@kennytm kennytm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 23, 2018
@steveklabnik
Member

Triage: I tried to reproduce the microbenchmark, but:

  1. got "multiple candidates for rand"
  2. the API of rand has changed and I don't know how to update it

Neither of these is insurmountable, but I don't have time to do that work right now. For simplicity's sake, https://gist.github.com/kennytm/2b6264fdf67651db78cb37095b037fae is the benchmark referenced above.

@mati865
Contributor

mati865 commented Oct 7, 2019

Assembly of literal char version still looks worse: https://godbolt.org/z/oZb-_I

@Mark-Simulacrum
Member

Yes, that assembly is worse. It looks like that's mostly down to the chars() iterator of str rather than the direct prefix check; the checkchar function below looks pretty good (and comparable with the string version).

pub fn checkchar(a: &str) -> bool {
    let mut buf = [0; 4];
    a.starts_with(&*'/'.encode_utf8(&mut buf[..]))
}

I think the problem comes down to the fact that decoding the UTF-8 str input into a char isn't something we can constant-fold away, since it comes from unknown input. LLVM also isn't smart enough to see that for the '/' char only the first byte of the str matters. That is altogether unsurprising, to be honest: we are essentially comparing a u32 with a u32, and since LLVM doesn't know the encoding, it has no knowledge that the first byte is all that matters.

I wonder if we could get some wins by changing the Pattern impl for char to not use the chars iterator, and instead use the approach above (encoding the char as UTF-8 and then checking whether it's a prefix).

I don't have time to run benchmarks right now but thought I'd leave it here at least.

@spunit262
Contributor

since we are comparing essentially u32 with u32, and LLVM has no knowledge that the first byte is all that matters since it doesn't know the encoding here

I think the actual issue is that nothing tells LLVM that overlong encodings are forbidden. If they were allowed, there would be 4 different ways to encode '/'. Adding a few unreachable_unchecked calls to Chars::next might help.
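The overlong-encoding point can be checked directly: a lax decoder could also read '/' (U+002F) from 2-, 3-, or 4-byte sequences, but UTF-8 requires the shortest form and `std::str::from_utf8` rejects the longer ones. So a valid `&str` always stores '/' as the single byte 0x2F. This sketch only demonstrates that invariant; it does not show how to convey it to LLVM.

```rust
/// Byte sequences that a lax decoder would read as '/' (U+002F).
/// Only the first (shortest) form is valid UTF-8.
const SLASH_FORMS: [&[u8]; 4] = [
    &[0x2F],                   // valid 1-byte form
    &[0xC0, 0xAF],             // overlong 2-byte form
    &[0xE0, 0x80, 0xAF],       // overlong 3-byte form
    &[0xF0, 0x80, 0x80, 0xAF], // overlong 4-byte form
];

/// Which of the candidate encodings `std::str::from_utf8` accepts.
fn accepted_forms() -> Vec<bool> {
    SLASH_FORMS
        .iter()
        .map(|bytes| std::str::from_utf8(bytes).is_ok())
        .collect()
}
```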

@ranma42
Contributor

ranma42 commented Dec 11, 2019

Since UTF-8 has good slicing properties, we could actually compare the .as_bytes() slices whenever we test for string vs. substring equality (and skip the char boundary check).
This seems to improve the assembly for str.starts_with(str).
Combined with this, encode_utf8 seems to generate the desired assembly:
https://godbolt.org/z/myTjHj

I will try something along these lines in the core library ASAP
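A sketch of the byte-slice comparison for the str/str case, with hypothetical helper names. Because both arguments are valid UTF-8, a byte-level match of a complete `needle` always ends (for a prefix) or starts (for a suffix) on a char boundary of the haystack, so a separate `is_char_boundary` check adds nothing.

```rust
/// Prefix test on raw bytes. The match ends where `needle`'s last complete
/// code point ends, which is also a char boundary of `haystack`.
fn str_starts_with(haystack: &str, needle: &str) -> bool {
    haystack.as_bytes().starts_with(needle.as_bytes())
}

/// Suffix test on raw bytes. A non-empty valid `needle` begins with a lead
/// byte, never a continuation byte, so the match starts on a boundary.
fn str_ends_with(haystack: &str, needle: &str) -> bool {
    haystack.as_bytes().ends_with(needle.as_bytes())
}
```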

Centril added a commit to Centril/rust that referenced this issue Dec 16, 2019
…-char, r=BurntSushi

Improve code generated for `starts_with(<literal char>)`

This PR includes two minor improvements to the code generated when checking for string prefix/suffix.

The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation.

The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method.

The resulting code should be equivalent in the generic case (one char is being encoded versus one char being decoded), but it becomes easy to optimize in the case of a literal char, which in most cases a developer might expect to be at least as simple as that of a literal string.

This PR should fix rust-lang#41993
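The structure the PR description outlines (str/str compared as raw UTF-8 bytes, and str/char implemented by encoding the char and reusing the str path) can be sketched with a hypothetical `Affix` trait standing in for the relevant part of `Pattern`. This illustrates the approach only; it is not the code that was merged.

```rust
/// Hypothetical stand-in for the prefix/suffix part of `Pattern`.
trait Affix {
    fn is_prefix_of(self, haystack: &str) -> bool;
    fn is_suffix_of(self, haystack: &str) -> bool;
}

impl Affix for &str {
    // str/str: compare raw UTF-8 bytes, no char boundary check.
    fn is_prefix_of(self, haystack: &str) -> bool {
        haystack.as_bytes().starts_with(self.as_bytes())
    }
    fn is_suffix_of(self, haystack: &str) -> bool {
        haystack.as_bytes().ends_with(self.as_bytes())
    }
}

impl Affix for char {
    // str/char: encode the char into a stack buffer, then reuse the str
    // impl. For a literal char the encoded bytes are a constant.
    fn is_prefix_of(self, haystack: &str) -> bool {
        let mut buf = [0u8; 4];
        let s: &str = self.encode_utf8(&mut buf);
        s.is_prefix_of(haystack)
    }
    fn is_suffix_of(self, haystack: &str) -> bool {
        let mut buf = [0u8; 4];
        let s: &str = self.encode_utf8(&mut buf);
        s.is_suffix_of(haystack)
    }
}
```

In the generic case this trades one decode of the haystack for one encode of the pattern, matching the PR's note that the cost should be equivalent there.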
@bors bors closed this as completed in 1f6d023 Dec 16, 2019
vivekvpandya pushed a commit to vivekvpandya/rust that referenced this issue Dec 18, 2019
This enables constant folding when matching a literal char.

Fixes rust-lang#41993.