`str::starts_with('x')` (literal char) is slower than `str::starts_with("x")` (literal string). #41993

kennytm · 2017-05-14T17:42:06Z

(Discovered when reviewing #41957)

This piece of code:

pub fn check(a: &str) -> bool {
    a.starts_with('/')
}

generates extremely complicated assembly. The generated function decodes the start of the input string a to obtain the first code point, and then compares it with '/' (47).

Compare with

pub fn check(a: &str) -> bool {
    a.starts_with("/")
}

which basically just checks if the first byte of the string is '/' (after an unnecessary (?) is_char_boundary check introduced in #26771 and another unnecessary pointer-equality check)

Most of the time the character used as a pattern is a compile-time constant, so it should be more efficient to encode the char into UTF-8 and then memcmp it against the input string.

(Rust uses the "decoding" approach so that char shares the same implementation with &[char] and FnMut(char) -> bool via the CharEq trait.)
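
A rough sketch of the two strategies described above. The function names are made up for illustration, and this is not the libcore implementation; it only shows where the work happens in each case.

```rust
// Hypothetical helpers for illustration only; not the actual libcore code.

/// "Decoding" approach: decode the first code point of the haystack at
/// runtime, then compare it with the pattern char.
pub fn starts_with_by_decoding(haystack: &str, pat: char) -> bool {
    haystack.chars().next() == Some(pat)
}

/// "Encoding" approach: encode the pattern char into UTF-8 and compare raw
/// bytes. When `pat` is a literal, the encoding step can in principle be
/// evaluated at compile time, leaving only a short byte comparison.
pub fn starts_with_by_encoding(haystack: &str, pat: char) -> bool {
    let mut buf = [0u8; 4];
    let needle = pat.encode_utf8(&mut buf);
    haystack.as_bytes().starts_with(needle.as_bytes())
}
```

Both functions should agree with `str::starts_with(char)` for every input; they differ only in which side (haystack or pattern) gets transformed.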

This problem is particularly bad for Clippy users, since the single_char_pattern lint suggests changing every .starts_with("x") to .starts_with('x') on the assumption that the latter is faster; as of Rust 1.19 the reverse is true.

$ rustc -vV
rustc 1.19.0-nightly (826d8f385 2017-05-13)
binary: rustc
commit-hash: 826d8f3850b37a23481dfcf4a899b5dfc82d22e3
commit-date: 2017-05-13
host: x86_64-apple-darwin
release: 1.19.0-nightly
LLVM version: 4.0

x86_64 ASM

Build with:

$ rustc --crate-type=dylib -Copt-level=3 --emit=asm 2.rs
$ cat 2.s

First function (.starts_with('/')):

	.section	__TEXT,__text,regular,pure_instructions
	.globl	__ZN2_25check17hfe0a879e8d0ee56bE
	.p2align	4, 0x90
__ZN2_25check17hfe0a879e8d0ee56bE:
	.cfi_startproc
	pushq	%rbp
Lcfi0:
	.cfi_def_cfa_offset 16
Lcfi1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi2:
	.cfi_def_cfa_register %rbp
	testq	%rsi, %rsi
	je	LBB0_1
	movzbl	(%rdi), %edx
	testb	%dl, %dl
	jns	LBB0_17
	leaq	(%rdi,%rsi), %r8
	xorl	%eax, %eax
	cmpq	$1, %rsi
	movq	%r8, %rsi
	je	LBB0_5
	movzbl	1(%rdi), %eax
	addq	$2, %rdi
	andl	$63, %eax
	movq	%rdi, %rsi
LBB0_5:
	movl	%edx, %ecx
	andl	$31, %ecx
	cmpb	$-32, %dl
	jb	LBB0_6
	cmpq	%r8, %rsi
	je	LBB0_8
	movzbl	(%rsi), %edi
	incq	%rsi
	andl	$63, %edi
	jmp	LBB0_10
LBB0_1:
	xorl	%eax, %eax
	jmp	LBB0_18
LBB0_6:
	shll	$6, %ecx
	jmp	LBB0_16
LBB0_8:
	xorl	%edi, %edi
	movq	%r8, %rsi
LBB0_10:
	shll	$6, %eax
	orl	%edi, %eax
	cmpb	$-16, %dl
	jb	LBB0_11
	cmpq	%r8, %rsi
	je	LBB0_13
	movzbl	(%rsi), %edx
	andl	$63, %edx
	jmp	LBB0_15
LBB0_11:
	shll	$12, %ecx
	jmp	LBB0_16
LBB0_13:
	xorl	%edx, %edx
LBB0_15:
	andl	$7, %ecx
	shll	$18, %ecx
	shll	$6, %eax
	orl	%edx, %eax
LBB0_16:
	orl	%ecx, %eax
	movl	%eax, %edx
LBB0_17:
	cmpl	$47, %edx
	sete	%al
LBB0_18:
	popq	%rbp
	retq
	.cfi_endproc


.subsections_via_symbols

Second function (.starts_with("/")):

	.section	__TEXT,__text,regular,pure_instructions
	.globl	__ZN2_25check17hfe0a879e8d0ee56bE
	.p2align	4, 0x90
__ZN2_25check17hfe0a879e8d0ee56bE:
	.cfi_startproc
	pushq	%rbp
Lcfi0:
	.cfi_def_cfa_offset 16
Lcfi1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi2:
	.cfi_def_cfa_register %rbp
	testq	%rsi, %rsi
	je	LBB0_3
	cmpq	$1, %rsi
	je	LBB0_4
	cmpb	$-65, 1(%rdi)
	jle	LBB0_3
LBB0_4:
	movb	$1, %al
	leaq	_str.0(%rip), %rcx
	cmpq	%rcx, %rdi
	je	LBB0_6
	cmpb	$47, (%rdi)
	je	LBB0_6
LBB0_3:
	xorl	%eax, %eax
LBB0_6:
	popq	%rbp
	retq
	.cfi_endproc

	.section	__TEXT,__const
_str.0:
	.byte	47

.subsections_via_symbols


nagisa · 2017-05-15T01:56:58Z

A single char cannot possibly be faster than a string, because in the general case the character has to be encoded to UTF-8 (or the string bytes decoded into UTF-32, which is what seems to be done here).

~~It does not seem to me that this issue is actionable in any way. We could change our implementation strategy, but it would only end up redistributing the costs.~~

I guess the ascii_only optimisation could be implemented, actually. The infrastructure is there, only need to adjust the impl of the matcher.

kennytm · 2017-05-15T02:54:39Z

@nagisa The performance of starts_with constant char and constant string should at least be the same. I'm pretty sure the LLVM optimizer is able to encode the character at compile time to reduce the whole is_prefix_of to haystack.as_bytes()[0] == b'/' plus length check.

The problem is not really about the unused function only_ascii, but that the expensive operation (decoding) is done on the haystack which is only known at runtime, rather than (encoding) done on the pattern which can be evaluated at compile time.

Yes there wouldn't be improvement for a runtime char, but it should be a pretty rare case.
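
For the ASCII case, the reduction described above would look roughly like the following hand-written version. The function name is hypothetical, and this is what one would hope the optimizer produces, not what it did at the time.

```rust
/// Hypothetical hand-written equivalent of `a.starts_with('/')`.
pub fn check_first_byte(a: &str) -> bool {
    // `first()` returns `None` for an empty string, which covers the length
    // check. Comparing a single byte is sound here because in UTF-8 a byte
    // below 0x80 only ever encodes the corresponding ASCII character.
    a.as_bytes().first() == Some(&b'/')
}
```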

nagisa · 2017-05-15T06:19:18Z

So, I’ve “optimised” the char Pattern to the point where doing text.starts_with('k') is as simple as:

_ZN5bench5swith17hb7875e35eb44fb11E:
	.cfi_startproc
	testq	%rsi, %rsi
	je	.LBB0_1
	cmpb	$107, (%rdi)
	sete	%al
	retq
.LBB0_1:
	xorl	%eax, %eax
	retq

The benchmark for 1024 iterations of such a call is 1000ns vs 1140ns originally. It also regresses patterns such as text.contains('k') by a factor of 4 (or I’m benchmarking it wrong): from 44µs to 175µs, despite not doing the UTF-8 decoding.

Here’s the interesting parts of the code in question:

struct CharSearcherInner<'a> {
    haystack: &'a str,
    // UTF-8 encoding of the needle char; only the first needle_len bytes are used.
    needle: [u8; 4],
    needle_len: usize,
    // Byte range of the haystack that has not been searched yet.
    range: ::ops::Range<usize>
}

unsafe impl<'a> Searcher<'a> for CharSearcherInner<'a> {
    fn next(&mut self) -> SearchStep {
        unsafe {
            let haystack = self.haystack.get_unchecked(self.range.clone()).as_bytes();
            let needle = self.needle.get_unchecked(..self.needle_len);
            if haystack.is_empty() {
                SearchStep::Done
            } else {
                let start = self.range.start;
                if haystack.starts_with(needle) {
                    // The encoded needle is a byte prefix: report a match and skip past it.
                    self.range.start += self.needle_len;
                    SearchStep::Match(start, self.range.start)
                } else {
                    // The count of leading one bits in the first byte is the length of a
                    // multi-byte sequence; 0 means a one-byte (ASCII) char.
                    let leading_ones = (!haystack[0]).leading_zeros();
                    if leading_ones == 0 {
                        self.range.start += 1;
                    } else {
                        self.range.start += leading_ones as usize;
                    }
                    SearchStep::Reject(start, self.range.start)
                }
            }
        }
    }
}
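
The skip in the Reject branch relies on the UTF-8 lead byte announcing the length of its sequence. Pulled out as a standalone sketch (the helper name is made up here), that step is:

```rust
/// Hypothetical helper: length in bytes of the UTF-8 sequence that starts
/// with `lead`. Assumes `lead` is the first byte of a char in valid UTF-8,
/// so it is never a continuation byte (0b10xx_xxxx).
pub fn utf8_seq_len(lead: u8) -> usize {
    // Inverting the byte turns its leading ones into leading zeros.
    let leading_ones = (!lead).leading_zeros() as usize;
    if leading_ones == 0 {
        1 // 0xxx_xxxx: a one-byte (ASCII) char
    } else {
        leading_ones // 110x_xxxx => 2, 1110_xxxx => 3, 1111_0xxx => 4
    }
}
```

For example, the first byte of "é" is 0xC3, which has two leading one bits, so the searcher can jump two bytes at once.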

Feels like it was a complete waste of my time to look into this.

kennytm · 2017-05-15T08:45:25Z

@nagisa Thanks for the test! Could you share the benchmarking code? I'll take a look later.

My benchmark (https://gist.github.com/kennytm/2b6264fdf67651db78cb37095b037fae) shows a 100% speed-up (4.5 ns → 2.2 ns) rather than the bland 1.1 ns → 1.0 ns change. (For sure this really doesn't matter unless you run .starts_with() a million times.)

$ rustc --test -Copt-level=3 -Ctarget-cpu=native 1.rs

$ ./1 --bench

running 5 tests
test bench_baseline                    ... bench:      65,459 ns/iter (+/- 118,871)
test bench_start_with_ascii_as_bytes   ... bench:     224,761 ns/iter (+/- 39,764)
test bench_start_with_ascii_char       ... bench:     450,245 ns/iter (+/- 66,865)
test bench_start_with_ascii_single_str ... bench:     452,953 ns/iter (+/- 61,812)
test bench_start_with_literal_char     ... bench:     221,022 ns/iter (+/- 33,137)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured


$ rustc -vV
rustc 1.19.0-nightly (386b0b9d3 2017-05-14)
binary: rustc
commit-hash: 386b0b9d39274701f30d31ee6ce31c363c6036ea
commit-date: 2017-05-14
host: x86_64-apple-darwin
release: 1.19.0-nightly
LLVM version: 4.0

(Note: OK this shows str and char actually may have similar performance, although this also shows that both can be improved.)
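
For anyone who wants to repeat this kind of comparison without the nightly-only #[bench] harness, here is a minimal timing sketch on stable Rust. The function name and shape are assumptions for illustration; this is not the code from the gist, and a loop like this is far noisier than a real benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `f` on every input `rounds` times and returns
/// the number of `true` results together with the total elapsed time.
pub fn time_calls<F: Fn(&str) -> bool>(
    inputs: &[&str],
    rounds: usize,
    f: F,
) -> (usize, Duration) {
    let start = Instant::now();
    let mut hits = 0;
    for _ in 0..rounds {
        for &s in inputs {
            // black_box keeps the optimizer from hoisting or removing the call.
            if black_box(f(black_box(s))) {
                hits += 1;
            }
        }
    }
    (hits, start.elapsed())
}
```

Calling it once with `|s| s.starts_with('/')` and once with `|s| s.starts_with("/")` on the same inputs gives two durations to compare; the hit counts should be identical.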

nagisa · 2017-05-15T09:16:24Z

#[bench]
fn starts_with(b: &mut Bencher) {
    let text = black_box("kdjsfhlakfhlsghlkvcnljknfqiunvcijqenwodind\0");
    b.iter(|| {
        for i in 0..1024 {
            black_box(text.starts_with('k'));
        }
    })
}

#[bench]
fn contains(b: &mut Bencher) {
    let text = black_box("kdjsfhlakfhlsghlkvcnljknfqiunvcijqenwodind\0");
    b.iter(|| {
        for i in 0..1024 {
            black_box(text.contains('\0'));
        }
    })
}

is quite literally what I used.

Assembly was generated by compiling

pub fn foo(text: &str) -> bool {
    text.contains('k') // and starts_with, correspondingly
}

nagisa · 2017-05-15T09:18:49Z

master...nagisa:charpat is the diff against libcore

nagisa · 2017-05-16T13:33:21Z

So I tried multiple implementation strategies, and none of them really yielded any substantial benefit for starts_with overall.

In the table below, Two Way is the Searcher based on the same algorithm as the string searcher; Smart Jumping is a greatly improved variant of the idea above; This Patch is a simple patch that does not change the searcher at all and only applies a few small tweaks, mostly adding inline hints.

All of these do end up in text.starts_with(constant) being optimised to the desired short assembly, even for non-ASCII case. Despite all that, some of the benchmarks run around 2 times as slow, when compared to original code.

I will submit a patch for the other improvements, but I don’t think there’s too much to be done for starts_with(char) case specifically.

This is my benchmark results table:

Benchmark (<100% → better than original)	Two Way	Smart Jumping	This Patch	Original ns/iter	Two Way ns/iter	Smart Jumping ns/iter	This Patch ns/iter
slice::sort_large_strings	96.16%	94.96%	94.88%	9698174	9325580	9209306	9201359
slice::sort_unstable_large_strings	96.04%	93.07%	91.77%	10236086	9830807	9526951	9393316
str::bench_contains_bad_naive	112.48%	113.03%	102.17%	737	829	833	753
str::bench_contains_equal	75.85%	97.21%	92.88%	323	245	314	300
str::bench_contains_short_long	110.50%	109.92%	102.00%	2399	2651	2637	2447
str::bench_contains_short_short	119.72%	121.13%	102.82%	71	85	86	73
str::bench_join	96.48%	96.09%	95.69%	511	493	491	489
str::char_indicesator	93.44%	104.92%	98.36%	61	57	64	60
str::char_indicesator_rev	72.31%	90.77%	107.69%	65	47	59	70
str::char_iterator	86.44%	103.39%	100.00%	59	51	61	59
str::char_iterator_ascii	72.55%	89.22%	88.24%	204	148	182	180
str::char_iterator_for	80.00%	81.54%	80.00%	65	52	53	52
str::char_iterator_rev	131.37%	119.61%	98.04%	51	67	61	50
str::char_iterator_rev_for	101.89%	98.11%	113.21%	53	54	52	60
str::chars_count::long_lorem_ipsum	69.95%	90.33%	93.88%	2090	1462	1888	1962
str::chars_count::short_ascii	85.94%	100.00%	101.56%	64	55	64	65
str::chars_count::short_mixed	85.92%	90.14%	100.00%	71	61	64	71
str::chars_count::short_pile_of_poo	86.15%	103.08%	100.00%	65	56	67	65
str::contains_bang_char::long_lorem_ipsum	25.00%	25.00%	24.98%	5565	1391	1391	1390
str::contains_bang_char::short_ascii	27.78%	27.78%	27.78%	144	40	40	40
str::contains_bang_char::short_mixed	33.33%	34.17%	34.17%	120	40	41	41
str::contains_bang_char::short_pile_of_poo	57.97%	60.87%	57.97%	69	40	42	40
str::contains_bang_str::long_lorem_ipsum	97.00%	98.04%	99.36%	4836	4691	4741	4805
str::contains_bang_str::short_ascii	103.54%	96.46%	98.99%	198	205	191	196
str::contains_bang_str::short_mixed	104.32%	98.15%	100.62%	162	169	159	163
str::contains_bang_str::short_pile_of_poo	112.09%	116.48%	109.89%	91	102	106	100
str::ends_with_ascii_char::long_lorem_ipsum	106.74%	106.73%	101.67%	5457	5825	5824	5548
str::ends_with_ascii_char::short_ascii	101.72%	101.76%	101.76%	5453	5547	5549	5549
str::ends_with_ascii_char::short_mixed	103.66%	108.82%	103.70%	5352	5548	5824	5550
str::ends_with_ascii_char::short_pile_of_poo	111.86%	101.72%	101.74%	5454	6101	5548	5549
str::ends_with_str::long_lorem_ipsum	88.00%	92.00%	84.02%	6935	6103	6380	5827
str::ends_with_str::short_ascii	88.81%	89.06%	81.79%	7599	6749	6768	6215
str::ends_with_str::short_mixed	40.98%	64.40%	62.75%	2062	845	1328	1294
str::ends_with_str::short_pile_of_poo	40.76%	61.04%	60.51%	2056	838	1255	1244
str::ends_with_unichar::long_lorem_ipsum	91.73%	101.93%	91.73%	5444	4994	5549	4994
str::ends_with_unichar::short_ascii	91.57%	101.72%	91.55%	5455	4995	5549	4994
str::ends_with_unichar::short_mixed	91.35%	101.48%	91.33%	5468	4995	5549	4994
str::ends_with_unichar::short_pile_of_poo	91.53%	101.72%	91.53%	5456	4994	5550	4994
str::find_underscore_char::long_lorem_ipsum	38.06%	31.47%	58.02%	5567	2119	1752	3230
str::find_underscore_char::short_ascii	67.81%	39.73%	63.01%	146	99	58	92
str::find_underscore_char::short_mixed	81.97%	48.36%	68.85%	122	100	59	84
str::find_underscore_char::short_pile_of_poo	372.22%	81.94%	91.67%	72	268	59	66
str::find_underscore_str::long_lorem_ipsum	109.79%	110.74%	99.68%	1899	2085	2103	1893
str::find_underscore_str::short_ascii	113.92%	108.86%	100.00%	79	90	86	79
str::find_underscore_str::short_mixed	117.50%	107.50%	100.00%	80	94	86	80
str::find_underscore_str::short_pile_of_poo	102.78%	103.97%	99.60%	252	259	262	251
str::find_zzz_char::long_lorem_ipsum	17.50%	31.54%	58.02%	5567	974	1756	3230
str::find_zzz_char::short_ascii	48.97%	40.69%	62.76%	145	71	59	91
str::find_zzz_char::short_mixed	51.64%	49.18%	68.03%	122	63	60	83
str::find_zzz_char::short_pile_of_poo	83.10%	281.69%	85.92%	71	59	200	61
str::find_zzz_str::long_lorem_ipsum	99.02%	100.11%	100.76%	921	912	922	928
str::find_zzz_str::short_ascii	115.38%	119.23%	100.00%	52	60	62	52
str::find_zzz_str::short_mixed	113.95%	106.98%	93.02%	43	49	46	40
str::find_zzz_str::short_pile_of_poo	145.00%	115.00%	100.00%	40	58	46	40
str::match_indices_a_str::long_lorem_ipsum	99.64%	100.08%	98.66%	5229	5210	5233	5159
str::match_indices_a_str::short_ascii	92.94%	102.60%	98.51%	269	250	276	265
str::match_indices_a_str::short_mixed	98.03%	108.37%	100.99%	203	199	220	205
str::match_indices_a_str::short_pile_of_poo	103.81%	117.14%	94.29%	105	109	123	99
str::rfind_underscore_char::long_lorem_ipsum	20.16%	86.09%	29.20%	7124	1436	6133	2080
str::rfind_underscore_char::short_ascii	43.59%	98.46%	31.79%	195	85	192	62
str::rfind_underscore_char::short_mixed	51.50%	116.77%	37.72%	167	86	195	63
str::rfind_underscore_char::short_pile_of_poo	256.57%	196.97%	45.45%	99	254	195	45
str::rfind_zzz_char::long_lorem_ipsum	13.35%	86.03%	29.18%	7129	952	6133	2080
str::rfind_zzz_char::short_ascii	28.35%	98.97%	31.96%	194	55	192	62
str::rfind_zzz_char::short_mixed	43.20%	114.79%	36.69%	169	73	194	62
str::rfind_zzz_char::short_pile_of_poo	541.41%	716.16%	45.45%	99	536	709	45
str::rsplitn_space_char::long_lorem_ipsum	216.55%	362.07%	101.38%	145	314	525	147
str::rsplitn_space_char::short_ascii	240.00%	342.40%	101.60%	125	300	428	127
str::rsplitn_space_char::short_mixed	266.13%	384.68%	102.42%	124	330	477	127
str::rsplitn_space_char::short_pile_of_poo	155.07%	315.94%	101.45%	69	107	218	70
str::split_a_str::long_lorem_ipsum	97.46%	100.19%	100.53%	5275	5141	5285	5303
str::split_a_str::short_ascii	91.79%	102.86%	90.00%	280	257	288	252
str::split_a_str::short_mixed	95.37%	103.70%	92.59%	216	206	224	200
str::split_a_str::short_pile_of_poo	103.48%	102.61%	93.04%	115	119	118	107
str::split_ad_str::long_lorem_ipsum	95.38%	99.18%	99.35%	2925	2790	2901	2906
str::split_ad_str::short_ascii	92.96%	100.94%	90.14%	213	198	215	192
str::split_ad_str::short_mixed	100.00%	102.44%	93.50%	123	123	126	115
str::split_ad_str::short_pile_of_poo	101.33%	105.33%	88.00%	75	76	79	66
str::split_ascii	174.80%	143.31%	98.43%	127	222	182	125
str::split_closure	95.97%	97.58%	98.39%	124	119	121	122
str::split_extern_fn	85.65%	95.22%	100.96%	209	179	199	211
str::split_slice	92.26%	101.61%	98.39%	310	286	315	305
str::split_space_char::long_lorem_ipsum	136.49%	155.89%	101.95%	6974	9519	10872	7110
str::split_space_char::short_ascii	198.83%	186.55%	98.83%	171	340	319	169
str::split_space_char::short_mixed	226.12%	173.88%	102.99%	134	303	233	138
str::split_space_char::short_pile_of_poo	140.24%	96.34%	112.20%	82	115	79	92
str::split_space_str::long_lorem_ipsum	94.18%	99.72%	100.74%	8901	8383	8876	8967
str::split_space_str::short_ascii	91.25%	100.94%	91.88%	320	292	323	294
str::split_space_str::short_mixed	92.04%	102.77%	90.31%	289	266	297	261
str::split_space_str::short_pile_of_poo	103.57%	100.00%	97.32%	112	116	112	109
str::split_terminator_space_char::long_lorem_ipsum	135.04%	153.38%	109.07%	7037	9503	10793	7675
str::split_terminator_space_char::short_ascii	196.51%	183.72%	98.26%	172	338	316	169
str::split_terminator_space_char::short_mixed	207.59%	156.55%	94.48%	145	301	227	137
str::split_terminator_space_char::short_pile_of_poo	136.90%	92.86%	108.33%	84	115	78	91
str::split_unicode_ascii	127.21%	89.71%	104.41%	136	173	122	142
str::splitn_space_char::long_lorem_ipsum	141.24%	134.02%	97.94%	194	274	260	190
str::splitn_space_char::short_ascii	188.41%	165.94%	94.93%	138	260	229	131
str::splitn_space_char::short_mixed	220.71%	169.29%	101.43%	140	309	237	142
str::splitn_space_char::short_pile_of_poo	144.05%	98.81%	105.95%	84	121	83	89
str::starts_with_ascii_char::long_lorem_ipsum	235.84%	235.88%	235.88%	2352	5547	5548	5548
str::starts_with_ascii_char::short_ascii	236.04%	236.09%	236.09%	2350	5547	5548	5548
str::starts_with_ascii_char::short_mixed	182.80%	174.11%	174.14%	3186	5824	5547	5548
str::starts_with_ascii_char::short_pile_of_poo	148.91%	148.94%	148.94%	3725	5547	5548	5548
str::starts_with_str::long_lorem_ipsum	77.90%	77.20%	74.37%	7834	6103	6048	5826
str::starts_with_str::short_ascii	78.52%	72.29%	75.51%	8595	6749	6213	6490
str::starts_with_str::short_mixed	80.30%	76.67%	76.67%	7600	6103	5827	5827
str::starts_with_str::short_pile_of_poo	80.04%	90.00%	80.04%	5550	4442	4995	4442
str::starts_with_unichar::long_lorem_ipsum	211.75%	235.33%	211.87%	2358	4993	5549	4996
str::starts_with_unichar::short_ascii	212.87%	236.53%	212.83%	2346	4994	5549	4993
str::starts_with_unichar::short_mixed	164.48%	181.80%	155.87%	3204	5270	5825	4994
str::starts_with_unichar::short_pile_of_poo	134.07%	148.94%	141.48%	3725	4994	5548	5270
str::trim_ascii_char::long_lorem_ipsum	95.30%	100.11%	114.31%	3635	3464	3639	4155
str::trim_ascii_char::short_ascii	90.38%	114.42%	112.50%	104	94	119	117
str::trim_ascii_char::short_mixed	79.10%	100.00%	98.51%	67	53	67	66
str::trim_ascii_char::short_pile_of_poo	116.67%	116.67%	116.67%	12	14	14	14
str::trim_left_ascii_char::long_lorem_ipsum	99.91%	100.00%	100.03%	3464	3461	3464	3465
str::trim_left_ascii_char::short_ascii	97.98%	101.01%	104.04%	99	97	100	103
str::trim_left_ascii_char::short_mixed	100.00%	100.00%	100.00%	7	7	7	7
str::trim_left_ascii_char::short_pile_of_poo	100.00%	100.00%	100.00%	8	8	8	8
str::trim_right_ascii_char::long_lorem_ipsum	133.01%	133.11%	133.06%	2084	2772	2774	2773
str::trim_right_ascii_char::short_ascii	114.49%	118.84%	117.39%	69	79	82	81
str::trim_right_ascii_char::short_mixed	122.92%	129.17%	125.00%	48	59	62	60
str::trim_right_ascii_char::short_pile_of_poo	100.00%	125.00%	125.00%	8	8	10	10
string::bench_exact_size_shrink_to_fit	92.98%	91.23%	91.23%	57	53	52	52
string::bench_from	93.55%	96.77%	91.94%	62	58	60	57
string::bench_from_str	100.00%	95.16%	91.94%	62	62	59	57
string::bench_push_char_one_byte	92.28%	84.85%	92.30%	36152	33360	30674	33368
string::bench_push_char_two_bytes	94.61%	94.14%	92.41%	152449	144227	143511	140883
string::bench_push_str	90.74%	94.44%	88.89%	54	49	51	48
string::bench_push_str_one_byte	79.88%	91.96%	96.21%	149779	119650	137733	144102
string::bench_to_string	95.08%	95.08%	93.44%	61	58	58	57
string::bench_with_capacity	97.14%	100.00%	97.14%	35	34	35	34
string::from_utf8_lossy_100_ascii	93.33%	106.67%	106.67%	15	14	16	16
string::from_utf8_lossy_100_invalid	87.56%	122.76%	103.01%	1560	1366	1915	1607
string::from_utf8_lossy_100_multibyte	97.10%	124.64%	111.59%	69	67	86	77
string::from_utf8_lossy_invalid	105.38%	109.87%	103.14%	223	235	245	230
Average of deltas	112.88%	120.48%	96.33%
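
On reading the table: the percentage columns appear to be each variant's ns/iter divided by the original's ns/iter. For str::bench_contains_bad_naive under Two Way, 829 / 737 ≈ 112.48%. A small sketch with a hypothetical helper name:

```rust
/// Hypothetical helper: a candidate's time as a percentage of the original's.
/// Values below 100.0 mean the candidate is faster than the original.
pub fn relative_percent(original_ns: u64, candidate_ns: u64) -> f64 {
    candidate_ns as f64 / original_ns as f64 * 100.0
}
```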

kennytm · 2018-07-23T16:12:32Z

Minor update: .starts_with('x') is still slower than .starts_with("x") even with LLVM 7 + RFC 2500. See #52646 (comment) for the microbenchmark

steveklabnik · 2019-10-07T13:21:12Z

Triage: I tried to reproduce the microbenchmark, but:

- got "multiple candidates for rand"
- the API of rand has changed and I don't know how to update it

Both of these are not insurmountable, but I don't have the time right now to do that work. For simplicity's sake, https://gist.github.com/kennytm/2b6264fdf67651db78cb37095b037fae is the benchmark referenced above.

mati865 · 2019-10-07T13:43:07Z

Assembly of literal char version still looks worse: https://godbolt.org/z/oZb-_I

Mark-Simulacrum · 2019-12-09T21:27:18Z

Yes, that assembly is worse -- it looks like that's mostly down to the chars() iterator of str rather than the direct prefix -- the checkchar function here looks pretty good (and comparable with the string version).

pub fn checkchar(a: &str) -> bool {
    let mut buf = [0; 4];
    a.starts_with(&*'/'.encode_utf8(&mut buf[..]))
}

I think the problem comes down to this: decoding the UTF-8 str input into a char isn't something that we can constant fold away, as it operates on the unknown input, and LLVM isn't good enough to see that for the '/' char we only need to care about the first byte of the encoded str. That is altogether unsurprising, to be honest, since we are comparing essentially u32 with u32, and LLVM has no knowledge that the first byte is all that matters since it doesn't know the encoding here.

I wonder if we could get some wins by changing the Pattern impl for char so that it does not use the chars iterator and instead follows the proposal I gave above (re-encoding the char as UTF-8 and then checking if it's a prefix).

I don't have time to run benchmarks right now but thought I'd leave it here at least.

spunit262 · 2019-12-11T10:48:01Z

> since we are comparing essentially u32 with u32, and LLVM has no knowledge that the first byte is all that matters since it doesn't know the encoding here

I think the actual issue is that nothing is telling LLVM that overlong encodings are forbidden. If they were allowed, there would be 4 different ways to encode '/'. Adding a few unreachable_unchecked calls to Chars::next might help.
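
To make the overlong point concrete, here is a sketch with a deliberately unvalidated decoder (written for this illustration, not taken from libcore). It maps four different byte sequences to code point 47, while std::str::from_utf8 accepts only the one-byte form.

```rust
/// The four byte sequences that a decoder without validation maps to '/'.
/// Only the first one is valid UTF-8; the others are overlong encodings.
pub const SLASH_ENCODINGS: [&[u8]; 4] = [
    &[0x2F],
    &[0xC0, 0xAF],
    &[0xE0, 0x80, 0xAF],
    &[0xF0, 0x80, 0x80, 0xAF],
];

/// Hypothetical naive decoder: trusts the slice length and performs no
/// validation at all. Panics on an empty slice.
pub fn naive_decode(bytes: &[u8]) -> u32 {
    let lead = bytes[0] as u32;
    // Payload (low six bits) of the continuation byte at index `i`.
    let cont = |i: usize| (bytes[i] as u32) & 0x3F;
    match bytes.len() {
        1 => lead,
        2 => (lead & 0x1F) << 6 | cont(1),
        3 => (lead & 0x0F) << 12 | cont(1) << 6 | cont(2),
        _ => (lead & 0x07) << 18 | cont(1) << 12 | cont(2) << 6 | cont(3),
    }
}
```

Since a &str is guaranteed to be valid UTF-8, only the first form can ever appear in the haystack, but the decoding code alone does not convey that to the optimizer.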

ranma42 · 2019-12-11T13:49:43Z

Since UTF-8 has good slicing properties, we could actually compare the .as_bytes() slices whenever we are testing for string vs substring equality (and skip the char boundary check).
This seems to improve the assembly for str.starts_with(str).
Combining encode_utf8 with this seems to generate the desired assembly:
https://godbolt.org/z/myTjHj

I will try something along these lines in the core library ASAP
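
A minimal sketch of that idea, with made-up function names (this is not the code from the godbolt link or from the eventual PR). No char boundary check is needed because a valid UTF-8 needle starts with a lead byte and ends with a complete char, so a byte-level match can only begin and end on char boundaries of the haystack.

```rust
// Hypothetical helpers for illustration only.

/// str/str prefix test on the raw UTF-8 bytes.
pub fn is_prefix(needle: &str, haystack: &str) -> bool {
    haystack.as_bytes().starts_with(needle.as_bytes())
}

/// str/str suffix test on the raw UTF-8 bytes.
pub fn is_suffix(needle: &str, haystack: &str) -> bool {
    haystack.as_bytes().ends_with(needle.as_bytes())
}

/// char suffix test: encode the char, then reuse the str/str path.
pub fn char_is_suffix(c: char, haystack: &str) -> bool {
    let mut buf = [0u8; 4];
    is_suffix(c.encode_utf8(&mut buf), haystack)
}
```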

…-char, r=BurntSushi

Improve code generated for `starts_with(<literal char>)`

This PR includes two minor improvements to the code generated when checking for string prefix/suffix.

The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation.

The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method. The resulting code should be equivalent in the generic case (one char is being encoded versus one char being decoded), but it becomes easy to optimize in the case of a literal char, which in most cases a developer might expect to be at least as simple as that of a literal string.

This PR should fix rust-lang#41993

This enables constant folding when matching a literal char. Fixes rust-lang#41993.

nagisa added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 15, 2017

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017

kennytm mentioned this issue Dec 13, 2017

str::find(char) is slower than it ought ot be #46693

Closed

petrochenkov mentioned this issue Jul 23, 2018

Change single char str patterns to chars #52646

Merged

kennytm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 23, 2018

kennytm mentioned this issue Feb 25, 2019

Slow suggestion of single_char_pattern rust-lang/rust-clippy#3813

Closed

ranma42 mentioned this issue Dec 12, 2019

Improve code generated for starts_with(<literal char>) #67249

Merged

bors closed this as completed in 1f6d023 Dec 16, 2019

vivekvpandya pushed a commit to vivekvpandya/rust that referenced this issue Dec 18, 2019

Prefer encoding the char when checking for string prefix/suffix

c59272c

This enables constant folding when matching a literal char. Fixes rust-lang#41993.

petrochenkov mentioned this issue Feb 27, 2020

use char instead of &str for single char patterns #69481

Merged
