
Use libc::memchr instead of iter().rposition(), closes #30076 #30151

Closed

Conversation

@fhahn (Contributor) commented Dec 1, 2015

This is an initial PR for #30076 that uses libc::memchr instead of iter().rposition(). In the general case memchr should be much faster; rposition is only faster when a \n is near the end of a long string (because memchr searches from left to right).

On Linux there is memrchr, which searches from right to left, but it seems it is not available on other platforms. I think it would also make sense to add some benchmarks, but I am not sure what the right place for them would be (or how to run them).

@rust-highfive (Collaborator)

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

    unsafe {
        libc::memchr(buf.as_ptr() as *const libc::c_void,
                     b'\n' as libc::c_int,
                     buf.len() as libc::size_t)
    };

A reviewer (Member) commented on this hunk:

Best to have a safe wrapper function for memchr — and then the structure of fn write doesn't need to change at all either.
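
For illustration, a minimal sketch of such a safe wrapper, assuming the libc crate is in scope; the name and shape are illustrative, not the code that eventually landed:

    extern crate libc;

    // Safe wrapper: index of the first occurrence of `needle` in `haystack`,
    // or None if it does not occur. (Illustrative sketch only.)
    fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
        let p = unsafe {
            libc::memchr(haystack.as_ptr() as *const libc::c_void,
                         needle as libc::c_int,
                         haystack.len() as libc::size_t)
        };
        if p.is_null() {
            None
        } else {
            Some(p as usize - haystack.as_ptr() as usize)
        }
    }

With a helper like this, the body of fn write can keep its current structure and simply call memchr(b'\n', buf).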

@bluss (Member) commented Dec 1, 2015

Thanks. As an exploration I've tested exactly this approach, and it hugely increased the throughput of binary data to a LineWriter-buffered stdout on my machine (from 0.9 GB/s to 1.6 GB/s in my test). With it, my plain Rust reimplementation of cat behaved the same as the original, even when I was using the LineWriter-buffered stdout.

So the solution is certainly good on a platform with a fast memchr (glibc/linux in my case).

Libc on Windows does not provide such a nice version. So ideally I think we should use memrchr (which does not exist on Windows at all), but we can provide a reasonably fast fallback.

I wanted to ask you @alexcrichton: can we add such a custom memrchr to libstd for now? The code for a memrchr fallback already exists in the memchr crate (that part written by yours truly).
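
For reference, the simplest possible pure-Rust fallback is just a reverse scan; the sketch below shows the required behavior, while the crate's actual fallback is a faster word-at-a-time scan:

    // Naive right-to-left scan: index of the last occurrence of `needle`,
    // or None. Correct, but slower than an optimized fallback.
    fn memrchr_fallback(needle: u8, haystack: &[u8]) -> Option<usize> {
        haystack.iter().rposition(|&b| b == needle)
    }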

@bluss (Member) commented Dec 1, 2015

@alexcrichton And a trickier question: in libcore we'd like to have both memchr and memrchr available for byte, char and substring searching, where they would dramatically speed things up. Is it possible to have an optional libc dependency? Having just the Rust versions of those functions would still be a massive performance win, but the glibc version is still much faster.

This is just more context for placement ^

Maybe we can have the rust versions in libcore, and have libstd use either-or depending on what's available.

@alexcrichton (Member)

Thanks for the PR @fhahn! This looks good to me, although the semantics have changed due to searching from the front of the string instead of the end (which may not flush as much as it needs to).

@bluss I'd prefer to not pick up a libc dep in core, but having different versions in core/std seems fine. This has been an open libs question for quite a while now, so I'd rather not export this as a public API, but having it as an implementation detail is fine.

Along those lines I don't mind bringing in memrchr stuff to get it to work. Ideally we'd just copy whatever glibc does as I suspect that LLVM is at least capable of generating such code.

@bluss (Member) commented Dec 2, 2015

They are using an SSE2 version in asm for x86 platforms, so we can't really copy that.

@bluss (Member) commented Dec 2, 2015

I think we'd need to have them as a public (unstable) API in libcore, so that we can have the pure Rust implementations there (autovectorized to some decent degree, as noted) and still use them as a fallback in libstd.

@fhahn (Contributor, Author) commented Dec 2, 2015

There is already a crate by @BurntSushi which provides wrappers and fallbacks for memchr and memrchr, and it looks pretty good.

I also created a benchmark to compare the following things:

  • iter().rposition()

  • libc's memrchr

  • @BurntSushi's Rust implementation of memrchr (bench_rust_memrchr1)

  • my port of libc's C memrchr implementation (bench_rust_memrchr2)

The results on my machine:

    running 4 tests
    test bench_iter          ... bench:      20,100 ns/iter (+/- 1,046)
    test bench_libc_memrchr  ... bench:          32 ns/iter (+/- 2)
    test bench_rust_memrchr1 ... bench:       3,895 ns/iter (+/- 200)
    test bench_rust_memrchr2 ... bench:       5,832 ns/iter (+/- 404)

Libc's vectorized, optimized version beats all other implementations by a very wide margin, probably because the Rust code is not vectorized. I am currently looking into using the simd crate to build a vectorized Rust version.

@fhahn (Contributor, Author) commented Dec 2, 2015

The code of the benchmark can be found here: https://gist.github.com/fhahn/7a7b46e75bc17f1b39a4

@huonw (Member) commented Dec 2, 2015

Hm, a 100+x increase in speed seems very surprising, even for vectorisation; there may be something else going on that's making the benchmark non-representative. Especially since 32 ns/iter (22 on my machine) implies a data transfer rate of 940 GB/s (1300 on my machine)... it's not entirely out of the realms of possibility, since all the data will be in L1 cache, but it does seem rather fast.

@huonw (Member) commented Dec 2, 2015

Ah, looks like this loop:

            for i in 0..n {
                match get_memrchr(s.as_bytes()) {
                    Some(x) => ret += x,
                    None => ()
                };
                ret += i;
            }

is being optimised to something equivalent to the following (which also allows the loop to be folded to be O(1)):

            let result = get_memrchr(s.as_bytes());
            for i in 0..n {
                match result {
                    Some(x) => ret += x,
                    None => ()
                };
                ret += i;
            }

Turning the s.as_bytes() call into test::black_box(&s).as_bytes() (and all the other s.as_bytes() calls too, just in case) gives the following result on my computer:

test bench_iter          ... bench:      13,253 ns/iter (+/- 504)
test bench_libc_memrchr  ... bench:         690 ns/iter (+/- 30)
test bench_rust_memrchr1 ... bench:       3,000 ns/iter (+/- 58)
test bench_rust_memrchr2 ... bench:       3,577 ns/iter (+/- 68)

which is along the lines of what I would expect.
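
In other words, the fix is to route the input through black_box so the (pure) search call can no longer be hoisted out of the loop; using the names from the quoted loop, the benchmark body becomes:

    for i in 0..n {
        match get_memrchr(test::black_box(&s).as_bytes()) {
            Some(x) => ret += x,
            None => ()
        };
        ret += i;
    }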

@fhahn (Contributor, Author) commented Dec 2, 2015

Thanks @huonw for discovering that. I was also surprised by the result.

Hm, as_bytes() is used in all benchmarks to pass the buffer; strange that rustc only folded the loop for the libc version.

@fhahn (Contributor, Author) commented Dec 2, 2015

@alexcrichton do you think it makes sense to change LineWriter::write to use memchr, which should be available on all platforms? Or should we go with shipping our own memrchr implementations for platforms that do not have it?

@bluss (Member) commented Dec 2, 2015

> @BurntSushi's Rust implementation of memrchr (bench_rust_memrchr1)

The crate is his, but I contributed this part.

I think we should use memrchr so that we still write & flush as much as possible. I should have written memrchr in the original issue, had I read the code more carefully (it uses rposition).

@fhahn (Contributor, Author) commented Dec 2, 2015

@bluss Shouldn't it be possible to flush as often as before even when using memchr instead of memrchr? It should flush after each \n, right? But I have to admit I am not sure how LineWriter::write does that. So far it seems to write the buffer up to the last \n, then flush and write the rest. I am probably missing something, but shouldn't it call LineWriter::write again for the first part of the buffer, because that could still contain \ns?

@bluss (Member) commented Dec 2, 2015

It looks like the previous behavior is to find the last \n, write to the inner Write until that point, and flush, so it only flushes once per LineWriter::write call.

It's true that for binary data that contains no newlines at all it doesn't matter in which direction we search, but for 64KB of text with a newline near the end it does, both for performance and for flushing after the last newline.
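
To make the intended behavior concrete, here is a rough sketch of that write path using a safe memrchr helper (such as the fallback sketched earlier); it mirrors the description above and is not the exact libstd code:

    use std::io::{self, BufWriter, Write};

    struct LineWriter<W: Write> {
        inner: BufWriter<W>,
    }

    impl<W: Write> LineWriter<W> {
        fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
            match memrchr_fallback(b'\n', buf) {
                // Newline found: write up to and including the last `\n`,
                // flush once, then pass along the remainder.
                Some(i) => {
                    let n = try!(self.inner.write(&buf[..i + 1]));
                    if n != i + 1 { return Ok(n); }
                    try!(self.inner.flush());
                    self.inner.write(&buf[i + 1..]).map(|m| n + m)
                }
                // No newline at all: just hand the data to the buffer.
                None => self.inner.write(buf),
            }
        }
    }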

@alexcrichton (Member)

Yeah, we'll want to maintain the same semantics to ensure that a program's stdout is properly buffered; otherwise, if you wrote two lines to stdout at once, you might only see one and have to wait for the second until later, when the thread dies and fully flushes stdout.

I'd certainly believe that an SSE and/or vectorized implementation of memrchr is faster than a hand-written one, but we do also have simd in rust (unstable), so it should at least in theory be possible!

@bluss (Member) commented Dec 2, 2015

It sounds like we should go ahead with a memrchr implemented in rust. As said, if this is a safe function, then the change to LineWriter::write will be simple & equivalent.

@huonw (Member) commented Dec 2, 2015

> Hm, as_bytes() is used in all benchmarks to pass the buffer; strange that rustc only folded the loop for the libc version.

LLVM has rules that tell it that memrchr is pure (i.e. calling it once is the same as calling it multiple times), but it is apparently not inferring this for the Rust versions.

@fhahn (Contributor, Author) commented Dec 11, 2015

@huonw Thanks for looking into that!

@alexcrichton @bluss so going forward I should add safe memchr and memrchr functions to core, which reuse the C implementations where possible, similar to what @BurntSushi's rust-memchr does at the moment. This would mean that the crate would become obsolete. It seems quite popular (163,233 downloads on crates.io right now) and I want to make sure not to step on anybody's toes.

Regarding a SIMD implementation in Rust: the simd crate shipped with rustc is deprecated and the external crate should be used, so a hand-written SIMD version could not make it into core, right?

@bluss (Member) commented Dec 11, 2015

I don't think we can put the libc fallback in libcore, since we don't want to depend on it. I want/need the pure Rust mem(r)chr in libcore, but maybe that can be a separate PR. Don't worry about obsoleting rust-memchr if that should happen; it's also far in the future (it needs to be stabilized in libstd before that, and it hasn't even been proposed here).

Since @alexcrichton prefers this only as a private implementation detail, can we add it to just libstd for now? I just wanted to raise the flag that it will be needed in libcore as soon as we get around to it: ASCII byte searches and single-byte substring searches could be massively improved.

@alexcrichton (Member)

I forget how much byte searching we do in libcore, but if we do it a lot in both std and core then it's fine to have it as a public-but-unstable detail in libcore for now. @bluss is right, though, that libcore should not depend on libc (although libstd is allowed to).

@fhahn mentioned this pull request Dec 14, 2015

@fhahn (Contributor, Author) commented Dec 14, 2015

I've created a new PR, which adds memchr and memrchr based on rust-memchr.

@fhahn closed this Dec 14, 2015
bors added a commit that referenced this pull request Dec 19, 2015
This PR adds `memchr` and `memrchr` based on @BurntSushi's rust-memchr crate to libstd (as discussed in #30151).

I've updated some places in libstd to use memchr/memrchr, but I am not sure if there are other places where it could be used as well.

ref #30076