Counting bytes really fast
This uses the "hyperscreamingcount" algorithm by Joshua Landau to count bytes faster than anything else. The newlinebench repository has further benchmarks for old versions of this repository.
To use bytecount in your crate, if you have cargo-edit, just type
cargo add bytecount
in a terminal with the crate root as the current path. Otherwise you can manually edit your
Cargo.toml
to add bytecount = 0.6.7
to your [dependencies]
section.
In your crate root (lib.rs
or main.rs
, depending on if you are writing a
library or application), add extern crate bytecount;
. Now you can simply use
bytecount::count
as follows:
extern crate bytecount;
fn main() {
let mytext = "some potentially large text, perhaps read from disk?";
let spaces = bytecount::count(mytext.as_bytes(), b' ');
..
}
bytecount supports two features to make use of modern CPU's features to speed up counting considerably. To allow your
users to use them, add the following to your Cargo.toml
:
[features]
runtime-dispatch-simd = ["bytecount/runtime-dispatch-simd"]
generic-simd = ["bytecount/generic-simd"]
The first, runtime-dispatch-simd
, enables detection of SIMD capabilities at runtime, which allows using the SSE2 and
AVX2 codepaths, but cannot be used with no_std
.
Your users can then compile with runtime dispatch using:
cargo build --release --features runtime-dispatch-simd
The second, generic-simd
, uses packed_simd
to provide a fast
architecture-agnostic SIMD codepath, but requires running on nightly.
Your users can compile with this codepath using:
cargo build --release --features generic-simd
Building for a more specific architecture will also improve performance. You can do this with
RUSTFLAGS="-C target-cpu=native" cargo build --release
The scalar algorithm is explained in depth here.
Note: Versions until 0.4.0 worked with Rust as of 1.20.0. Version 0.5.0 until 0.6.0 requires Rust 1.26 or later, and at least 1.27.2 to use SIMD. Versions from 0.6.0 require Rust 1.32.0 or later.
Licensed under either of at your discretion: