Benchmarking has traditionally required a significant amount of time and many iterations to arrive at meaningful results, which is especially arduous when trying to detect subtle changes, such as those within the range of a few percentage points.
Introducing Tango.rs, a novel benchmarking framework that employs paired benchmarking to assess code performance. This approach capitalizes on the fact that it is far more efficient to measure the performance difference between two functions executing simultaneously than between two functions executed consecutively.
Features:
- very high sensitivity to changes, which allows results to converge faster than with the traditional (pointwise) approach; often a fraction of a second is enough;
- ability to compare different versions of the same code from different VCS commits (A/B-benchmarking);
- async support using tokio.rs;
- macOS, Linux and Windows support.
Compared to traditional pointwise benchmarking, paired benchmarking is significantly more sensitive to changes. This heightened sensitivity enables the early detection of statistically significant performance variations.
Tango is designed to detect a 1% change in performance within just 1 second in at least 9 out of 10 test runs.
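To build intuition for why pairing helps, here is a toy sketch of the idea (illustrative only, not Tango's actual implementation; `time_once` and `paired_diff` are made-up names): the two functions are sampled back to back and the per-pair time difference is averaged, so noise that affects the whole machine hits both samples of a pair roughly equally and cancels out.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Time a single invocation of `f`, in nanoseconds.
fn time_once(f: &mut dyn FnMut() -> usize) -> f64 {
    let start = Instant::now();
    black_box(f());
    start.elapsed().as_nanos() as f64
}

/// Estimate the performance delta between two functions by sampling them
/// in pairs and averaging the per-pair difference. Common-mode noise
/// (CPU frequency shifts, background load) affects both halves of a pair,
/// so it largely cancels out of the difference.
fn paired_diff(
    baseline: &mut dyn FnMut() -> usize,
    candidate: &mut dyn FnMut() -> usize,
    pairs: usize,
) -> f64 {
    let mut sum = 0.0;
    for i in 0..pairs {
        // Alternate the order within each pair to cancel ordering effects.
        let (base_t, cand_t) = if i % 2 == 0 {
            let a = time_once(baseline);
            let b = time_once(candidate);
            (a, b)
        } else {
            let b = time_once(candidate);
            let a = time_once(baseline);
            (a, b)
        };
        sum += cand_t - base_t;
    }
    sum / pairs as f64
}

fn factorial(n: usize) -> usize {
    (1..=n).fold(1usize, |acc, i| acc.wrapping_mul(black_box(i)))
}

fn main() {
    let delta = paired_diff(&mut || factorial(500), &mut || factorial(495), 100_000);
    println!("mean paired difference: {delta:+.1} ns/iter");
}
```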
- Rust and Cargo toolchain installed (Rust stable is supported on Linux/macOS; nightly is required for Windows)
- (optional) `cargo-export` installed
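If you don't have `cargo-export` yet, it can be installed from crates.io in the usual way:

```console
$ cargo install cargo-export
```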
- Add a cargo dependency and create a new benchmark:

  ```toml
  [dev-dependencies]
  tango-bench = "0.6"

  [[bench]]
  name = "factorial"
  harness = false
  ```
- Allow rustc to export symbols for dynamic linking from benchmarks:
  - (Linux/macOS) Add a build script (`build.rs`) with the following content:

    ```rust
    fn main() {
        println!("cargo:rustc-link-arg-benches=-rdynamic");
        println!("cargo:rerun-if-changed=build.rs");
    }
    ```

  - (Windows, nightly required) Add the following code to the cargo config (`.cargo/config`):

    ```toml
    [build]
    rustflags = ["-Zexport-executable-symbols"]
    ```
- Add `benches/factorial.rs` with the following content:

  ```rust
  use std::hint::black_box;
  use tango_bench::{benchmark_fn, tango_benchmarks, tango_main, IntoBenchmarks};

  pub fn factorial(mut n: usize) -> usize {
      let mut result = 1usize;
      while n > 0 {
          result = result.wrapping_mul(black_box(n));
          n -= 1;
      }
      result
  }

  fn factorial_benchmarks() -> impl IntoBenchmarks {
      [benchmark_fn("factorial", |b| b.iter(|| factorial(500)))]
  }

  tango_benchmarks!(factorial_benchmarks());
  tango_main!();
  ```
- Build and export the benchmark to the `target/benchmarks` directory:

  ```console
  $ cargo export target/benchmarks -- bench --bench=factorial
  ```
- Now let's modify `factorial.rs` and make the factorial faster :)

  ```rust
  fn factorial_benchmarks() -> impl IntoBenchmarks {
      [benchmark_fn("factorial", |b| b.iter(|| factorial(495)))]
  }
  ```
- Now we can compare the new version with the already built one:

  ```console
  $ cargo bench -q --bench=factorial -- compare target/benchmarks/factorial
  factorial             [ 375.5 ns ... 369.0 ns ]    -1.58%*
  ```
The result shows that there is indeed a ~1% difference between `factorial(500)` and `factorial(495)` (the trailing `*` marks the change as statistically significant).
Additional examples are available in the `examples` directory.
To use Tango.rs in an asynchronous setup, follow these steps:
- Add `tokio` and `tango-bench` dependencies to your `Cargo.toml`:

  ```toml
  [dev-dependencies]
  tango-bench = { version = "0.6", features = ["async-tokio"] }

  [[bench]]
  name = "async_factorial"
  harness = false
  ```
- Create `benches/async_factorial.rs` with the following content:

  ```rust
  use std::hint::black_box;
  use tango_bench::{
      async_benchmark_fn, asynchronous::tokio::TokioRuntime, tango_benchmarks, tango_main,
      IntoBenchmarks,
  };

  pub async fn factorial(mut n: usize) -> usize {
      let mut result = 1usize;
      while n > 0 {
          result = result.wrapping_mul(black_box(n));
          n -= 1;
      }
      result
  }

  fn benchmarks() -> impl IntoBenchmarks {
      [async_benchmark_fn("async_factorial", TokioRuntime, |b| {
          b.iter(|| async { factorial(500).await })
      })]
  }

  tango_benchmarks!(benchmarks());
  tango_main!();
  ```
- Build and use benchmarks as you do in the synchronous case:

  ```console
  $ cargo bench -q --bench=async_factorial -- compare
  ```
There are several arguments you can pass to the `compare` command to change its behavior (a combined example follows the list):

- `-t`, `--time` – how long to run each benchmark (in seconds)
- `-s`, `--samples` – how many samples to gather from each benchmark
- `-f` – filter benchmarks by name; glob patterns are supported (eg. `*/bench_name/{2,4,8}/**`)
- `-d [path]` – dump a CSV with raw samples into the given directory
- `--gnuplot` – generate a plot for each benchmark (requires gnuplot to be installed)
- `-o`, `--filter-outliers` – additionally filter outliers
- `-p`, `--parallel` – run base/candidate functions in 2 different threads instead of interleaving them in a single thread
- `--fail-threshold` – fail if the new version is slower than the baseline by the given percentage
- `--fail-fast` – fail after the first benchmark exceeding the fail threshold, instead of after the whole suite
The project is in its early stages, so any help will be appreciated. Here are some ideas you might find interesting:

- find a way to provide a more user-friendly API for registering functions in the system
- if you're a library author, trying out Tango and providing feedback would be very useful