Skip to content
DigitalInBlue edited this page Jan 31, 2013 · 1 revision

The goal, generally, of writing benchmarks is to measure the performance of a piece of code. Benchmarks are useful for comparing multiple solutions to the same problem to select the most appropriate one. Other times, benchmarks can highlight the performance impact of design or algorithm changes and quantify them in a meaningful way.

By measuring code performance, you eliminate errors in your assumptions about what the “right” solution is for performance. Only through measurement can you confirm that using a lookup table, for example, is faster than computing a value. Such lore (which is often repeated) can lead to bad design decisions and, ultimately, slower code.

The goal in writing good benchmarking code is to eliminate all of the noise and overhead, and measure just the code under test. Sources of noise in the measurements include clock resolution noise, operating system background operations, test setup/teardown, framework overhead, and other unrelated system activity.

At a theoretical level we want to measure “t”, the time to execute the code under test. In reality, we measure “t” plus all of this measurement noise.

These extraneous contributors to our measurement of “t” fluctuate over time. Therefore, we want to try to isolate “t’. The way this is accomplished is by making many measurements, but only keeping the smallest total. The smallest total is necessarily the one with the smallest noise contribution and closest to the actual time “t”.

Once this measurement is obtained, it has little meaning in isolation. It is important to create a baseline test by which to compare. A baseline should generally be a “classic” or “pure” solution to the problem on which you are measuring a solution. Once you have a baseline, you have a meaningful time to compare your algorithm against. Simply saying that your fancy sorting algorithm (fSort) sorted a million elements in 10 milliseconds is not sufficient by itself. However, compare that to a classic sorting algorithm baseline such as quick sort (qSort) and then you can say that fSort is 50% faster than qSort on a million elements. That is a meaningful and powerful measurement.

Clone this wiki locally