Canon uniform benches #1286

dhardy · 2023-02-16T17:49:14Z

Benchmark / development branch for Canon's method of uniform sampling. This PR is for reference purposes only and will not be merged.

Related: #1196

@TheIronBorn

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

dhardy · 2023-02-16T18:05:16Z

Result run 5

(This is the combined plot only, since the full output page is large enough to crash Firefox.)

Note: "sample" means Canon with 32-bit sampling for i8 and i16, 64-bit sampling for i32 and i64, and 128-bit sampling for i128. Canon32 means Canon with 32-bit sampling. Canon32-2 is Canon32 with an extra round of bias reduction (max three samples). Lemire uses max(size, u32) sampling while Lemire64 uses max(size, u64) sampling.
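For readers unfamiliar with the method, here is a minimal sketch of Canon's method as "sample" is described above for i32 (64-bit sampling, at most two samples). It is not the code from this branch: `SplitMix64`, `wmul` and `canon_i32` are hypothetical names chosen for illustration, and the small generator exists only so the sketch has no dependencies.

```rust
/// Tiny SplitMix64 generator (illustration only; stands in for a real RNG).
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Widening multiply: returns the (high, low) 64-bit halves of a * b.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = u128::from(a) * u128::from(b);
    ((p >> 64) as u64, p as u64)
}

/// Sample from low..=high with 64-bit sampling and one round of bias
/// reduction (assumed shape of the biased Canon variant, max two samples).
pub fn canon_i32(low: i32, high: i32, rng: &mut SplitMix64) -> i32 {
    assert!(low <= high);
    // Number of values in the range: at most 2^32, so it fits in a u64.
    let range = (i64::from(high) - i64::from(low) + 1) as u64;
    // The high half is the candidate result, the low half its fractional part.
    let (mut result, lo_order) = wmul(rng.next_u64(), range);
    // Only if the fractional part is close to wrapping can extra random
    // bits change the result; otherwise the first sample is used as-is.
    if lo_order > range.wrapping_neg() {
        let (new_hi, _) = wmul(rng.next_u64(), range);
        // Carry into the result if the extended fraction overflows.
        if lo_order.checked_add(new_hi).is_none() {
            result += 1;
        }
    }
    (i64::from(low) + result as i64) as i32
}
```

Calling `canon_i32(-3, 7, &mut SplitMix64(1))` returns a value in -3..=7. The second sample is drawn only when the low half exceeds `range.wrapping_neg()`, which is rare for small ranges, so the common path costs one RNG call and one widening multiply.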

Further note: all i8, i16 results are so fast it's probably not worth differentiating (much).

Also: despite everything I tried, a benchmark would still sometimes run ~15% slower than normal, so the odd result may be slower than it should be.

single

i8: reject Biased64 due to poor perf with 32-bit RNGs. sample is a bit ahead, sample-unbiased a bit behind, but not much difference.

i16: similar to i8, but all Canon variants significantly worse with Pcg32, making ONeill look better. Weird.

i32: just pick sample or sample-unbiased (latter wins with Pcg32; weird).

i64: sample is best biased algorithm. Best unbiased is sample-unbiased or ONeill (similar profile aside from different bumps).

i128: Canon-red is best biased algorithm; best unbiased is either sample-unbiased (shortest tail) or canon-red-un (slimmer profile). ONeill is not competitive.

Use sample (Canon's method). Maybe add sample-unbiased as an option.

distribution

(I.e. sampling from the same range repeatedly, ignoring set-up costs.)

distr-random-i8
distr-random-i16
distr-random-i32
distr-random-i32-2 (re-run adding Lemire64)
distr-random-i64
distr-random-i128

i8: reject Biased64; pick anything else.

i16: sample, but sample-unbiased also has notable poor perf.

i32: Canon (sample) is best. Lemire64 is also good.

i64: Lemire is the best on average, but only a little ahead of sample (Canon).

i128: Lemire is fastest, Canon-red next best.

Result: Canon (sample) is generally the fastest method (Canon-red wins for i128, but Canon is still good). Lemire is the fastest unbiased method.

Conclusion

Canon's method (sample) appears to be the fastest option. If we need unbiased sampling, sample-unbiased is pretty good for single-sampling, but Lemire's method is better for distribution sampling.
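To illustrate why Lemire's method suits distribution sampling, here is a minimal sketch with 32-bit sampling for u32: the rejection threshold needs a modulo, but it can be computed once at construction and reused for every sample. This is not the code from this branch; `LemireU32` and `SplitMix64` are hypothetical names chosen for illustration.

```rust
/// Tiny SplitMix64 generator (illustration only; stands in for a real RNG).
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        // Use the upper 32 bits of the 64-bit output.
        ((z ^ (z >> 31)) >> 32) as u32
    }
}

/// Unbiased sampler over low..=high with a precomputed threshold.
pub struct LemireU32 {
    low: u32,
    range: u32,  // 0 encodes the full u32 range
    thresh: u32, // 2^32 mod range; low halves below this are rejected
}

impl LemireU32 {
    /// Set-up: the modulo is paid once here, not per sample.
    pub fn new_inclusive(low: u32, high: u32) -> Self {
        assert!(low <= high);
        let range = high.wrapping_sub(low).wrapping_add(1);
        let thresh = if range == 0 { 0 } else { range.wrapping_neg() % range };
        LemireU32 { low, range, thresh }
    }

    /// Sampling: one widening multiply per attempt, rejecting the few
    /// inputs that would make some outputs more likely than others.
    pub fn sample(&self, rng: &mut SplitMix64) -> u32 {
        if self.range == 0 {
            return rng.next_u32();
        }
        loop {
            let m = u64::from(rng.next_u32()) * u64::from(self.range);
            let (hi, lo) = ((m >> 32) as u32, m as u32);
            if lo >= self.thresh {
                return self.low.wrapping_add(hi);
            }
        }
    }
}
```

For example, `LemireU32::new_inclusive(10, 15)` can be built once and `sample` called repeatedly. For a single sample the set-up modulo is paid every time, which is consistent with sample-unbiased doing relatively better in the single-sample benchmarks.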

Further note: all methods are marked #[inline]. With this removed, benchmarks are up to 60% slower (most commonly 20-30% slower). This doesn't quite answer the question of whether #[inline] should be used, but provides some justification for keeping it.

dhardy added 30 commits February 28, 2022 10:21

Apply rustfmt to rand::distributions::uniform module

1ea5f36

Move UniformChar, UniformDuration to new submodule

cea392a

Move UniformFloat to a new submodule

03ddfa8

Move UniformInt to a new submodule

9604f2a

Move UniformInt SIMD implementations to new module

f080e47

UniformInt(single): add ONeill, Canon, Canon-Lemire and Bitmask methods

a87f4e9

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

Bench UniformInt(single) methods (new Criterion benchmark)

5cecc4c

UniformInt(sample): add Lemire, Canon, Canon-Lemire, Bitmask methods

1fa5b7d

UniformInt(sample): add sample_canon_64 (15% faster for i128)

ea9949b

UniformInt(single): add Canon 64 method

a99e9b0

This beats other uniform_int_i128 results

Simplify Lemire's method

80643ec

Bench: add documentation and TODO notes; adjust names

2c36d18

Revise impls: Canon method

2137863

Bench: add random-range variant for single samples

ba8db32

Add unbiased Canon method

11374ee

Bench: remove all but random variant for single-samples

37d16c5

Rationale: with multiple RNGs we need to reduce the number of tests; random is the most useful for single samples.

Bench: restructure

639d0b2

Bench: add multiple RNGs

91d680e

Replace canon_64 with canon_reduced

91e2804

Uniform int impls: rename macro params to $u32_or_uty, $u64_or_uty

9910822

Add Canon32

fde772a

Add results of uniform-int benches

ac994e3

Bench: add distr_random

662adb7

Add results for distr_random benchmarks

1c14f6b

Bench: revise single-random benchmark

6bcf5ef

Add revised results for single-random

9f32ab7

Bench: remove low- and high-reject benchmarks

1dea2c6

Add written report (partial conclusion based on results)

6ba5538

Fix link

dc7016f

Add uniform benchmark results for Intel 1145G7

eadb85d

CPU capped to 2GHz; some fluctuation still observed

dhardy added 26 commits February 6, 2023 14:05

Rename sample_canon_reduced -> sample_biased_64 for sizes i8-i32

b3f652f

Remove canon_u32_2 for >= 32-bits

76934d7

Fix: replace nrmr, z with thresh32_or_uty and 64 variant

0ce5f08

Previously, nrmr was potentially incorrect (depending on size). thresh32_or_uty is the same as the old z.

Remove biased Canon-Lemire method

48e7af3

Update uniform benches (intention and doc)

d9fd6c0

Add Canon32-Unbiased method

55c18fa

Add canon_reduced_unbiased

f71a011

Add results of last run

6d39052

Add note

61915ba

benches/uniform.rs: rename selected algorithms to sample and sample-unbiased

9f4ca8a

Running only these benches, results are vaguely similar to before (10% differences aren't uncommon).

Update criterion to 0.4.0

ae48e8c

Many results for 'sample' methods are similar but several show >10% deviation. Concerning.

benches/uniform.rs: adjust for more reliable results

f5ac943

Finally, with these changes deviations are usually under 1%. But not always: I still see a couple of large (10%+) deviations (not present on a re-run).

Add plots from latest runs

147add9

Thanks to increased sample size these offer much more detailed plots.

Rename files in results-uniform-int-5800X-4

7e3ecc8

Add fifth run

4d0e264

This is the same as the previous bench except (a) it doesn't pin CPU frequency and (b) it runs 'cargo bench' first in an attempt to remove unusually slow results still sometimes observed.

Add Lemire64 (Lemire, 64-bit sampling for 32-bit output)

9232f85

Replace Lemire with Lemire64

858e9f8

I.e. just use 64-bit sampling for 32-bit output. This is significantly faster with 64-bit RNGs and similar with 32-bit RNGs.

Remove ONeill, Biased64 samplers

1030c3b

Remove old samplers

9d24bad

Merge Canon and Canon32 variants

f1486ec

Remove canon-reduced, canon32-2 variants

3c29fb7

benches/uniform.rs: remove $set

6d3cb0e

Remove sample_canon_unbiased: lemire's method is faster

0ad00d8

Add unbiased feature flag; unify sampling methods by name

bcfa27b

Simplify benches/uniform.rs

0217d53

Remove large, older results from repo

9830956

These include too many files for GitHub's web interface!

dhardy mentioned this pull request Feb 17, 2023

Uniform sampling: use Canon's method #1287

Merged

dhardy closed this Mar 24, 2023

newpavlov deleted the canon-uniform-benches branch May 22, 2024 02:16
