
Canon uniform benches #1286

Closed
wants to merge 62 commits

Conversation

dhardy
Member

@dhardy dhardy commented Feb 16, 2023

Benchmark / development branch for Canon's method of uniform sampling. This PR is for reference purposes only and will not be merged.

Related: #1196

- This beats other uniform_int_i128 results.
- Rationale: with multiple RNGs we need to reduce the number of tests; random is the most useful for single samples.
- CPU capped to 2 GHz; some fluctuation still observed.

dhardy added 26 commits February 6, 2023 14:05

- Previously, nrmr was potentially incorrect (depending on size). thresh32_or_uty is the same as the old z.
- …nbiased
- Running only these benches, results are vaguely similar to before (10% differences aren't uncommon).
- Many results for 'sample' methods are similar, but several show >10% deviation. Concerning.
- Finally, with these changes deviations are usually under 1%. But not always: I still see a couple of large (10%+) deviations (not present on a re-run).
- Thanks to the increased sample size, these offer much more detailed plots.
- This is the same as the previous bench except that (a) it doesn't pin CPU frequency and (b) it runs 'cargo bench' first in an attempt to remove the unusually slow results still sometimes observed.
- I.e. just use 64-bit sampling for 32-bit output. This is significantly faster with 64-bit RNGs and similar with 32-bit RNGs.
- These include too many files for GitHub's web interface!
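The "64-bit sampling for 32-bit output" commit refers to the widening-multiply trick. As a rough illustration (my sketch, not the branch's code; the function name is made up), mapping a full 64-bit random word onto a 32-bit range looks like this:

```rust
// Map a 64-bit random word onto [0, range) for a 32-bit range via a
// widening multiply: the high 64 bits of the 128-bit product are the
// result. Bias per output value is at most 2^-32 -- the trade-off the
// biased benchmark variants measure.
fn sample_u32_via_u64(x: u64, range: u32) -> u32 {
    (((x as u128) * (range as u128)) >> 64) as u32
}

fn main() {
    // Fixed inputs for demonstration: the extremes map to the range ends.
    assert_eq!(sample_u32_via_u64(0, 10), 0);
    assert_eq!(sample_u32_via_u64(u64::MAX, 10), 9);
    assert_eq!(sample_u32_via_u64(1u64 << 63, 10), 5); // midpoint
    println!("ok");
}
```

On a 64-bit RNG this costs one RNG call and one multiply, which is why it beats two 32-bit draws.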
@dhardy
Member Author

dhardy commented Feb 16, 2023

Result run 5

(This is the combined plot only since full output page is large enough to crash Firefox.)

Note: "sample" means Canon's method with 32-bit sampling for i8 and i16, 64-bit sampling for i32 and i64, and 128-bit sampling for i128. Canon32 means 32-bit sampling. Canon32-2 is Canon32 with an extra round of bias reduction (max three samples). Lemire uses max(size, u32) sampling while Lemire64 uses max(size, u64) sampling.
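For readers without the branch checked out: Canon's method amounts to a widening multiply, with extra rounds extending the random fraction to reduce bias. The sketch below is my own illustration of the biased two-round variant under those assumptions (names like `wmul` and `canon_u64` are illustrative, not the branch's identifiers):

```rust
// `wmul` returns the full 128-bit product of two u64s as (high, low).
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    ((p >> 64) as u64, p as u64)
}

/// Sketch of Canon's method: sample from [0, range) using two 64-bit
/// random words. The first word's high product bits give the candidate
/// result; the second word extends the fractional part, carrying into
/// the result if the extended fraction overflows. Residual bias is
/// on the order of 2^-128.
fn canon_u64(x: u64, y: u64, range: u64) -> u64 {
    let (result, frac) = wmul(x, range);
    let (hi2, _) = wmul(y, range);
    let (_, carry) = frac.overflowing_add(hi2);
    result + carry as u64
}

fn main() {
    // Deterministic inputs for demonstration; results stay in range.
    assert_eq!(canon_u64(0, 0, 1000), 0);
    assert_eq!(canon_u64(u64::MAX, u64::MAX, 1000), 999);
    for &x in &[1u64, u64::MAX / 3, u64::MAX / 2] {
        assert!(canon_u64(x, x ^ 0x9E3779B97F4A7C15, 1000) < 1000);
    }
    println!("ok");
}
```

The unbiased variants keep taking extra rounds until the remaining fraction cannot change the result, rather than stopping after a fixed number.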

Further note: all i8, i16 results are so fast it's probably not worth differentiating (much).

Also: despite everything I tried, a benchmark would still sometimes run ~15% slower than normal, so the odd result may be slower than it should be.

single

i8: reject Biased64 due to poor perf with 32-bit RNGs. sample is a bit ahead, sample-unbiased a bit behind, but not much difference.

i16: similar to i8, but all Canon variants significantly worse with Pcg32, making ONeill look better. Weird.

i32: just pick sample or sample-unbiased (latter wins with Pcg32; weird).

i64: sample is best biased algorithm. Best unbiased is sample-unbiased or ONeill (similar profile aside from different bumps).

i128: Canon-red is best biased algorithm; best unbiased is either sample-unbiased (shortest tail) or canon-red-un (slimmer profile). ONeill is not competitive.

Use sample (Canon's method). Maybe add sample-unbiased as an option.

distribution

(I.e. sampling from the same range repeatedly, ignoring set-up costs.)

i8: reject Biased64; pick anything else.

i16: sample, but sample-unbiased also has notably poor perf.

i32: Canon (sample) is best. Lemire64 is also good.

i64: Lemire is the best on average, but only a little ahead of sample (Canon).

i128: Lemire is fastest, Canon-red next best.

Result: Canon (sample) is generally the fastest method (Canon-red wins for i128, but Canon is still good). Lemire is the fastest unbiased method.
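Lemire's method wins distribution sampling precisely because its expensive set-up (a modulo per range) is paid once and then amortised. A rough self-contained sketch, with a toy deterministic generator standing in for a real RNG (none of these names are the branch's identifiers):

```rust
// Tiny SplitMix64-style generator, only for the demo below.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

/// Sketch of Lemire's unbiased method for [0, range): widening multiply,
/// then reject products whose low half falls below 2^64 mod range.
/// The `%` computing `threshold` is the set-up cost; a distribution
/// object would compute it once and reuse it for every sample.
fn lemire_u64(rng: &mut SplitMix64, range: u64) -> u64 {
    let threshold = range.wrapping_neg() % range; // == 2^64 mod range
    loop {
        let m = (rng.next_u64() as u128) * (range as u128);
        if (m as u64) >= threshold {
            return (m >> 64) as u64;
        }
    }
}

fn main() {
    let mut rng = SplitMix64(42);
    for _ in 0..1000 {
        assert!(lemire_u64(&mut rng, 7) < 7);
    }
    println!("ok");
}
```

For single samples the modulo dominates, which matches the results above: Lemire only pulls ahead once set-up costs are ignored.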

Conclusion

Canon's method (sample) appears to be the fastest option. If we need unbiased sampling, sample-unbiased is pretty good for single-sampling, but Lemire's method is better for distribution sampling.

Further note: all methods are marked #[inline]. Removing it, benchmarks are up to 60% slower (most commonly 20-30% slower). This doesn't quite answer whether #[inline] should be used, but it provides some justification for keeping it.

@dhardy dhardy closed this Mar 24, 2023
@newpavlov newpavlov deleted the canon-uniform-benches branch May 22, 2024 02:16