
Random: always use SamplerRangeFast for MersenneTwister #27560

Merged: 1 commit merged into master from rf/rand/MT-rangeint-fast on Jun 25, 2018

Conversation

rfourquet
Member

This is a "breaking" change concerning the numbers generated by a call like rand(1:3, 10). It may be too late for 0.7, but on the other hand, in a later release the resistance to changing the generated streams may be too high for a non-essential efficiency improvement.

To generate a random value in 1:3, there are two distinct steps:

  1. create a Sampler object, which involves one-time computations
  2. use this sampler to generate any number of values

As the number of generated random values increases, the cost of step 1) becomes negligible (amortization).
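
As an illustration, here is a minimal sketch of the two steps using the documented Random API (the seed and the small range are just placeholders):

using Random

rng = MersenneTwister(0)
sp  = Random.Sampler(rng, 1:3)        # step 1: one-time set-up computations
x   = rand(rng, sp)                   # step 2: draw a single value
xs  = [rand(rng, sp) for _ in 1:10]   # step 2 repeated: the set-up cost is amortized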

We currently have two Sampler types with different compromises on the costs of both steps:

  • SamplerRangeInt (SRI), which is more costly at step 1) but more frugal with entropy bits in step 2)
  • SamplerRangeFast (SRF), which is cheap at step 1) but wastes more entropy bits, depending on the length of the range (see the sketch below)
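
For concreteness, both samplers can be constructed and used directly; note that SamplerRangeInt and SamplerRangeFast are internal, unexported types of the Random stdlib, so this is only a sketch of the current implementation detail:

using Random

rng = MersenneTwister(0)
sri = Random.SamplerRangeInt(1:3)   # costlier to build, frugal with entropy bits per draw
srf = Random.SamplerRangeFast(1:3)  # cheap to build, may discard more entropy bits per draw
rand(rng, sri), rand(rng, srf)      # both return values in 1:3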

By default, an RNG will use SRI. MersenneTwister uses:

  • SRF for scalar calls, like rand(1:3): MersenneTwister is fast enough at generating entropy that wasting some bits is preferable in this case
  • SRI for array calls, like rand(1:3, 10): this was the original method, and it was not updated when SRF was introduced because the status quo was (and still is) faster in some cases

I now propose to use SRF in all cases for MersenneTwister, both for uniformity (e.g. srand(0); [rand(1:10), rand(1:10)] will give the same result as srand(0); rand(1:10, 2)) and for efficiency, as "most (e.g. 90%) of the time" this gives improved speed.
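
The uniformity part can be checked directly; a small sketch (srand is the 0.7-era name, renamed Random.seed! in later versions):

using Random

srand(0); a = [rand(1:10), rand(1:10)]   # scalar path
srand(0); b = rand(1:10, 2)              # array path
a == b   # true once both paths use SamplerRangeFast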

For a given array length, the speed of the SRI method doesn't vary much with the length L of the range, unlike that of SRF:

  • if L <= 2^n with L close to 2^n, SRF can be between 2 and 3 times as fast as SRI
  • if L = 2^n + k with k > 0 "small", SRF is slower than SRI by a small margin, e.g. 10%. As k grows, SRF speeds up and overtakes SRI again around k ≈ (2^n)/10, which means SRF is slightly slower than SRI for roughly 10 percent of range lengths between 2^n+1 and 2^(n+1) (the sketch after this list makes the arithmetic explicit)
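
The dependence on L comes from the mask-and-reject scheme used by SRF; here is a back-of-the-envelope sketch (a simplified model, ignoring how the raw bits are actually produced) of the expected number of raw draws per accepted value:

# SRF draws just enough bits to cover the range and rejects out-of-range values,
# so on average it needs P/L raw draws per output, where P is the smallest
# power of two >= L.
expected_draws(L) = 2^ceil(Int, log2(L)) / L

expected_draws(2^30)         # 1.0   -> best case, no rejection
expected_draws(2^30 + 1)     # ~2.0  -> worst case, about half the draws are rejected
expected_draws(2^30 + 2^29)  # ~1.33 -> rejection rate already much lower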

I lack the time right now for a more thorough performance analysis and graphics, but here is a representative benchmark session (assume the range arguments are also $-escaped for @btime):

julia> using BenchmarkTools, Random

julia> a = zeros(Int, 1000);

# master: SamplerRangeInt

julia> @btime rand!($a, 1:2^30)
  15.184 μs (0 allocations: 0 bytes)

julia> @btime rand!($a, 1:2^30+1)
  15.165 μs (0 allocations: 0 bytes)

julia> @btime rand!($a, 1:2^30+2^26)
  15.176 μs (0 allocations: 0 bytes)

julia> @btime rand!($a, 1:2^30+2^27)
  15.166 μs (0 allocations: 0 bytes)

julia> @btime rand!($a, 1:2^30+2^29)
  15.687 μs (0 allocations: 0 bytes)

# PR: SamplerRangeFast

julia> @btime rand!($a, 1:2^30)
  6.117 μs (0 allocations: 0 bytes)

julia> @btime rand!($a, 1:2^30+1)
  16.535 μs (0 allocations: 0 bytes)

julia> @btime rand!($a, 1:2^30+2^26)
  15.699 μs (0 allocations: 0 bytes) # still slower than SRI

julia> @btime rand!($a, 1:2^30+2^27)
  14.334 μs (0 allocations: 0 bytes) # faster than SRI again

julia> @btime rand!($a, 1:2^30+2^29)
  9.599 μs (0 allocations: 0 bytes) # significantly faster than SRI

@rfourquet added the performance (Must go faster) and randomness (Random number generation and the Random stdlib) labels on Jun 13, 2018
@rfourquet
Member Author

cc. @mschauer

@StefanKarpinski
Sponsor Member

We can break some things between 0.7.0-alpha and 0.7.0-beta so this would be ok.

@mschauer
Contributor

I am still fascinated that a single division is so expensive compared to our MersenneTwister entropy generation. Anyway I think this is a sane change.

@rfourquet
Member Author

Thanks for your answers! So I will mark this for triage.

> I am still fascinated that a single division is so expensive compared to our MersenneTwister entropy generation

Me too, indeed! SIMD is certainly crucial for that...

If someone would like to test the performance on their machine, it should be rather straightforward with this eval, rather than compiling this branch:

using Random

# Make MersenneTwister use SamplerRangeFast for array generation too,
# as this PR does, without having to compile the branch:
for T in Base.BitInteger_types
    @eval Random Sampler(rng::MersenneTwister, r::UnitRange{$T}, ::Val{Inf}) = SamplerRangeFast(r)
end
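
After evaluating that in a fresh session, the benchmarks from the first comment can be rerun, e.g. (assuming BenchmarkTools is installed):

using BenchmarkTools

a = zeros(Int, 1000)
@btime rand!($a, $(1:2^30+1))   # compare with the SamplerRangeInt numbers above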

@rfourquet added the triage (This should be discussed on a triage call) label on Jun 14, 2018
@StefanKarpinski
Sponsor Member

@rfourquet: triage agrees that you're likely the only person in a position to really make a call on this. It's fine to make a breaking change now if you deem it worthwhile. Merge if you see fit.

@StefanKarpinski removed the triage label on Jun 21, 2018
@rfourquet
Member Author

Thanks for having discussed this.

I did a few more benchmarks to try to get an idea of the expected speed-up for a range of "random" length (not so trivial to measure in a meaningful way; I did what I could). Benchmarking only the step 2) mentioned in the OP (which is equivalent to benchmarking rand! on asymptotically large arrays, and favors SamplerRangeInt), I observe speed-ups typically between 40% and 60% (with a fair majority over 50%), so it seems safe to assume this will be an improvement on most machines. The main drawback I see is that the time it takes is less predictable (it depends a lot on the length of the range).
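
For reference, one way to time only step 2) is to construct each sampler once and reuse it; a sketch, again relying on the internal, unexported sampler types:

using Random, BenchmarkTools

rng = MersenneTwister(0)
r   = 1:2^30+1                    # worst case for SamplerRangeFast
sri = Random.SamplerRangeInt(r)   # step 1) done once, outside the timed region
srf = Random.SamplerRangeFast(r)
@btime rand($rng, $sri)
@btime rand($rng, $srf)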

If no objection comes, I will therefore merge this weekend.

@mschauer
Contributor

The improvement even on asymptotically large arrays settles it.

@rfourquet
Member Author

The Travis failure is unrelated to this change and happens in other PRs too (it concerns FileWatching).

@rfourquet merged commit 8eba7c4 into master on Jun 25, 2018
@rfourquet deleted the rf/rand/MT-rangeint-fast branch on June 25, 2018