random: introduce Sampler to formalize hooking into rand machinery #23964
(title changed: "State to formalize hooking into rand machinery")
Maybe "Sampler" instead of "State" would be more descriptive?
base/random/RNGs.jl (Outdated)

```julia
rand!(r::MersenneTwister, A::Array{Float64}, st::StateTrivial{<:FloatInterval_64}) =
    _rand!(r, A, length(A), st[])
```
Is this the location of the performance regression? You could try `@noinline` here.
(here or in `_rand!`)
Thanks for looking! Actually the performance hit (a slowdown by a factor of about 1 to 2) doesn't occur here. I will comment where it happens.
Of course the name is up for debate! I'm not familiar with the Distributions package (I didn't manage to install it), but would […]
base/random/generation.jl (Outdated)

```julia
rand!(A::AbstractArray, r::AbstractArray) = rand!(GLOBAL_RNG, A, r)
State(rng::AbstractRNG, r::AbstractArray) = StateSimple(r, State(rng, 1:length(r)))
```
@mschauer The performance hit happens between here and below, in `rand(rng::AbstractRNG, st::StateSimple{<:AbstractArray,<:State})`. `StateSimple` introduces a kind of indirection compared to how this was implemented before (in a call like `rand(rng, [1, 2, 3])`), which currently is not eliminated/optimized out (but could be in theory, I think).
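The kind of check being discussed can be sketched as follows; `Wrapper` and `pickone` are illustrative stand-ins, not the PR's actual types, and `@allocated` only observes heap allocation, one of the two costs mentioned above:

```julia
using Random

# Illustrative wrapper in the spirit of StateSimple: holds a reference to the array.
struct Wrapper{T}
    data::T
end

pickone(rng::AbstractRNG, w::Wrapper) = w.data[rand(rng, 1:length(w.data))]

# The temporary Wrapper built here is the kind of indirection discussed above;
# when the compiler eliminates it, the extra cost disappears.
pickone(rng::AbstractRNG, a::Vector) = pickone(rng, Wrapper(a))

rng = MersenneTwister(0)
a = [1, 2, 3]
pickone(rng, a)                     # warm-up call, so compilation is not measured
alloc = @allocated pickone(rng, a)  # 0 when the wrapper is stack-allocated/elided
```

Whether the wrapper is actually elided depends on the compiler version and on inlining decisions, which is exactly the fragility discussed in this thread.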
It reminded me of https://github.com/JuliaGraphs/LightGraphs.jl/blob/master/src/graphtypes/simplegraphs/simpleedgeiter.jl#L85, where the wrapper `SimpleEdgeIter` is completely optimized away, but only if `has_edge` is not inlined.
Oh... I don't understand how this works, but I will then try to put some `@noinline` here or there!
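For the record, the annotation being discussed looks like this; `Wrapped` and `pick` are illustrative names, not the PR's actual code:

```julia
using Random

# A light wrapper plus helper state, in the spirit of StateSimple.
struct Wrapped{A,S}
    data::A
    st::S
end

# @noinline as suggested in the thread: keeping this call opaque is the trick
# that reportedly helped in the LightGraphs example referenced above.
@noinline pick(rng::AbstractRNG, w::Wrapped) = w.data[rand(rng, w.st)]

rng = MersenneTwister(1)
w = Wrapped([10, 20, 30], 1:3)
x = pick(rng, w)
```

Whether this actually lets the wrapper be optimized away is compiler-dependent, which is why the thread calls it "cargo cult" below.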
Cargo cult programming :-D
Distributions claimed […]

But is a sampler in the Distributions terminology similar to the […]

I think […]

A […]

I would like to have […]
(force-pushed from f1296de to 004b12c)
So, given that there seemed to be interest and that this work is difficult to rebase (I wrote a version 6 months ago and had to basically rewrite it from scratch), I decided to write a band-aid for the perf regression... and discovered that it was actually caused by calling a less efficient algorithm (it's always easier to accuse the compiler ;-) ). Basically, the problem is that in different places we use different algorithms depending on whether we generate one or indefinitely many random values. It turned out that the amount of hackery necessary to make these optimizations compatible with the […]
Of course the name […]

The performance now seems to mostly not show regression! But in some cases there can be something like a 10% regression, e.g. for […]

I think another nice consequence of this framework is to help solve the open problem of making it easier for RNG implementors to re-use the implementation for some basic types (e.g. defining […])

So please feel free to review; I would like to not wait too long before merging, to avoid too many rebases.
(force-pushed from 004b12c to 04c75b1)
(force-pushed from 04c75b1 to 1a6807a)
I rebased and renamed […]
Any particular reason to take […]
Absolutely an oversight, thanks! (So there was a simple reason why I was annoyed with the name in my previous comment ;-) )
Oh, and I realize that the […]
@nanosoldier […]
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
(force-pushed from 9dbc720 to e7afa5f)
Interesting, I can't reproduce locally the last three results from nanosoldier's report (1 improvement and 2 regressions), so I'm re-running this part: […] The […] I will merge within one or two days if there is no objection.
@nanosoldier […]
The syntax of that request is off (one too many […]) @nanosoldier […]
Oops, sorry if I confused and broke Nanosoldier!
Nah, no worries. Usually Nanosoldier is better at recovering from things like that, so I'm not sure what the deal is.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
There is something strange which would make me believe Nanosoldier inverted the results in the last run: for three of the results appearing in the second benchmark, which correspond to the last three results of the first benchmark, the numbers for the time ratio are inverted. That would show some benchmarking consistency if Nanosoldier indeed confused master vs. PR. @ararslan, could it be possible that this happened? Still, I would be puzzled, as I can't reproduce that locally. Assuming the first report is the more correct one, I would then be inclined to merge as is, as the two regressions, for […]
Even more interesting (or puzzling): @nalimilan reports exactly the same 3 […]
I remember seeing something similar related to the #20621 PR. After that was merged, in another, unrelated PR there were huge differences in the […]
(title changed: "State to formalize hooking into rand machinery")
(force-pushed from 04b519b to 3b9db7c)
[ci skip] CI already ran.
(force-pushed from 3b9db7c to 7497379)
CI was all green except for this one job, whose log was too long, but which I assume is unrelated: https://travis-ci.org/JuliaLang/julia/jobs/309063786.
(title changed: "State to formalize hooking into rand machinery" → "Sampler to formalize hooking into rand machinery")
And thanks all for the feedback!
What should I do? Is the fallback not working properly? cc @rfourquet
The error message is worse than a method error. With a method error I would at least know what function was attempted to be called, and with what types.
By default, generating random floating point values of type […]
You seem to assume that this PR intentionally bypassed what would naturally be a default […]
Getting personal assistance from a dev is not something people can typically expect to get. If the whole […]
This seems like a severe regression in the API for the simple case. Is there any way to have fallbacks so previous definitions keep working?
Sure; but as I said in the issue you opened, nothing was documented before, and the system was fragile and not designed for extension (though it worked somewhat in some cases).
having to write […]
I tried to have that, of course, but it ends up in mutually recursive function calls which will stack overflow if no definition is provided for the custom type, i.e. (with some simplification, but it's the idea): […]
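The elided snippet isn't recoverable, but a minimal self-contained sketch of the problem being described could look like the following (`TrivialSampler`/`myrand`/`MyType` are illustrative names, not the PR's actual definitions):

```julia
# Two generic fallbacks that each delegate to the other: this works as soon as
# a custom type specializes one of them, but recurses forever otherwise.
struct TrivialSampler{T}
    self::T
end

# Fallback 1: sampling a value wraps it in a sampler and delegates...
myrand(rng, x) = myrand(rng, TrivialSampler(x))
# Fallback 2: ...and sampling a trivial sampler delegates back to the value.
myrand(rng, sp::TrivialSampler) = myrand(rng, sp.self)

struct MyType end   # no myrand specialization provided for this type

# myrand(nothing, MyType())   # throws StackOverflowError: the fallbacks ping-pong
```

Specializing either method for `MyType` breaks the cycle, which is why the system works once the custom type provides one definition.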
I was just offering you my assistance in case you must update a package or something to 0.7 before I have a chance to document the new system. I didn't mean that this is my solution to the documentation issue :)
I see, thank you!
Can […]
(I edited your post, the […]
Not in the current state, as this will also lead to a stack-overflow problem (if […]
NOTE: this is WIP as currently this incurs a performance regression, unfortunately, so this may have to sleep here for some time.
This is an attempt at making it easier for user code to define `rand` on custom types. This was also motivated by the tedious code repetition in Base: for each new type for which `rand` is defined, we had to define all the different incantations, in particular the array versions.

To define `rand` on a new type `T`, it would now be enough to define:

1. `State(rng, x::T)`, which creates an object (say of type `StateT <: State`) possibly storing some state which can help speed up generation of multiple values (typically what we do for `rand(1:n)`);
2. `rand(rng, st::StateT)`.

As in most cases the state is trivial, there are default fall-back methods (`State(rng, x) = StateTrivial(x)`, where `StateTrivial` is a predefined type with one field which wraps `x`) which handle point 1 automatically and allow one to simply define `rand(rng, st::StateTrivial{T})` (point 2).

The benefit is also that it exposes to the user a way to generate multiple random values more efficiently (when the state is not trivial), e.g. `st = State(rng, 1:10); while cond; ...; y = rand(rng, st); ...; end`. Note that `rng` has to be an argument to `State`, as the optimal state could depend on the type of `rng`.

A subsequent work could be to combine `rng` and the state into a single object for an easier API: `g = Generator(rng, 1:10); ...; y = rand(g)`.

As said, this incurs a penalty for generation from an array: `rand(rng, [1, 2, 3])`. The reason is that (AFAIU) `State(rng, [1, 2, 3])` creates an object `StateSimple([1, 2, 3], some_state)` which holds a reference to the initial array; it doesn't seem to be a problem of heap allocation, but rather that the creation of the `StateSimple` object isn't currently optimized out. This question is clearly not in my comfort zone, and I would love it if someone could help me solve this performance problem.

Note also that in the current state, nanosoldier would report other performance regressions, but they are not real; cf. JuliaCI/BaseBenchmarks.jl#124.
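To make the two-step protocol described above concrete, here is a self-contained sketch. The `State`/`StateTrivial` machinery is mocked locally (`makestate` stands in for `State`, `drawrand` for `rand`) so the snippet runs without the PR's actual Base code; the `Generator` type at the end sketches the combined-object idea, all names being illustrative:

```julia
using Random

# Mock of the PR's machinery: a trivial state just wraps the value to sample.
struct StateTrivial{T}
    self::T
end
Base.getindex(st::StateTrivial) = st.self

# Default fallback: most types need no precomputed state (point 1 for free).
makestate(rng::AbstractRNG, x) = StateTrivial(x)

# Generic entry point: build the state, then draw from it.
drawrand(rng::AbstractRNG, x) = drawrand(rng, makestate(rng, x))

# A custom type opting into the protocol: an n-sided die.
struct Die
    nsides::Int
end

# Point 2 of the protocol: how to draw one value from the (trivial) state.
drawrand(rng::AbstractRNG, st::StateTrivial{Die}) = rand(rng, 1:st[].nsides)

# Hoisting the state out of the loop is the intended usage pattern.
rng = MersenneTwister(2017)
st = makestate(rng, Die(6))
rolls = [drawrand(rng, st) for _ in 1:1000]

# The "combine rng and state into a single object" idea from the last note.
struct Generator{R<:AbstractRNG,S}
    rng::R
    st::S
end
function Generator(rng::AbstractRNG, x)
    st = makestate(rng, x)
    return Generator{typeof(rng),typeof(st)}(rng, st)
end
draw(g::Generator) = drawrand(g.rng, g.st)

g = Generator(MersenneTwister(0), Die(20))
more = [draw(g) for _ in 1:200]
```

In the design that was eventually merged (after the rename discussed in this thread), the corresponding extension points in the `Random` stdlib are `Random.Sampler` and `Random.SamplerTrivial`.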