-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP/RFC: random unleashed #24912
WIP/RFC: random unleashed #24912
Conversation
I updated with the addition of new (continuous) distributions: Also, I'm not very happy with using the name |
I will also add the triage label: it's not currently a breaking change, but it could if we decide to make the deprecations mentioned in the OP. If time is too short to make a decision concerning the API proposed here before 1.0, it can still be discussed to deprecate some functions (or move them into a module out of base), with the idea to provide a new API in 1.x (i.e. the one in this PR or an alternate one, or even the old one if it's found to be the best after all). |
This is just a first shot, I'll have to think about this somewhat more, but I think you could go along the following lines. Make
where
is not too long and less ambiguous than
|
0bd67f0
to
433c879
Compare
Thanks @mschauer ! So you pushed me to implement the last bit (only a handful of LOC) of design I had in mind :) I will describe it here, and then answer to your suggestions. For the spoiler, When introducing the
The problem is the repetition of I wonder if Concerning your points:
It's actually already done. My mistake was to also use the name
Given what I explained above,
It sounds good, but I feel I can only take care of making |
Also, I wonder it we should have |
Other array constructors have gone in the |
I think we can. But for example, with the proposed API here, we have EDIT: BTW, besides |
Heureka! |
Before, a call like `rand(mm, Sampler(mm, 1:10), 3)` generated an `Array{Any,1}`, so a way to get the `eltype` of a Sampler is necessary. Instead of changing Sampler -> Sampler{E}, implementing appropriate eltype methods would have been possible, to keep the helper Sampler subtypes more flexible, but it seemed to be simpler this way.
* Normal & Exponential distributions * Pair * Complex
* implement generation of random dictionaries: rand(Combine(Pair, Int, 1:3), Dict, 10) * implement generation of random sets: rand(1:3, Set, 10) * supersede sprand[n](m, [m], p, rfn) by rand(X, p::AbstractFloat, n, [m]) * supersede randstring(n, chars) by rand(chars, String, n) * supersede bitrand(dims) by rand(BitArray, dims)
df2f4e4
to
9e3bb36
Compare
Small update regarding the triage decision. There are mainly 2 APIs here: the old one and the new one with a
So let's go with the hybrid approach for now, with I finally got to rebase this, with commit re-ordering and as much squashing as I thought made sense! I also added rudimentary documentation and some tests, which is as much as I can do now. If not objections, I will merge tomorrow (unless I need more time to address review comments), with the plan to incrementally improve docs and tests, and features, when I can. |
Indeed, IIRC no one advocated for deprecating that API in the foreseeable future :).
What is the "hybrid" approach? :) Thanks! |
I think it was Stefan's expression? Which would mean keeping the old API (with the few extensions proposed here), and starting playing with the new one, a.k.a. |
To clarify, do you mean the extensions to |
Did I misunderstand the conclusion from triage? The extensions with the container type argument are the main point of this PR. In other words, |
I imagine we all left that conversation somewhat confused and uncertain 😄. My understanding was that we would include in 1.0 all extensions under the |
Probably, and this is not helped by my handicap in english with live converstations!
I hope someone will chime in! But I would be sad to stay with the status quo. We are not sure to find a general alternative design which covers all the possibilities of the proposed
|
I sympathize with |
Just to clarify: my position is not specifically to favor "including the container type" vs the 24595 way, although I'm not yet fully convinced it will work in a satisfying way to replace rand related functions. But I will not work against the transition, and I may even help! My point is to get a unified API for 1.0, instead of the zoo we have now. My main target is
If so, I'm very interested as to why! My second point is, while we are it, let's enable generating I just don't understand the resistance about this change, as I can't see the drawbacks, as it doesn't prevent anything concerning a possible future API based on 24595. Am I missing something? |
Haven't read all of this in detail yet but I'm not a fan of APIs like |
I don't necessarily disagree, but we are not there yet, with I'm afraid I will have to add triage again, as I don't know how to go forward. The possibilities:
|
Triage likes the idea of deprecating I think we should hold off on the other functions for now. For example it's weird to me that |
Right. And I can't help so much with this, as I can't install this package on my computer (BinDeps problems with NixOS).
I understand. Although I guess one gets used to it, like for If you don't mind, I will open a PR to merge the changes of 7f2f88a (make |
two small changes from #24912
This here is somewhat orthogonal to the
so these are not able to describe the randomness produced by a call to say |
Triage feels that this is too half-baked for such a late point in the release. It also feels somewhat incoherent to have a parallel universe of distribution-like things in |
I didn't notice it was still labeled for triage!
I will hopefully make a package out of this PR, to explore a bit more this design space. Feed-back on the shortcomings etc. will be welcome :)
I agree, but I also feel that the |
I think I disagree - not so much with the prioritisation, too much work and too little time left - but with the general notion that this does not belong to stdlib/base. If Julia provides means to sample exponential, Gaussian and uniform random variables in various |
I ported this PR over a new |
Closing here since this is in RandomExtenstions.jl. @rfourquet hope that is ok. |
#23964 decoupled the generation of random values out of "distribution specifiers" (e.g.
1:10
orFloat64
) from the generation of arrays filled with such values. This unlocked the possibility to generate other kinds of containers with the same distribution specifiers. This PR is a proposition for evolving therand
API in this direction, with an implementation as a proof of concept:It's very similar to what exists now:
S
is the "distribution specifier", which controls how scalar values are generatedC...
is the container specifierThe standard container specifier has this form:
C... == ContainerName, specific_parameters...
, which allows the owner of a type to plug into this API without conflict. Base has the priviledge to reserve a couple terser APIs:C... == dims::Dims
orC... == dims::Integer...
for generating ArraysC... == p::AbstractFloat, m, [n]
for generating a sparse array, likesprand(m, n, p)
Now, how to specify
S
when we want to generate aDict
? we need to combine a specification for the keys, with a specification for the values. So we introduce aDistribution
type to handle that, for examplerand(Distribution(Pair, 1:10, Float64))
generates aPair{Int,Float64}
, andrand(Distribution(Pair, 1:100, Float64), Dict, 10)
will generate aDict
of 10 such pairs.What about normal distributions? it's not scalable to replicate those implementations for
randn
(andrandexp
), so we introduceNormal{T} <: Distribution{T}
.For example, we can do
rand(Normal(), Set, 10)
to generate a length-10Set
of values drawn from the normal distribution, orrand(Normal(Complex{Float64}), 0.2, 100)
to generate a sparse vector with values drawn from the "circularly symmetric complex normal distribution".Here is the list of other examples implemented using this API:
rand(Distribution(Complex, 1:10))
to generateComplex{Int}
values where each composant is generated out of1:10
(it's also possible to specify a different specifier for each composant)rand([chars], String, [n])
supersedingrandstring([chars], [n])
rand(BitArray, dims...)
supersedingbitrand(dims...)
I would favor deprecating
bitrand
, unless the new version is too verbose?randstring
(or is itstringrand
andrandbit
? 😛 ), which is not seriously shorter thanrand(String)
sprand
/sprandn
: its API is a bit complicated, in particular with therfn::Function
argument (which is meant to implement a distribution), which I would really like to see go awayrandn
? (I'm not a big user, so I have no opinion on this one)but this is not part (yet?) of this already big PR (which I can break into smaller PRs if requested).
EDIT: my first idea was an API like
rand(rng, (Pair, Float64, 1:10))
instead of the more verboserand(rng, Distribution(Pair, Float64, 1:10))
, but the problem is that(Pair, Float64, 1:10)
has typeTuple{DataType, DataType, UnitRange{Int}}
, which doesn't lead to well typed/efficient code. An alternative would be(Val(Pair), Val(Float64), 1:10)
but this is ugly. So unless the rules for the type of tuple literals change, or a syntax like{Pair, Float64, 1:10}
is introduced to infer tuple types asTuple{Type{Pair}, Type{Float64}, UnitRange{Int}}
, I prefer theDistribution
version.TODO:
rand(Array, dims...)
in addition to the already existingrand(dims...)
, same question for Sparserand!(::SparseVector)
(currently stores as many random values as possible, i.e. as dense as possible; should probably only overwrite already stored values) (possibly do in another PR)Uniform(Int)
,Uniform(1:10)
etc.cc @Sacha0 who had asked me recently to share my thought on this; I didn't follow the recent arrays API overhaul as closely as I would have liked, so I don't know if this proposal is consistent with it.