Improve quantile in corner cases of collection eltype #30938
@nalimilan: for your reference, this causes
stdlib/Statistics/src/Statistics.jl
Outdated
    quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false)
    T = Core.Compiler.typesubtract(eltype(v), Missing)
    S = T <: AbstractFloat ? T : promote_type(T, Float64)
Is there any reason not to use the same approach as the method below, which is used when `p` is a tuple (and which calls `quantilesort!` and then `_quantile`)? `promote_type(T, Float64)` is problematic here, as we claim to support types other than real.
Good point. Fixed.
stdlib/Statistics/src/Statistics.jl
Outdated
@@ -821,8 +821,13 @@ function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
    return q
end

quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false) =
    quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false)
Actually, can't we just use this method for tuple `p`? `map` should do the right thing in both cases.
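As a rough illustration of why `map` can serve both cases (a sketch only, not code from this PR; `double` is a hypothetical stand-in for the per-quantile computation), `map` returns a container of the same kind as the one it receives:

```julia
# Hypothetical stand-in for the work done for each quantile.
double(x) = 2x

map(double, (0.1, 0.5))   # tuple in, tuple out
map(double, [0.1, 0.5])   # Vector in, Vector out
map(double, ())           # an empty tuple stays an empty tuple
map(double, Float64[])    # an empty Vector{Float64} keeps its element type
```

The empty cases are the ones that a hand-written `similar(...)` allocation tends to get wrong, which is presumably why they are worth testing explicitly.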
Right - I will merge them.
CI error is unrelated
@test quantile([1, 2, 3, 4], ()) == ()
@test isempty(quantile([1, 2, 3, 4], Float64[]))
@test quantile([1, 2, 3, 4], Float64[]) isa Vector{Float64}
@test quantile([1, 2, 3, 4], []) isa Vector{Any}
Could be worth testing with `p::Vector{Any}` when it is non-empty, since we allow for it.
Good point. I have added two more tests. What is relevant here is that with `map` we have some type instability (this problem was already present in the tuple version), and I expose this instability in the tests.
Here is the problem (shown on a tuple, which behaves the same before and after this PR):
julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
(0.1, 0.2, 0.4, 0.9))
(2.0, 3.0, 5.0, 11.0)
julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
(0.1, 0.2, 0.4, 0.9))
(2, 3, 5, 11)
julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
(0.1, 0.2, 0.4, 0.99))
(2.0, 3.0, 5.0, 16.400000000000002)
julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
(0.1, 0.2, 0.4, 0.99))
(2, 3, 5, 16.400000000000002)
Ah, that's annoying. Maybe change:
T = promote_type(eltype(v), typeof(v[1]*h))
to
T = promote_type(typeof(v[1]), typeof(v[1]*h))
Or maybe we can just do `T = typeof(v[1]*h)`? What's the point of using promotion?
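For reference, a small sketch of how the three candidate expressions differ when the container has an abstract element type (the values of `v` and `h` below are made up for illustration and are not taken from the implementation):

```julia
v = Any[4, 9, 1]   # container with an abstract element type
h = 0.5            # hypothetical interpolation weight

promote_type(eltype(v), typeof(v[1]*h))      # Any, so converting to it changes nothing
promote_type(typeof(v[1]), typeof(v[1]*h))   # Float64
typeof(v[1]*h)                               # Float64
```

With `eltype(v) == Any` the promotion collapses to `Any`, which is consistent with the mixed integer and float results in the tuple example above.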
Neither proposal is ideal, as `v[1]` may have a type unrelated to the type we actually need, e.g. for a `Vector{Real}` with heterogeneous inputs. I have proposed something else that I think better captures what we want: it always combines the types of the elements required with the type of the quantile required.
Note that in the corner cases of `0` or `1` as the quantile we still get an integer:
julia> quantile([1,2,3], 1)
3
julia> quantile([1,2,3], 0)
1
but I think it is OK to leave it as is (we could force these cases to produce a float result, but I am not sure we should, as in general we do not guarantee that `quantile` returns an `AbstractFloat`).
Makes sense. Just to be sure: when you say we get an integer for `0` and `1`, that's not the case with `0.0` and `1.0`, right?
Right. This happens only if you pass an integer as the quantile (which is possible only if the quantile is equal to `0` or `1`). The key line is `f0 = (lv - 1)*p`, which evaluates to an integer if `p` is an integer, and this propagates (as we only have additions and multiplications later).
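A minimal sketch of that type propagation, assuming `lv` is the length of the data vector (here 3, as for `[1, 2, 3]`):

```julia
lv = 3                    # assumed: length of the data vector

typeof((lv - 1) * 1)      # Int: an integer p keeps f0 an integer
typeof((lv - 1) * 1.0)    # Float64: a floating-point p makes f0 a float
```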
I made sure that what we have now is type stable for normal cases of containers with concrete types. However, still if we have an e.g.
That's the
Agreed - but given you made #30485, is it possible to also cover
Probably, but first the approach used by #30485 has to be approved...
The CI errors seem unrelated.
Merge?
bump
My review is requested but I'm not really sure what I'm reviewing for. @nalimilan, what were you looking for input on?
I just want you to merge if you think it's OK, so that I'm not the only one to blame if I missed something. ;-)
This PR fixes an inconsistency in `quantile`, which worked or failed depending on whether `p` was a tuple or not. For example:
A practical common case is when `itr` has an `eltype` that allows `Missing` but the collection does not actually contain any missing values (CSV.jl by default reads data this way). Currently, calculating quantiles on such collections fails:
while passing a tuple as `p` works:
As `quantile` does not allow missing values, we try to strip `Missing` from the returned `eltype` if possible (this works unless the `eltype` is `Any`, in which case it stays `Any`).