Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve quantile in corner cases of collection eltype #30938

Merged
merged 10 commits into from
Apr 1, 2019

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Feb 1, 2019

This PR fixes an inconsistency in quantile which worked or failed depending whether p was a tuple or not. For example:

julia> using Statistics

julia> quantile(Any[1,2,3,4], (0.5, 0.7))
(2.5, 3.0999999999999996)

julia> quantile(Any[1,2,3,4], [0.5, 0.7])
ERROR: MethodError: no method matching AbstractFloat(::Type{Any})

A practical common case is when itr has eltype that allows Missing but it actually the collection does not contain any missing values (CSV.jl by default reads the data in this way). Now calculating quantiles on such collections fails:

julia> quantile(Union{Float64, Missing}[1,2,3,4], 0.1:0.1:0.9)
ERROR: MethodError: no method matching AbstractFloat(::Type{Union{Missing, Float64}})

while passing a tuple as p works:

julia> quantile(Union{Float64, Missing}[1,2,3,4], Tuple(0.1:0.1:0.9))
(1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.0999999999999996, 3.4000000000000004, 3.7)

As quantile does not allow missing values, we try to strip Missing from the return eltype if possible (this will work unless the eltype is Any, in which case it will stay Any).

@bkamins
Copy link
Member Author

bkamins commented Feb 5, 2019

@nalimilan: four your reference, this causes cut in CategoricalArrays.jl fail when what you want to apply cat on allows Missing in eltype.

quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false)
T = Core.Compiler.typesubtract(eltype(v), Missing)
S = T <: AbstractFloat ? T : promote_type(T, Float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to use the same approach as method below which is used when p is a tuple (which calls quantilesort! and then _quantile)? promote_type(T, Float64) is problematic here, as we claim to support types other than real.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Fixed.

@@ -821,8 +821,13 @@ function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
return q
end

quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false) =
quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually can't we just use this method for tuple p? map should do the right thing in both cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - I will merge them.

@bkamins
Copy link
Member Author

bkamins commented Feb 6, 2019

CI error is unrelated

@test quantile([1, 2, 3, 4], ()) == ()
@test isempty(quantile([1, 2, 3, 4], Float64[]))
@test quantile([1, 2, 3, 4], Float64[]) isa Vector{Float64}
@test quantile([1, 2, 3, 4], []) isa Vector{Any}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth testing with p::Vector{Any} when it is non-empty, since we allow for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have added two more tests. What is relevant here is that using map we have some type instability (this problem was present with tuple version already) - I expose this instability in tests.

Here is the problem (I show it on tuple which works the same before and after this PR):

julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.9))
(2.0, 3.0, 5.0, 11.0)

julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.9))
(2, 3, 5, 11)

julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.99))
(2.0, 3.0, 5.0, 16.400000000000002)

julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.99))
(2, 3, 5, 16.400000000000002)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's annoying. Maybe change:

T = promote_type(eltype(v), typeof(v[1]*h))

to

T = promote_type(typeof(v[1]), typeof(v[1]*h))

Or maybe we can just do T = typeof(v[1]*h)? What's the point of using promotion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both proposals are not ideal as v[1] may have a type unrelated to the type we actually need, e.g. for Vector{Real} with heterogeneous inputs. I have proposed something else that I think better captures what we want - it always combines the types of elements required and the type of quantile required.

Note that in corner cases of 0 or 1 as quantile we still get an integer:

julia> quantile([1,2,3], 1)
3

julia> quantile([1,2,3], 0)
1

but I think it is OK to leave it as is (we could enforce these cases to produce a float result, but I am not sure if we should, as in general we do not guarantee that quantile returns an AbstractFloat).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Just to be sure: when you say we get an integer for 0 and 1, that's not the case with 0.0 and 1.0, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Simply only if you pass an integer as a quantile (which is possible if the quantile is equal to 0 or 1). The key line is f0 = (lv - 1)*p with evaluates to an integer if p is an integer, and this propagates (as we only have additions and multiplications later).

@bkamins
Copy link
Member Author

bkamins commented Feb 6, 2019

I made sure that what we have now is type stable for normal cases of containers with concrete types. However, still if we have an e.g. Union{Float64, Missing} container like Union{Float64, Missing}[1,2,3] then quantile will not be type stable as map is not type stable then either. Here is a MWE of the root of the problem:

julia> x = Union{Float64, Missing}[1,2,3]
3-element Array{Union{Missing, Float64},1}:
 1.0
 2.0
 3.0

julia> map(identity, x)
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

julia> @code_warntype map(identity, x)
Body::Array
1 ─ %1 = %new(Base.Generator{Array{Union{Missing, Float64},1},typeof(identity)}, identity, A)::Base.Generator{Array{Union{Missing, Float64},1},typeof(identity)}
│   %2 = invoke Base._collect(_3::Array{Union{Missing, Float64},1}, %1::Base.Generator{Array{Union{Missing, Float64},1},typeof(identity)}, $(QuoteNode(Base.EltypeUnknown()))::Base.EltypeUnknown, $(QuoteNode(Base.HasShape{1}()))::Base.HasShape{1})::Array
└──      return %2

@nalimilan
Copy link
Member

That's the map equivalent of the inference failure that has already been noted for broadcast at #28382.

@bkamins
Copy link
Member Author

bkamins commented Feb 6, 2019

Agreed - but given you made #30485, is it possible to also cover map and comprehensions (as they have the same problem) there?

@nalimilan
Copy link
Member

Probably, but first the approach used by #30485 has to be approved...

@bkamins
Copy link
Member Author

bkamins commented Feb 7, 2019

The CI errors seem unrelated.

@nalimilan
Copy link
Member

Merge?

@bkamins
Copy link
Member Author

bkamins commented Mar 24, 2019

bump

@StefanKarpinski
Copy link
Member

My review is requested but I'm not really sure what I'm reviewing for. @nalimilan, what were you looking for input on?

@nalimilan
Copy link
Member

I just want you to merge if you think it's OK, so that I'm not the only one to blame if I missed something. ;-)

@nalimilan nalimilan merged commit cc516ab into JuliaLang:master Apr 1, 2019
@bkamins bkamins deleted the fix_quantile branch April 1, 2019 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants