Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve quantile in corner cases of collection eltype #30938

Merged
merged 10 commits into from
Apr 1, 2019
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ Standard library changes

* Fixed `repr` such that it displays `DateTime` as it would be entered in Julia ([#30200]).

#### Statistics

* `quantile` now accepts in all cases collections whose `eltype` is not a subtype of `Number` ([#30938]).

#### Miscellaneous

* Since environment variables on Windows are case-insensitive, `ENV` now converts its keys
Expand Down
17 changes: 8 additions & 9 deletions stdlib/Statistics/src/Statistics.jl
Original file line number Diff line number Diff line change
Expand Up @@ -821,19 +821,18 @@ function quantile!(q::AbstractArray, v::AbstractVector, p::AbstractArray;
return q
end

quantile!(v::AbstractVector, p::AbstractArray; sorted::Bool=false) =
quantile!(similar(p,float(eltype(v))), v, p; sorted=sorted)
function quantile!(v::AbstractVector, p::Union{AbstractArray, Tuple{Vararg{Real}}};
sorted::Bool=false)
if !isempty(p)
minp, maxp = extrema(p)
_quantilesort!(v, sorted, minp, maxp)
end
return map(x->_quantile(v, x), p)
end

quantile!(v::AbstractVector, p::Real; sorted::Bool=false) =
_quantile(_quantilesort!(v, sorted, p, p), p)

function quantile!(v::AbstractVector, p::Tuple{Vararg{Real}}; sorted::Bool=false)
isempty(p) && return ()
minp, maxp = extrema(p)
_quantilesort!(v, sorted, minp, maxp)
return map(x->_quantile(v, x), p)
end

# Function to perform partial sort of v for quantiles in given range
function _quantilesort!(v::AbstractArray, sorted::Bool, minp::Real, maxp::Real)
isempty(v) && throw(ArgumentError("empty data vector"))
Expand Down
10 changes: 9 additions & 1 deletion stdlib/Statistics/test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -474,8 +474,16 @@ end
@test quantile([0,1],1e-18) == 1e-18
@test quantile([1, 2, 3, 4],[]) == []
@test quantile([1, 2, 3, 4], (0.5,)) == (2.5,)
@test quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11], (0.1, 0.2, 0.4, 0.9)) == (2.0, 3.0, 5.0, 11.0)
@test quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
(0.1, 0.2, 0.4, 0.9)) == (2.0, 3.0, 5.0, 11.0)
@test quantile(Union{Int, Missing}[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
[0.1, 0.2, 0.4, 0.9]) == [2.0, 3.0, 5.0, 11.0]
@test quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
[0.1, 0.2, 0.4, 0.9]) == [2.0, 3.0, 5.0, 11.0]
@test quantile([1, 2, 3, 4], ()) == ()
@test isempty(quantile([1, 2, 3, 4], Float64[]))
@test quantile([1, 2, 3, 4], Float64[]) isa Vector{Float64}
@test quantile([1, 2, 3, 4], []) isa Vector{Any}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth testing with p::Vector{Any} when it is non-empty, since we allow for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have added two more tests. What is relevant here is that using map we have some type instability (this problem was present with tuple version already) - I expose this instability in tests.

Here is the problem (I show it on tuple which works the same before and after this PR):

julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.9))
(2.0, 3.0, 5.0, 11.0)

julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.9))
(2, 3, 5, 11)

julia> quantile([4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.99))
(2.0, 3.0, 5.0, 16.400000000000002)

julia> quantile(Any[4, 9, 1, 5, 7, 8, 2, 3, 5, 17, 11],
                          (0.1, 0.2, 0.4, 0.99))
(2, 3, 5, 16.400000000000002)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's annoying. Maybe change:

T = promote_type(eltype(v), typeof(v[1]*h))

to

T = promote_type(typeof(v[1]), typeof(v[1]*h))

Or maybe we can just do T = typeof(v[1]*h)? What's the point of using promotion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both proposals are not ideal as v[1] may have a type unrelated to the type we actually need, e.g. for Vector{Real} with heterogeneous inputs. I have proposed something else that I think better captures what we want - it always combines the types of elements required and the type of quantile required.

Note that in corner cases of 0 or 1 as quantile we still get an integer:

julia> quantile([1,2,3], 1)
3

julia> quantile([1,2,3], 0)
1

but I think it is OK to leave it as is (we could enforce these cases to produce a float result, but I am not sure if we should, as in general we do not guarantee that quantile returns an AbstractFloat).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Just to be sure: when you say we get an integer for 0 and 1, that's not the case with 0.0 and 1.0, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Simply only if you pass an integer as a quantile (which is possible if the quantile is equal to 0 or 1). The key line is f0 = (lv - 1)*p with evaluates to an integer if p is an integer, and this propagates (as we only have additions and multiplications later).


@test_throws ArgumentError quantile([1, missing], 0.5)
@test_throws ArgumentError quantile([1, NaN], 0.5)
Expand Down