minimum(z) gives TypeError: non-boolean (Missing) used in boolean context if z is large and contains missings #35939

Closed
Datseris opened this issue May 19, 2020 · 6 comments
Labels
missing data Base.missing and related functionality

Comments

@Datseris

This works:

julia> test = ones(Union{Float64, Missing}, 5,5,5)
julia> test[:, 1:3, 1] .= missing

minimum(test)

returns missing.

This doesn't work:

x = [rand() < .5 ? missing : rand() for i in 1:100, j in 1:100]
minimum(x)
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] mapreduce_impl(::typeof(identity), ::typeof(min), ::Array{Union{Missing, Float64},2}, ::Int64, ::Int64) at .\reduce.jl:572
 [2] _mapreduce(::typeof(identity), ::typeof(min), ::IndexLinear, ::Array{Union{Missing, Float64},2}) at .\reduce.jl:407
 [3] _mapreduce_dim at .\reducedim.jl:312 [inlined]
 [4] #mapreduce#580 at .\reducedim.jl:307 [inlined]
 [5] mapreduce at .\reducedim.jl:307 [inlined]
 [6] _minimum at .\reducedim.jl:657 [inlined]
 [7] _minimum at .\reducedim.jl:656 [inlined]
 [8] #minimum#589 at .\reducedim.jl:652 [inlined]
 [9] minimum(::Array{Union{Missing, Float64},2}) at .\reducedim.jl:652
 [10] top-level scope at none:0

Peter Deffebach and I were trying to come up with an MWE. The process made me think that the problem depends on the size of the array. The problem persists for 3-dimensional arrays as well, so I assume it is independent of the number of array dimensions.

I'm on Julia 1.4.0, Windows 10.

@Datseris
Author

cc @pdeffebach

@pdeffebach
Contributor

I'm on 1.4.0, Ubuntu 19.10.

This is a weird bug. I would need to work a bit more to figure out whether it's the absolute number of missings that matters or the size of the array.

julia> function test(n, p)
       x = [rand() < p ? missing : rand() for i in 1:n, j in 1:n]
       minimum(x)
       end

julia> for i in 1:10
       @show test(20, .01)
       end
test(20, 0.01) = missing
test(20, 0.01) = missing
test(20, 0.01) = missing
test(20, 0.01) = missing
test(20, 0.01) = missing
test(20, 0.01) = missing
test(20, 0.01) = 0.0014324455133143399
test(20, 0.01) = missing
test(20, 0.01) = missing
test(20, 0.01) = missing
julia> for i in 1:10
       @show test(20, .9)
       end
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] mapreduce_impl(::typeof(identity), ::typeof(min), ::Array{Union{Missing, Float64},2}, ::Int64, ::Int64) at ./reduce.jl:572
julia> for i in 1:10
       @show test(30, .01)
       end
ERROR: TypeError: non-boolean (Missing) used in boolean context

@pdeffebach
Contributor

The line that throws the error is here:

v1 == v1 || return v1

@mbauman added the "missing data" label May 19, 2020
@pdeffebach
Contributor

pdeffebach commented May 19, 2020

Here is a full MWE to confirm. The error appears when an array is large enough that mapreduce_impl decides to batch the computation.

When it batches the computation, it carries 4 values v1, v2, v3, v4, each the result of a reduce over elements of the previous batch, corresponding to the indices 4i + 1, 4i + 2, 4i + 3, 4i + 4 for i = 0, 1, .... If it's the first batch, it just takes the first element.

Then it checks that none of these v values are NaN. If there is a NaN, it returns NaN. However, it does this check by doing v1 == v1, v2 == v2, etc. This only returns false for NaN, but it throws an error with missing.

The error is not deterministic because the check is only performed on these 4 values at the start of each batch. When the array is small, there is only one batch, so there is only an error if the first value is missing.

If the array is large, then the v1, v2 etc. will of course be missing if any missing appears in that previous batch.
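To make the self-comparison check described above concrete, here is a minimal sketch of how `v == v || return v` behaves for an ordinary number, for NaN, and for missing. The helper name `selfcheck` is made up for illustration and is not part of Base.

```julia
# `selfcheck` is a hypothetical helper, not part of Base.
# It mimics the `v == v || return v` pattern used as a NaN short circuit.
function selfcheck(v)
    v == v || return :short_circuit
    return :continue
end

selfcheck(1.0)                 # :continue
selfcheck(NaN)                 # :short_circuit, because NaN == NaN is false
ismissing(missing == missing)  # true: `==` propagates missing

# `missing || ...` is not a valid boolean test, so this throws a TypeError.
threw_typeerror = try
    selfcheck(missing)
    false
catch err
    err isa TypeError
end
```

The last expression evaluates to `true`, which is the same error class reported in the stack traces above.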

I confirm this with a new test function

julia> function test2(n)
       x = [i == j == 1 ? missing : rand() for i in 1:n, j in 1:n]
       minimum(x)
       end
test2 (generic function with 2 methods)

julia> test2(16)
missing

julia> test2(17)
ERROR: TypeError: non-boolean (Missing) used in boolean context
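The 16 vs 17 threshold can be reproduced with a simplified model of a chunked reduction. This is only a sketch, not the code in reduce.jl: the function name `chunked_min_sketch` is hypothetical, and the chunk length of 256 is an assumption chosen because it matches the 16x16 (256 elements) vs 17x17 (289 elements) behaviour shown above.

```julia
# Hypothetical, simplified model of a chunked `min` reduction.
# `chunk_len = 256` is an assumption that matches the 16x16 vs 17x17 threshold.
function chunked_min_sketch(A::AbstractVector; chunk_len::Int = 256)
    v1 = A[1]
    v2 = v3 = v4 = v1              # four accumulators, seeded with the first element
    start = 2
    stop = start + chunk_len - 4
    while stop <= length(A) - 3
        # NaN short circuit; throws a TypeError if an accumulator is `missing`
        v1 == v1 || return v1
        v2 == v2 || return v2
        v3 == v3 || return v3
        v4 == v4 || return v4
        for i in start:4:stop
            v1 = min(v1, A[i])
            v2 = min(v2, A[i + 1])
            v3 = min(v3, A[i + 2])
            v4 = min(v4, A[i + 3])
        end
        start += chunk_len
        stop += chunk_len
    end
    v = min(min(v1, v2), min(v3, v4))
    for i in start:length(A)       # remaining elements that did not fill a chunk
        v = min(v, A[i])
    end
    return v
end

small = [i == 1 ? missing : rand() for i in 1:16^2]
large = [i == 1 ? missing : rand() for i in 1:17^2]

chunked_min_sketch(small)    # missing: the chunked loop is never entered
# chunked_min_sketch(large)  # throws TypeError: the check sees `missing` in v1
```

With 256 elements the `while` condition is false from the start, so the check never runs and `min` propagates missing normally. With 289 elements the loop is entered once and the check runs on the first element.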

@andyferris
Member

The line that throws the error is here:

v1 == v1 || return v1

Oh... that function is not generic! One could try (v1 == v1) === true || return v1? It would kind of work for NaN and missing and might generate identical code, but for certain functions op it's not even correct to abort! (I have used mapreduce with complex op before).
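A small sketch of how the suggested `(v == v) === true` check would behave. The helper name `selfcheck_generic` is hypothetical, and this only illustrates the check itself. It does not address the caveat above that aborting early may be incorrect for some `op`.

```julia
# `selfcheck_generic` is a hypothetical helper illustrating the suggested check.
function selfcheck_generic(v)
    # `=== true` turns both `false` (NaN) and `missing` into a plain `false`,
    # so `||` always receives a Bool.
    (v == v) === true || return :short_circuit
    return :continue
end

selfcheck_generic(1.0)      # :continue
selfcheck_generic(NaN)      # :short_circuit
selfcheck_generic(missing)  # :short_circuit, no TypeError
```

For `min` and `max`, short-circuiting on missing returns missing, which agrees with what `min(missing, x)` gives in Base.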

Is the short circuit really that important? Would people be relying on this optimization in their hot path?

@sostock
Contributor

sostock commented Jan 25, 2022

This was fixed in #35989 (Julia 1.6).
