Add spreadmissings #122

Open: wants to merge 27 commits into base: master. Changes from 7 commits.
108 changes: 100 additions & 8 deletions src/Missings.jl
@@ -2,7 +2,7 @@ module Missings

export allowmissing, disallowmissing, ismissing, missing, missings,
Missing, MissingException, levels, coalesce, passmissing, nonmissingtype,
-skipmissings
+skipmissings, spreadmissings

using Base: ismissing, missing, Missing, MissingException
using Base: @deprecate
@@ -208,6 +208,98 @@ missing
"""
passmissing(f) = PassMissing{Core.Typeof(f)}(f)

struct SpreadMissings{F} <: Function
    f::F
end

function (f::SpreadMissings{F})(xs...; kwargs...) where {F}
    if any(x -> x isa AbstractVector{>:Missing}, xs)
        vecs = Base.filter(x -> x isa AbstractVector, xs)

        findex = eachindex(first(vecs))
        @assert all(x -> eachindex(x) == findex, vecs)

        nonmissingmask = fill(true, length(vecs[1]))
        for v in vecs
            nonmissingmask .&= .!ismissing.(v)
        end

        vecs_counter = 1
        newargs = ntuple(length(xs)) do i
            if xs[i] isa AbstractVector
                t = view(xs[i], nonmissingmask)
                vecs_counter += 1
            else
                t = xs[i]
            end
            t
        end

        res = f.f(newargs...; kwargs...)

        if res isa AbstractVector
            out = similar(res, Union{eltype(res), Missing}, length(vecs[1]))
            fill!(out, missing)
            out[nonmissingmask] .= res
        else
            out = similar(vecs[1], Union{typeof(res), Missing})
            fill!(out, missing)
            out[nonmissingmask] .= Ref(res)
        end

        return out
    else
        return f.f(xs...; kwargs...)
    end
Member

How about throwing an error in this case for now? It doesn't seem very useful.

Contributor Author

No, I disagree. I want people to be able to code defensively with this tool. Throwing an error would defeat the purpose a bit.
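
For context, the branch being discussed is selected from the argument types, not from the values they hold. A minimal sketch of that check, where `allows_missing` is a hypothetical helper that only mirrors the `isa` test in the diff and is not part of Missings.jl:

```julia
# Hypothetical helper mirroring the `isa` test in the diff; not part of Missings.jl.
allows_missing(x) = x isa AbstractVector{>:Missing}

# true: the eltype allows missing, even though no value is actually missing
allows_missing(Union{Int, Missing}[1, 2, 3])

# false: Vector{Int}, so the wrapper would fall through and call f directly
allows_missing([1, 2, 3])

# false: a scalar missing is not a vector
allows_missing(missing)
```

So a call with only `Vector{Int}` arguments reaches the `else` branch, which is the case the first review comment proposes to turn into an error.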

Member

OK. Something that occurred to me though: should we do something special if one of the non-vector arguments is missing (like returning missing)? Should we reserve this behavior for the future in case we want spreadmissings to be a more general version of passmissing?

Contributor Author

I don't think so. We definitely can't return `missing`; we would have to return a vector of missings, which seems very restrictive.

Member

Not sure what "restrictive" means here. That would be similar to e.g. broadcast(+, missing, [1, 2, 3]).

Whether that makes sense depends on what f does with its inputs, but I can't find examples of functions to which one would pass one or more vectors plus a missing argument and would not expect the resulting vector to be full of missings. So reserving the situation where one argument is missing could be a good idea in case we want to handle it later.
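
The broadcast comparison can be checked with Base alone. This short sketch only illustrates the existing broadcasting behavior referred to in the comment, not anything `spreadmissings` currently does:

```julia
# Broadcasting a scalar missing against a vector propagates missing to every element.
v = broadcast(+, missing, [1, 2, 3])

# The result keeps the length of the vector argument and is missing everywhere.
all(ismissing, v) && length(v) == 3
```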

Member

What do you think? At least throwing an error if any of the inputs is missing for now doesn't seem too problematic?

end

"""
spreadmissings(f)

Given a function `f`, return a wrapper around `f` that transforms its
arguments before calling it. Given the call

```
spreadmissings(f)(x::AbstractVector, y::Integer, z::AbstractVector)
```

it will construct the intermediate variables

```
sx, sz = skipmissings(x, z)
```

and call

```
f(sx, y, sz)
```

# Examples
```
julia> x = [0, 1, 2, missing]; y = [-1, 0, missing, 2];

julia> function restricted_fun(x, y)
           map(x, y) do xi, yi
               if xi < 1 || yi < 1 # will error on missings
                   return 1
               else
                   return 2
               end
           end
       end;

julia> spreadmissings(restricted_fun)(x, y)
4-element Vector{Union{Missing, Int64}}:
 1
 1
  missing
  missing
```
"""
spreadmissings(f) = SpreadMissings{Core.Typeof(f)}(f)
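
The mask, call, and scatter steps of the `SpreadMissings` method in this diff can be sketched with Base only. This is a simplified two-vector illustration, assuming 1-based vectors and a vector-valued `f`. The name `apply_on_complete_rows` is hypothetical and not part of the PR:

```julia
# Hypothetical, simplified illustration; not the PR's implementation.
# Assumes x and y are 1-based vectors and that f returns a vector with
# one element per complete (non-missing) row.
function apply_on_complete_rows(f, x::AbstractVector, y::AbstractVector)
    eachindex(x) == eachindex(y) ||
        throw(DimensionMismatch("x and y must have the same indices"))

    # true at positions where neither input is missing
    mask = .!ismissing.(x) .& .!ismissing.(y)

    # call f on views restricted to the complete rows
    res = f(view(x, mask), view(y, mask))

    # start from all-missing output, then scatter the results back
    out = Vector{Union{Missing, eltype(res)}}(missing, length(x))
    out[mask] = res
    return out
end

x = [0, 1, 2, missing]
y = [-1, 0, missing, 2]
out = apply_on_complete_rows((a, b) -> a .+ b, x, y)
# out has the same length as x, with missing at positions 3 and 4
```

Unlike the diff, this sketch raises a `DimensionMismatch` instead of using `@assert` for the index check. That is a design choice made for the illustration only.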

"""
skipmissings(args...)

@@ -258,7 +350,7 @@ struct SkipMissings{V, T}
others::T
end

-Base.@propagate_inbounds function _anymissingindex(others::Tuple{Vararg{AbstractArray}}, i)
+Base.@propagate_inbounds function _anymissingindex(others::Tuple{Vararg{AbstractArray}}, i)
for oth in others
oth[i] === missing && return true
end
@@ -267,7 +359,7 @@ Base.@propagate_inbounds function _anymissingindex(others::Tuple{Vararg{Abstract
end

@inline function _anymissingiterate(others::Tuple, state)
-for oth in others
+for oth in others
y = iterate(oth, state)
y !== nothing && first(y) === missing && return true
end
@@ -278,7 +370,7 @@ end
const SkipMissingsofArrays = SkipMissings{V, T} where
{V <: AbstractArray, T <: Tuple{Vararg{AbstractArray}}}

-function Base.show(io::IO, mime::MIME"text/plain", itr::SkipMissings{V}) where V
+function Base.show(io::IO, mime::MIME"text/plain", itr::SkipMissings{V}) where V
print(io, SkipMissings, '{', V, '}', '(', itr.x, ')', " comprised of " *
"$(length(itr.others) + 1) iterators")
end
@@ -335,7 +427,7 @@ end
@inline function Base.getindex(itr::SkipMissingsofArrays, i)
@boundscheck checkbounds(itr.x, i)
@inbounds xi = itr.x[i]
-if xi === missing || @inbounds _anymissingindex(itr.others, i)
+if xi === missing || @inbounds _anymissingindex(itr.others, i)
throw(MissingException("the value at index $i is missing for some element"))
end
return xi
@@ -380,9 +472,9 @@ Base.mapreduce_impl(f, op, A::SkipMissingsofArrays, ifirst::Integer, ilast::Inte
A = itr.x
if ifirst == ilast
@inbounds a1 = A[ifirst]
-if a1 === missing
+if a1 === missing
return nothing
-elseif _anymissingindex(itr.others, ifirst)
+elseif _anymissingindex(itr.others, ifirst)
return nothing
else
return Some(Base.mapreduce_first(f, op, a1))
@@ -436,7 +528,7 @@ end
Return a vector similar to the array wrapped by the given `SkipMissings` iterator
but skipping all elements with a `missing` value in one of the iterators passed
to `skipmissings` and elements for which `f` returns `false`. This method
-only applies when all iterators passed to `skipmissings` are arrays.
+only applies when all iterators passed to `skipmissings` are arrays.

# Examples
```