Mini Julep: skipmissing indexing #30606

Open
nalimilan opened this issue Jan 5, 2019 · 6 comments
Labels
julep Julia Enhancement Proposal missing data Base.missing and related functionality needs decision A decision on this change is needed

Comments

@nalimilan
Member

nalimilan commented Jan 5, 2019

This mini Julep aims to address issues which currently block progress regarding two essential use cases of skipmissing. The first one is how to compute reductions over dimensions of an array, skipping missing values (#28027). The other is how to find the index of the maximum/minimum value in an array, skipping missing values (#29305).

The solution I suggest is to make SkipMissing an "enhanced iterator" which would also support indexing. More specifically, it would use the same indices as the original array:

  • keys(itr::SkipMissing) would return (i for i in keys(itr.x) if !ismissing(itr.x[i])), or an equivalent iterator.
  • getindex using indices of the original array would either return the corresponding value, or throw an error if the value is missing: getindex(itr::SkipMissing, i...) = (v = itr.x[i...]; ismissing(v) ? throw(...) : v)
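The two bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexedSkip` is a hypothetical stand-in type (not `Base.SkipMissing`), and throwing `MissingException` is one possible choice for the error left unspecified in `throw(...)`.

```julia
# Hypothetical wrapper used only for illustration; it is not Base.SkipMissing.
struct IndexedSkip{T<:AbstractArray}
    x::T
end

# Keys are the keys of the parent array whose entries are not missing.
Base.keys(itr::IndexedSkip) = (i for i in keys(itr.x) if !ismissing(itr.x[i]))

# Indexing uses the parent's indices and refuses to return a missing value.
# Assumption: MissingException is used here; the proposal leaves the error type open.
function Base.getindex(itr::IndexedSkip, i...)
    v = itr.x[i...]
    ismissing(v) && throw(MissingException("the value at index $i is missing"))
    return v
end

x = [3, missing, 5]
s = IndexedSkip(x)

valid = collect(keys(s))  # only the indices of non-missing entries
third = s[3]              # same value as x[3]
```

With this sketch, `s[2]` would throw rather than return `missing`, while every index produced by `keys(s)` is safe to pass to `getindex`.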

Consistent with this proposal, argmax(skipmissing(x)) would return the index i such that x[i] is the highest non-missing value in x (fixing #29305). And reduce(+, skipmissing(x), dims=i) would compute the sum of non-missing values over dimension i of x, with the same shape as reduce(+, x, dims=i) (fixing #28027). In both cases, if x contains no missing values, the result would be indistinguishable from applying the operation directly to x.
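To make the intended results concrete, here is a minimal sketch of the two behaviours written as standalone helpers. The names `argmax_skipmissing` and `sum_skipmissing` are hypothetical and exist only for this illustration; the proposal itself would expose the behaviour through `skipmissing(x)`, not through new functions.

```julia
# Hypothetical helper: index (in the indices of x) of the largest non-missing value.
function argmax_skipmissing(x::AbstractArray)
    best = nothing
    for i in keys(x)
        v = x[i]
        ismissing(v) && continue
        # Strict comparison keeps the first maximal element, as argmax does.
        if best === nothing || isless(x[best], v)
            best = i
        end
    end
    best === nothing && throw(ArgumentError("collection has no non-missing values"))
    return best
end

# Hypothetical helper: sum over `dims`, replacing missing with zero so that
# the result has the same shape as sum(x, dims=dims).
function sum_skipmissing(x::AbstractArray; dims)
    z = zero(nonmissingtype(eltype(x)))
    return sum(coalesce.(x, z); dims=dims)
end

v = [3, missing, 5, missing]
m = [1 missing; 3 2]

i = argmax_skipmissing(v)             # an index into v itself
colsums = sum_skipmissing(m; dims=1)  # 1×2 result, one sum per column
rowsums = sum_skipmissing(m; dims=2)  # 2×1 result, one sum per row
```

When the input contains no missing values, both helpers agree with `argmax(x)` and `sum(x, dims=dims)`, which is the consistency property described above.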

IteratorSize(SkipMissing) would still return SizeUnknown, since computing the length is an O(N) operation.

@StefanKarpinski
Member

So essentially this makes SkipMissing into a general wrapper which behaves like the wrapped collection but indicates to any functions operating on it that they should ignore missings rather than treating them as poisonous, which is the default.

@nalimilan
Member Author

It would behave as much as possible like the wrapped collection, but the behaviours would be identical only in the absence of missing values. When there are missing values, keys would skip some indices, and getindex would throw an error when accessing them. There would still be other differences, e.g. collect cannot return a multidimensional array, since skipping missing values would leave "empty" entries.
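As a small illustration of the `collect` point, using only the existing `skipmissing` in Base: collecting over a matrix already flattens the result, because the remaining values no longer fill a rectangular shape.

```julia
m = [1 missing; 3 2]

# Iteration is column-major; the missing entry is dropped, so the
# three remaining values cannot be laid out as a 2×2 array.
flat = collect(skipmissing(m))
```

Here `flat` is a one-dimensional vector even though `m` is a matrix.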
