Add specifications for functions for searching arrays #23
NumPy discussion where there's agreement it's a good idea, just work to implement: numpy/numpy#8710. And since
Here's the PyTorch feature request: pytorch/pytorch#32570. Seems to be a bit of work, but should be doable. With both of the above decisions we'd be deviating from choosing the minimal common set, however there's also something to say for consistency between functions. E.g. |
The
I suggest saying 0-D input should raise an exception here. It clearly doesn't make too much sense.
I think the |
+1 for accepting only boolean arrays. The note in the
I'd say
+1, the 1-arg form doesn't make any sense. |
@rgommers Thanks for the initial review, responses, and links to relevant discussions! Re: argmax/argmin Re: argmax/argmin ...which leads me back to the question:
Re: Re: Re: |
This looks about ready to merge, I just need to re-review to see all discussion points were addressed. |
The reason apparently was that it's easier for advanced indexing (you can then simply do |
Double checked everything, LGTM. Merged, thanks @kgryte |
The tuple form for nonzero allows boolean array indices to be equivalent to nonzero (except for scalar booleans), so |
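The equivalence mentioned above can be checked with a small NumPy sketch. The array values and variable names are illustrative and not taken from the discussion.

```python
import numpy as np

x = np.array([[1, 0, 3], [0, 5, 0]])
mask = x != 0

# nonzero returns a tuple with one index array per dimension.
idx = np.nonzero(mask)

via_mask = x[mask]      # boolean-array indexing
via_nonzero = x[idx]    # integer indexing with the tuple from nonzero

assert isinstance(idx, tuple) and len(idx) == x.ndim
assert np.array_equal(via_mask, via_nonzero)
```

Because the result is a tuple, it can be passed straight to the indexing operator with no conversion step.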
A little more archeology: http://numpy-discussion.10968.n7.nabble.com/just-curious-why-does-numpy-where-return-tuples-td14182.html#a14183 (2008):
I assume that you are talking about where()'s single-argument form
In [4]: x = random.randint(0, 2, [3, 10])
In [5]: x
Out[5]:
array([[1, 0, 0, 0, 0, 0, 1, 1, 0, 0],
       [1, 1, 0, 0, 1, 1, 1, 0, 0, 1],
       [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]])
In [6]: nonzero(x)
Out[6]:
(array([0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]),
 array([0, 6, 7, 0, 1, 4, 5, 6, 9, 0, 3, 5, 7, 9]))
In [7]: x[nonzero(x)]
Out[7]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
In Python,
In [8]: x[array(nonzero(x))]
---------------------------------------------------------------------------
IndexError |
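The failure in the quoted session can be reproduced on a smaller array. This is a sketch with made-up data, assuming current NumPy indexing rules: a tuple of index arrays addresses one axis per entry, whereas a single 2-D integer array is applied to the first axis only.

```python
import numpy as np

x = np.array([[1, 0, 0],
              [0, 0, 1]])
idx = np.nonzero(x)            # (array([0, 1]), array([0, 2]))

picked = x[idx]                # tuple: one index array per axis

# Packing the tuple into one array makes every entry index axis 0,
# which has size 2 here, so the value 2 is out of bounds.
try:
    x[np.array(idx)]
    raised = False
except IndexError:
    raised = True
```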
@rgommers Thanks for digging this up. :) |
Did we add |
@jakirkham |
Ah ok. It would be useful to have Since these are functions that can have unknown array lengths in their results (in Dask or other lazy computation libraries we don't know how many non-zeros there are until computed), it is helpful to have all of the indices in a single array like in |
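For reference, a small NumPy sketch of the two result layouts being compared; the sample array is made up. In the single-array form only the first dimension depends on the data, while the second always equals the number of dimensions of the input.

```python
import numpy as np

x = np.array([[0, 4, 0],
              [7, 0, 9]])

as_tuple = np.nonzero(x)                 # tuple of ndim arrays, each of length N
as_array = np.argwhere(x)                # single array of shape (N, ndim)
stacked = np.stack(as_tuple, axis=-1)    # same layout built from the tuple

assert np.array_equal(as_array, stacked)
```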
Argh, why do we have both PyTorch has an This may be worth considering. The name is a little annoying though. Especially because
>>> x = np.array([0, 4, 7, 0])
>>> np.nonzero(x)
(array([1, 2]),)
>>> np.where(x)
(array([1, 2]),)
>>> np.argwhere(x)
array([[1],
[2]])
>>> x[np.nonzero(x)]
array([4, 7])
>>> x[np.where(x)]
array([4, 7])
>>> x[np.argwhere(x)] # this needs a squeeze() to roundtrip for 1-D input
array([[4],
[7]]) |
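One way to make the single-array result round-trip, sketched here with NumPy as an illustration rather than a proposal from the thread, is to transpose it and convert it back to a tuple of per-axis index arrays before indexing.

```python
import numpy as np

x = np.array([0, 4, 7, 0])

idx = np.argwhere(x)           # shape (2, 1): one row per non-zero element
roundtrip = x[tuple(idx.T)]    # tuple of per-axis index arrays

assert np.array_equal(roundtrip, x[np.nonzero(x)])
```

The same conversion works for inputs with more than one dimension, since each row of the transposed array holds the indices for one axis.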
Apologies for the lack of follow up here. Have tried to articulate why |
This PR
Notes
nanargmax, nanargmin, argwhere, flatnonzero, extract, searchsorted); however, these functions are not widely implemented by other analyzed array libraries (less than half; see here) or are out-of-scope (nan variants) and, thus, were not included in this initial specification. Should additional search functions be necessary, they can be proposed in follow-up proposals.
Questions (and notes)
argmax
max in that multiple axes cannot be specified (CuPy supports, but no one else). Commonly, when computing the maximum value, array libraries allow users to reduce over multiple axes when performing the statistical "reduction". However, apart from CuPy, most of the analyzed array libraries do not seem to allow the same when searching for the indices of the maximum values. Should this be reconciled?
keepdims which we support here in order to ensure broadcast compatibility as we do with statistical functions.
max et al, should we also support an out keyword argument? This is supported by NumPy et al, but not Torch and TensorFlow.
argmin
argmax.
nonzero
1d, while Torch supports handling of zero-dimensional arrays as if one-dimensional.
as_tuple keyword argument to either return a tuple of arrays or a single multi-dimensional array. Not clear on why most other array libraries insist on a tuple and not an ndarray (maybe because of NumPy)?
where
0 is falsy while non-zero values are truthy). Should we be stricter here?
out keyword argument? Would seem consistent with other (element-wise) functions.
condition argument, as array library docs consistently point to alternative approaches (e.g., np.asarray(condition).nonzero()).
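Two of the argmax questions above can be illustrated with NumPy. This is only a sketch with made-up data. The first part emulates a multi-axis argmax by flattening the reduced axes, which is one possible convention and not something the specification defines. The second part shows why a kept size-1 dimension is convenient: the indices can be used directly to gather the maxima.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
x[0, 1, 2] = 99  # make the maximum of the first sub-array easy to spot

# Part 1: argmax over the last two axes, emulated by flattening them
# and converting the flat positions back to per-axis indices.
flat = x.reshape(x.shape[0], -1)
k = flat.argmax(axis=1)
rows, cols = np.unravel_index(k, x.shape[1:])

# Part 2: keeping the reduced axis as size 1 lets the index array
# line up with x when gathering along that axis.
idx = np.expand_dims(np.argmax(x, axis=-1), axis=-1)   # shape (2, 3, 1)
gathered = np.take_along_axis(x, idx, axis=-1)

assert np.array_equal(gathered, x.max(axis=-1, keepdims=True))
```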