-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Tuples of array indices #74
Conversation
…edError These only work in NumPy as fancy indices. However, there is too much of a risk of API confusion in thinking that Tuple() works like Tuple((1, 2, 3)) instead of Tuple(1, 2, 3). So we make this an unconditional error with helpful error messages. Fancy indices should use either a list or array. c.f. issue Quansight-Labs#17.
The shape an array indexed by an integer array is the product of the array and index shapes. If both are large, this can produce a very large resulting array which makes the test run too slow (hypothesis deadline exceeded) and use too much memory.
…nstructors This also fixes ndindex(boolean_scalar) to correctly give a BooleanArray object instead of an Integer object.
Technically for integer scalars ndindex() will return an Integer(), rather than IntegerArray(), so it could vary in some tests if this is tested of not. However, the behavior should be identical for Integer, IntegerArray(integer scalar), and IntegerArray(shape () integer array) in all cases.
This adds a new helper function operator_index() that works like operator.index() except it disallows booleans. The built-in bool works with operator.index, but np.bool_ gives a deprecation warning, so according to our policy of making errors out of things that give deprecation warnings in NumPy, we make this give an error. This affects Integer(), Slice(), and asshape(). A slight compatibility break here against NumPy is that NumPy actually does allow booleans in slices. However, I'm not too worried about this as it's poor form to do that, and most likely would indicate a bug in user code.
…hat don't return an index Not only is this cleaner, as we don't have to "workaround" the function to pretend that it is testing an index, it fixes an issue where the previous way was not actually testing if a function raised an exception properly, because it would just be indexed anyway after calling the function. This affected newshape() and isempty() tests.
If an array index is created from another array index, this should be done. This is especially useful when creating a new array index out after broadcasting an existing array index, since broadcasting creates a memory efficient readonly view.
So far we disallow any other indices in the tuple. This also makes it so that Tuple.expand() broadcasts any arrays together.
Any slices, ellipses, or newaxes must not be between the arrays, or it raises NotImplementedError. This will probably never be implemented, because it's a bit of a wart case in the indexing API, which should raise an error (or at least that's what Travis told me to do). Also makes Tuple.expand() convert Integers into IntegerArrays as arrays when the tuple also contains arrays (so that they will also be broadcast).
This adds a private _axis keyword argument to newshape() methods. This is needed to keep the NumPy 1.19 behavior of not raising IndexError on out of bounds indices on empty arrays in some cases. This is needed because the specific cases when this happens require knowing the entire shape, but previously we just passed through idx.newshape(shape[i]) in Tuple.newshape(). This behavior will be deprecated in NumPy 1.20, at which point we will stop supporting it (breaking support earlier is hard because we cannot explicitly test against NumPy without a deprecation warning to catch). When this happens, the _axis argument to the newshape() methods will be removed and we can go back to just passing through shape[i] as before.
Now that Tuple.newshape() calls expand() instead of reduce(), it does not need to handle ellipses.
The test_ndindex_expand_hypothesis() already tests tuples because the ndindices strategy generates tuples, and it already tests all the same things that the test_tuple_expand_hypothesis test tests.
This can be reviewed. I'll probably keep pushing stuff here. Right now only integer array indices are supported in tuples, but I will also add boolean arrays soon if it isn't too much work. Otherwise I'll do it in a separate PR. |
@@ -27,8 +27,9 @@ install: | |||
- conda config --add channels conda-forge | |||
- conda update -q conda | |||
- conda info -a | |||
- conda create -n test-environment python=$TRAVIS_PYTHON_VERSION pyflakes pytest pytest-doctestplus numpy sympy hypothesis doctr sphinx myst-parser sphinx_rtd_theme pytest-cov pytest-flakes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems as though there are still some numpy imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NumPy is still a dependency. I'm just installing it from git now.
try: | ||
return BooleanArray(obj) | ||
except TypeError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has got to be a better way to do this than try-excepts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I only want to have the logic in once place, in the object constructor.
So far only the Tuple constructor an expand() are implemented properly.
There are some weird semantics with them where they sort of act like shape () arrays and sort of don't. It isn't that important to support right now, so we'll just leave it unimplemented.
The logic isn't correct in the case where boolean scalars are mixed with array indices, but this is currently disabled in the Tuple constructor.
Nonscalar BooleanArrays are converted to nonzero() in the arrays list, so counting their dimensions is pointless.
…shape This is achieved with the shared() strategy. Without this, boolean arrays almost always don't match the array shape where they are indexed, except in the scalar boolean case, meaning almost no boolean array behavior was properly tested outside of IndexError behavior. We do still include random shapes to test this behavior, as it doesn't reduce the number of good examples by much. Fixes part of issue Quansight-Labs#77. See the discussion there for more information.
Amazing work here @asmeurer! |
This is needed for as_subindex support for those indices, which is the main thing we are missing, since an as_subindex result sometimes would be a tuple of arrays.