Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Tuples of array indices #74

Merged
merged 58 commits into from
Aug 26, 2020

Conversation

asmeurer
Copy link
Member

@asmeurer asmeurer commented Aug 7, 2020

This is needed for as_subindex support for those indices, which is the main thing we are missing, since an as_subindex result sometimes would be a tuple of arrays.

…edError

These only work in NumPy as fancy indices. However, there is too much of a
risk of API confusion in thinking that Tuple() works like Tuple((1, 2, 3))
instead of Tuple(1, 2, 3). So we make this an unconditional error with helpful
error messages. Fancy indices should use either a list or array.

c.f. issue Quansight-Labs#17.
@scopatz scopatz added this to the August 2020 milestone Aug 12, 2020
The shape an array indexed by an integer array is the product of the array and
index shapes. If both are large, this can produce a very large resulting array
which makes the test run too slow (hypothesis deadline exceeded) and use too
much memory.
…nstructors

This also fixes ndindex(boolean_scalar) to correctly give a BooleanArray
object instead of an Integer object.
Technically for integer scalars ndindex() will return an Integer(), rather
than IntegerArray(), so it could vary in some tests if this is tested of not.
However, the behavior should be identical for Integer, IntegerArray(integer
scalar), and IntegerArray(shape () integer array) in all cases.
This adds a new helper function operator_index() that works like
operator.index() except it disallows booleans. The built-in bool works with
operator.index, but np.bool_ gives a deprecation warning, so according to our
policy of making errors out of things that give deprecation warnings in NumPy,
we make this give an error. This affects Integer(), Slice(), and asshape().

A slight compatibility break here against NumPy is that NumPy actually does
allow booleans in slices. However, I'm not too worried about this as it's poor
form to do that, and most likely would indicate a bug in user code.
…hat don't return an index

Not only is this cleaner, as we don't have to "workaround" the function to
pretend that it is testing an index, it fixes an issue where the previous way
was not actually testing if a function raised an exception properly, because
it would just be indexed anyway after calling the function. This affected
newshape() and isempty() tests.
If an array index is created from another array index, this should be done.
This is especially useful when creating a new array index out after
broadcasting an existing array index, since broadcasting creates a memory
efficient readonly view.
So far we disallow any other indices in the tuple.

This also makes it so that Tuple.expand() broadcasts any arrays together.
Any slices, ellipses, or newaxes must not be between the arrays, or it raises
NotImplementedError. This will probably never be implemented, because it's a
bit of a wart case in the indexing API, which should raise an error (or at
least that's what Travis told me to do).

Also makes Tuple.expand() convert Integers into IntegerArrays as arrays when
the tuple also contains arrays (so that they will also be broadcast).
This adds a private _axis keyword argument to newshape() methods. This is
needed to keep the NumPy 1.19 behavior of not raising IndexError on out of
bounds indices on empty arrays in some cases. This is needed because the
specific cases when this happens require knowing the entire shape, but
previously we just passed through idx.newshape(shape[i]) in Tuple.newshape().
This behavior will be deprecated in NumPy 1.20, at which point we will stop
supporting it (breaking support earlier is hard because we cannot explicitly
test against NumPy without a deprecation warning to catch). When this happens,
the _axis argument to the newshape() methods will be removed and we can go
back to just passing through shape[i] as before.
Now that Tuple.newshape() calls expand() instead of reduce(), it does not need
to handle ellipses.
The test_ndindex_expand_hypothesis() already tests tuples because the
ndindices strategy generates tuples, and it already tests all the same things
that the test_tuple_expand_hypothesis test tests.
@asmeurer
Copy link
Member Author

This can be reviewed. I'll probably keep pushing stuff here. Right now only integer array indices are supported in tuples, but I will also add boolean arrays soon if it isn't too much work. Otherwise I'll do it in a separate PR.

@@ -27,8 +27,9 @@ install:
- conda config --add channels conda-forge
- conda update -q conda
- conda info -a
- conda create -n test-environment python=$TRAVIS_PYTHON_VERSION pyflakes pytest pytest-doctestplus numpy sympy hypothesis doctr sphinx myst-parser sphinx_rtd_theme pytest-cov pytest-flakes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems as though there are still some numpy imports

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumPy is still a dependency. I'm just installing it from git now.

try:
return BooleanArray(obj)
except TypeError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has got to be a better way to do this than try-excepts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I only want to have the logic in once place, in the object constructor.

So far only the Tuple constructor an expand() are implemented properly.
There are some weird semantics with them where they sort of act like shape ()
arrays and sort of don't. It isn't that important to support right now, so
we'll just leave it unimplemented.
The logic isn't correct in the case where boolean scalars are mixed with array
indices, but this is currently disabled in the Tuple constructor.
Nonscalar BooleanArrays are converted to nonzero() in the arrays list, so
counting their dimensions is pointless.
…shape

This is achieved with the shared() strategy. Without this, boolean arrays
almost always don't match the array shape where they are indexed, except in
the scalar boolean case, meaning almost no boolean array behavior was properly
tested outside of IndexError behavior. We do still include random shapes to
test this behavior, as it doesn't reduce the number of good examples by much.

Fixes part of issue Quansight-Labs#77. See the discussion there for more information.
@scopatz scopatz merged commit d15a974 into Quansight-Labs:master Aug 26, 2020
@scopatz
Copy link
Contributor

scopatz commented Aug 26, 2020

Amazing work here @asmeurer!

@ericdatakelly ericdatakelly modified the milestones: August 2020, September 2020 Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants