Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Python] __getitem__ copies too often #13166

Closed
kohr-h opened this issue Nov 7, 2018 · 1 comment · Fixed by #13143
Closed

[Python] __getitem__ copies too often #13166

kohr-h opened this issue Nov 7, 2018 · 1 comment · Fixed by #13143

Comments

@kohr-h
Copy link
Contributor

kohr-h commented Nov 7, 2018

Description

While working on #13134 I noticed that NDArray.__getitem__ does not really do a good job of determining whether or not a view instead of a copy can be returned.

For instance, wrapping any indexing object into a 1-element tuple should have no effect at all (at least if one wants to be consistent with NumPy), but the wrapping currently leads to a copy.
Or, indexing a 3D array as array[0] should be equivalent to array[0, :, :], but the latter copies while the former does not.

Here are some examples:

def getitem_returns_view(x, idx):
    tmp = x.copy()
    tmp[:] = 0
    y = tmp[idx]
    y[:] = 1
    return all(yi == xi for yi, xi in zip(y.reshape((-1)), tmp[idx].reshape((-1,))))

x = nd.zeros((3, 4, 5))

## Integers

# Okay
getitem_returns_view(x, 0)  # True
getitem_returns_view(x.asnumpy(), 0)  # True

# NOT okay
getitem_returns_view(x, (0,))  # False
getitem_returns_view(x.asnumpy(), (0,))  # True

# Same with 1 instead of 0

## Slices

# Okay
getitem_returns_view(x, slice(None))  # True
getitem_returns_view(x.asnumpy(), slice(None))  # True

# NOT okay
getitem_returns_view(x, (slice(None),))  # False
getitem_returns_view(x.asnumpy(), (slice(None),))  # True

# NOT okay
getitem_returns_view(x, (0, slice(None)))  # False
getitem_returns_view(x.asnumpy(), (0, slice(None)))  # True

# NOT okay
getitem_returns_view(x, (0, slice(None), slice(None)))  # False
getitem_returns_view(x.asnumpy(), (0, slice(None), slice(None)))  # True

# NOT at all okay
getitem_returns_view(x, (slice(None), slice(None), slice(None)))  # False
getitem_returns_view(x.asnumpy(), (slice(None), slice(None), slice(None)))  # True

In particular the last case array[:, :, :], which is explicitly everything, should definitely not copy.

Suggestion

Instead of the current way of special-casing integers, lists and slice objects, I think it would be more robust to always sanitize the indexing object, in particular:

  • If it's not a tuple, wrap it in a 1-tuple.
  • Expand it to length array.ndim by adding slice(None) to the right.

Then the subsequent logic can look at all the entries in the tuple and determine whether the result will be contiguous (is it correct that the backend does not yet support strided views?). That's not entirely trivial but certainly doable.

Of course, things get a little bit more involved if None (new axis) and Ellipsis (...) are allowed as well. But the code in #13143 already does the above sanitization for that case, it just doesn't always use it to avoid copies.

@leleamol
Copy link
Contributor

leleamol commented Nov 8, 2018

@kohr-h Thank you very much for submitting this issue and suggesting the solutions. I would highly encourage to submit a PR for the fix.
Meanwhile I will label the issue so that MXNet community can help in resolving this issue.

@mxnet-label-bot [Python, NDArray, Bug]

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants