Add a document on type confusion

c.f. issue Quansight-Labs#17.
asmeurer · Jul 31, 2020 · d8f8bc6
1 parent 8ec09f0

Showing 5 changed files with 293 additions and 1 deletion.
diff --git a/docs/index.md b/docs/index.md
@@ -239,6 +239,7 @@ MIT License
 index.md
 api.md
 slices.md
+type-confusion.md
 changelog.md
 style-guide.md
 ```
diff --git a/docs/type-confusion.md b/docs/type-confusion.md
@@ -0,0 +1,270 @@
+(type-confusion)=
+Type Confusion
+==============
+
+When using the ndindex API, it is important to avoid type confusion. Many
+types that are used as indices for arrays also have semantic meaning outside
+of indexing. For example, tuples and arrays mean one thing when they are
+indices, but they are also used in contexts that have nothing to do with indexing.
+
+ndindex classes have names that are based on the native class names for the
+index type they represent. One must be careful, however, to not confuse these
+classes with the classes they represent. Most methods that work on the native
+classes are not available on the ndindex classes.
+
+Some general tips to help avoid type confusion:
+
+- **Always use the [`ndindex()`](ndindex.ndindex) function to create ndindex
+  types.** When calling ndindex methods or creating {any}`Tuple` objects, it
+  is not necessary to convert arguments to ndindex types first. Slice literals
+  (using `:`) are not valid syntax outside of a getitem (square brackets), but
+  you can use the `slice` built-in object to create slices. `slice(a, b, c)`
+  is the same as `a:b:c`.
+
+  **Right:**
+
+  ```py
+  idx.as_subindex((1,))
+  ```
+
+  **Wrong:**
+
+  ```py
+  idx.as_subindex(Tuple(Integer(1))) # More verbose than necessary
+  ```
+
+
+- **Use `.raw` to convert an ndindex object to an indexable type.** With the
+  exception of `Integer`, it is impossible for custom types to define
+  themselves as indices to NumPy arrays, so it is necessary to use
+  `a[idx.raw]` rather than `a[idx]` when `idx` is an ndindex type. Since
+  native index types do not have a `.raw` attribute, it is recommended to always
+  keep any index object that you are using as an ndindex type, and use `.raw`
+  only when you need to use it as an index. If you get the error "``IndexError:
+  only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and
+  integer or boolean arrays are valid indices``", it indicates you forgot to
+  use `.raw`.
+
+  **Right:**
+
+  ```py
+  a[idx.raw]
+  ```
+
+  **Wrong:**
+
+  ```py
+  a[idx] # Gives an error
+  ```
+
+- **Only use ndindex classes for objects that represent indices.** Do not use
+  classes like `Integer`, `Tuple`, `IntegerArray`, or `BooleanArray` unless
+  the object in question is going to be an index to an array. For example,
+  array shapes are always tuples of integers, but they are not indices, so
+  `Tuple` should not be used to represent an array shape, but rather just a
+  normal `tuple`. If an object will be used both as an index and an
+  integer/tuple/array in its own right, either make use of `.raw` when using
+  it in non-index contexts, or store the objects separately (note that
+  `IntegerArray` and `BooleanArray` always make a copy of the input argument,
+  so there is no issue with managing it separately).
+
+  **Right:**
+
+  ```py
+  np.empty((1, 2, 3))
+  # idx is an ndindex object
+  idx.newshape((1, 2, 3))
+  ```
+
+  **Wrong:**
+
+  ```py
+  np.empty(Tuple(1, 2, 3)) # gives an error
+  # idx is an ndindex object
+  idx.newshape(Tuple(1, 2, 3)) # gives an error
+  ```
+
+- **Try to use ndindex methods to manipulate indices.** The whole reason
+  ndindex exists is that writing formulas for manipulating indices is hard,
+  and it's easy to get the corner cases wrong. If you find yourself
+  manipulating index args directly in complex ways, it's a sign you should
+  probably be using a higher level abstraction. If what you are trying to do
+  doesn't exist yet, [open an
+  issue](https://github.com/Quansight/ndindex/issues) so we can implement it.
+
+Additionally, some advice for specific types:
+
+## Integer
+
+- **{any}`Integer` should not be thought of as an int type.** It represents integers
+  **as indices**. It is not usable in contexts where integers are usable. For
+  example, arithmetic will not work on it. If you need to manipulate the
+  integer index as an integer, use `idx.raw`.
+
+  **Right:**
+
+  ```py
+  # idx is an Integer
+  idx.raw + 1
+  ```
+
+  **Wrong:**
+
+  ```py
+  # idx is an Integer
+  idx + 1 # Produces an error
+  ```
+
+- `Integer` is the only index type that can be used directly as an array
+  index. This is because the `__index__` API allows custom integer objects to
+  define themselves as indices. However, this API does not extend to other
+  index types like slices or tuples. **It is recommended to always use
+  `idx.raw` even if `idx` is an `Integer`**, so that it will also work even if
+  it is another index type. You should not rely on any ndindex function
+  returning a specific index type.
+
+## Tuple
+
+- **{any}`Tuple` should not be thought of as a tuple.** In particular, things like
+  `idx[0]` and `len(idx)` will not work if `idx` is a `Tuple`. If you need to
+  access a specific term, use `Tuple.args` (ndindex types) or `Tuple.raw` (native types).
+
+  **Right:**
+
+  ```py
+  # idx is a Tuple
+  idx.raw[0]
+  ```
+
+  **Wrong:**
+
+  ```py
+  # idx is a Tuple
+  idx[0] # Produces an error
+  ```
+
+- **`Tuple` is defined as `Tuple(*args)`.** Pass each term as a separate argument, not as a single tuple.
+
+   **Right:**
+
+   ```py
+   Tuple(0, 1, 2)
+   ```
+
+   **Wrong:**
+
+   ```py
+   Tuple((0, 1, 2)) # Gives an error
+   ```
+
+## ellipsis
+
+- You should almost never use the ndindex {any}`ellipsis` class directly.
+  Instead, **use `...` or `ndindex(...)`**. As noted above, all ndindex
+  methods and `Tuple` will automatically convert `...` into the ndindex type.
+
+  **Right:**
+
+  ```py
+  idx1 = ndindex(...)
+  idx1.reduce()
+  ```
+
+  **Wrong:**
+
+  ```py
+  idx = ...
+  idx.reduce() # Gives an error
+  ```
+
+- If you do use `ellipsis`, beware that it is the *class*, not the *instance*,
+  unlike the built-in `Ellipsis` object. This is done for consistency in the
+  internal ndindex class hierarchy.
+
+  **Right:**
+
+  ```py
+  idx1 = ndindex((0, ..., 1))
+  idx1.reduce()
+  ```
+
+  **Wrong:**
+
+  ```py
+  idx = ndindex((0, ellipsis, 1)) # Gives an error
+  idx.reduce()
+  ```
+
+  These do not give errors, but it is easy to confuse them with the above. It
+  is best to just use `...`, which is more concise and easier to read.
+
+  ```py
+  idx = ndindex((0, ellipsis(), 1))
+  idx.reduce()
+  ```
+
+  ```py
+  idx = ndindex((0, Ellipsis, 1))
+  idx.reduce()
+  ```
+
+- `ellipsis` is **not** singletonized, unlike the built-in `...`. It would
+  also be impossible to make `ellipsis() is ...` return True. If you are using
+  ndindex, **you should use `==` to compare against `...`**, and avoid using `is`.
+
+  **Right:**
+
+  ```py
+  if idx == ...:
+  ```
+
+  **Wrong:**
+
+  ```py
+  if idx is Ellipsis: # Will be False if idx is the ndindex ellipsis type
+  ```
+
+## IntegerArray and BooleanArray
+
+- **{any}`IntegerArray` and `BooleanArray` should not be thought of as
+  arrays.** They do not have the methods that `numpy.ndarray` would have. They
+  also have fixed dtypes (`intp` and `bool_`) and are restricted by what is
+  allowed as indices by NumPy.
+
+  **Right:**
+
+  ```py
+  idx = IntegerArray(array([0, 1]))
+  idx.array[0]
+  ```
+
+  **Wrong:**
+
+  ```py
+  idx = IntegerArray(array([0, 1]))
+  idx[0] # Gives an error
+  ```
+
+- **Like all other ndindex types, `IntegerArray` and `BooleanArray` are
+  immutable.** The `.array` object on them is set as read-only to enforce
+  this. To modify an array index, create a new object. All ndindex methods
+  that manipulate indices, like [reduce](NDIndex.reduce), return new objects.
+  If you create an `IntegerArray` or `BooleanArray` object out of an existing
+  array, the array is copied so that modifications to the original array do
+  not affect the ndindex objects.
+
+  **Right:**
+
+  ```py
+  idx = IntegerArray(array([0, 1]))
+  arr = idx.array.copy()
+  arr[0] = 1
+  idx2 = IntegerArray(arr)
+  ```
+
+  **Wrong:**
+
+  ```py
+  idx = IntegerArray(array([0, 1]))
+  idx.array[0] = 1 # Gives an error
+  ```
diff --git a/ndindex/integer.py b/ndindex/integer.py
@@ -20,6 +20,13 @@ class Integer(NDIndex):
     index directly. However, it is still recommended to use `raw` for
     consistency, as this only works for `Integer`.
 
+    .. note::
+
+       `Integer` does *not* represent an integer, but rather an
+       *integer index*. It does not have most methods that `int` has, and
+       should not be used in non-indexing contexts. See the document on
+       :ref:`type-confusion` for more details.
+
     """
     def _typecheck(self, idx):
         idx = operator.index(idx)

diff --git a/ndindex/integerarray.py b/ndindex/integerarray.py
@@ -5,7 +5,7 @@
 
 class IntegerArray(ArrayIndex):
     """
-    Represents an integer array.
+    Represents an integer array index.
 
     If `idx` is an n-dimensional integer array with shape `s = (s1, ..., sn)`
     and `a` is any array, `a[idx]` replaces the first dimension of `a` with
@@ -29,6 +29,13 @@ class IntegerArray(ArrayIndex):
     array([[0, 1],
            [1, 2]])
 
+    .. note::
+
+       `IntegerArray` does *not* represent an array, but rather an *array
+       index*. It does not have most methods that `numpy.ndarray` has, and
+       should not be used in array contexts. See the document on
+       :ref:`type-confusion` for more details.
+
     """
     dtype = intp
 

diff --git a/ndindex/tuple.py b/ndindex/tuple.py
@@ -31,6 +31,13 @@ class Tuple(NDIndex):
     >>> a[idx.raw]
     array([2, 3])
 
+    .. note::
+
+       `Tuple` does *not* represent a tuple, but rather a *tuple index*. It
+       does not have most methods that `tuple` has, and should not be used in
+       non-indexing contexts. See the document on :ref:`type-confusion` for
+       more details.
+
     """
     def _typecheck(self, *args):
         from .ellipsis import ellipsis
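
A note on the `__index__` point made in the Integer section of `docs/type-confusion.md`: the asymmetry between integer-like and slice-like wrappers can be reproduced with nothing but the standard library. The sketch below is only an illustration. `IndexLike` and `SliceLike` are hypothetical stand-ins, not the ndindex `Integer` and `Slice` classes, and a plain `list` is used in place of a NumPy array (a list raises `TypeError` here, whereas the document quotes the `IndexError` that NumPy raises).

```python
import operator


class IndexLike:
    """Hypothetical integer-index wrapper (not ndindex's Integer)."""

    def __init__(self, value):
        # operator.index() accepts ints and objects that define __index__
        self.raw = operator.index(value)

    def __index__(self):
        # Lets Python use this object wherever an integer index is required
        return self.raw


class SliceLike:
    """Hypothetical slice wrapper; Python offers no hook equivalent to __index__ for slices."""

    def __init__(self, start, stop=None, step=None):
        self.raw = slice(start, stop, step)


data = list(range(10))

# Works: the list calls __index__ to obtain a plain int
assert data[IndexLike(3)] == 3

# Fails: a custom object cannot declare itself to be a slice
try:
    data[SliceLike(1, 5)]
except TypeError:
    direct_slice_ok = False
else:
    direct_slice_ok = True

# Going through .raw works for both wrappers, which is why the document
# recommends always writing a[idx.raw]
assert data[SliceLike(1, 5).raw] == [1, 2, 3, 4]
assert data[IndexLike(3).raw] == 3
```

Using `.raw` in both cases keeps calling code independent of which index type it happens to receive.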