Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

easier way to create arrays #6

Closed
keewis opened this issue Nov 20, 2024 · 13 comments
Closed

easier way to create arrays #6

keewis opened this issue Nov 20, 2024 · 13 comments

Comments

@keewis
Copy link
Collaborator

keewis commented Nov 20, 2024

While the idea of using a function to create a nested namespace is neat, it makes creating arrays harder:

import marray
import numpy as np

xp = marray.masked_array(np)

a = xp.asarray(np.arange(10))
a.mask[...] = np.arange(10) % 2 == 0

What I'd like to have (but don't quite know how easy it is to support, nor if it actually is a good idea) is something like this:

import marray
import numpy as np

a = marray.MaskedArray(data=np.arange(10), mask=np.arange(10) % 2 == 0)
xp = a.__array_namespace__()  # nested namespace

Alternatively, this could also work (since the Array API doesn't forbid adding non-standard things to the namespace):

xp = marray.masked_array(np)
a = xp.MaskedArray(data=np.arange(10), mask=np.arange(10) % 2 == 0)

However, I would imagine that this makes creating / composing arrays a bit harder. And since we don't subclass arrays classes anymore, maybe we don't even need the dynamic namespace (and thus the meta-programming)?

@mdhaber
Copy link
Owner

mdhaber commented Nov 20, 2024

Your second example should work. And asarray also accepts the mask. Have you tried

import marray
import numpy as np

xp = marray.masked_array(np)
a = xp.asarray(np.arange(10), mask=(np.arange(10) % 2 == 0))

?

Currently mask doesn't appear in the documentation, because it's just copied from NumPy, but that can come later.

@keewis
Copy link
Collaborator Author

keewis commented Nov 20, 2024

That's interesting, I totally missed that. I was under the (quite possibly mistaken?) assumption that the Array API didn't allow adding additional arguments, so I didn't even bother to check.

@mdhaber
Copy link
Owner

mdhaber commented Nov 20, 2024

I'm not certain whether the standard disallows that sort of thing, but NumPy 2.1 asarray has an order argument that is not in the standard. I'm sure there are many more examples.

@mdhaber
Copy link
Owner

mdhaber commented Dec 2, 2024

@keewis this is now:

import marray
import numpy as np
mxp = marray.get_namespace(np)
a = mxp.asarray(np.arange(10), mask=(np.arange(10) % 2 == 0))
# MArray(
#     array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#     array([ True, False,  True, False,  True, False,  True, False,  True,
#        False])
# )

which is no longer than the suggestions in #6 (comment).

marray does not currently expose anything other than get_namespace. It is possible to expose a class MArray that infers the namespace from the arguments:

# draft just to illustrate
# subclasses might not work correctly as-written
class MArray:
    def __new__(self, data, mask=None):
        xp = data.__array_namespace__()
        mxp = get_namespace(xp)
        return mxp.MArray(data=data, mask=mask)

But my impression from your second suggestion is that four lines is short enough. Is this issue resolved?

@keewis
Copy link
Collaborator Author

keewis commented Dec 7, 2024

I don't think we have to go as far as implementing __new__, I'd also be fine with a convenience function like (I won't insist on any names):

def create_marray(data, mask=None):
    if mask is None:
        # standard `array_namespace` function, similar to what is defined by `array_api_compat`
        xp = array_namespace(data)
    else:
        xp = array_namespace(data, mask)

    mxp = marray.get_namespace(xp)
    return mxp.MArray(data=data, mask=mask)

(As an aside, I don't love the name of get_namespace as true getter functions are rare in python, plus getters usually return a "private" attribute without doing anything else, whereas this is closer to something I would call create_namespace or construct_namespace. I don't feel strongly about this, though)

Edit: to make it clear, the requirement to create the namespace first feels inconvenient, which the only complaint I have, other than feeling somewhat uneasy about modifying the signature of asarray, which I had believed should not be modified

@mdhaber
Copy link
Owner

mdhaber commented Dec 7, 2024

to make it clear, the requirement to create the namespace first feels inconvenient

It is possible to get around this for any function that accepts an array as input, but it is not without downsides. Off the top of my head:

  • The documentation would be independent of the underlying array's documentation. Also, we would probably want to render our own documentation for each function, then, since we would be exposing all the functions from our package. (Right now I don't think we need to, since we actually don't expose them - we exist only get_namespace.)
  • Array creation functions will not return arrays of the desired type unless we add an argument.
  • We have to think a bit more about how to determine the desired array type, e.g. if functions accept multiple arrays and some might not actually be arrays. Probably we just call array_namespace (as defined in array-api-compat) on the right arguments, but right now we don't have to determine the desired array type from the inputs so it's a non-issue.
  • Moreover, it requires a fair amount of internal restructuring.

This is all quite a price to pay for users adding a single line in addition to the import. Maybe something to think about for a MArray 2.0, since it has almost no impact on user code? I'd rather get something working within the existing structure for now.


This comment is relevant:
data-apis/array-api#584 (comment)

@mdhaber
Copy link
Owner

mdhaber commented Dec 7, 2024

other than feeling somewhat uneasy about modifying the signature of asarray, which I had believed should not be modified

We do not currently follow the positional-only, keyword-only conventions perfectly, so I think there are bigger fish to fry w.r.t. to the functions accepting more arguments than they should (gh-16).

So that aside: with mask unspecified, we are (I think) exactly as array API compatible as the underlying library - none of which are themselves truly array API compatible. With some elements masked, the behavior of functions doesn't follow the array API; it is adapted for the presence of the mask. Why in that case does accepting a mask stand out as a problem?

See also https://data-apis.org/array-api/latest/purpose_and_scope.html#conformance. it states that all the required things must be present, but it indicates that additional features may be present. It does not specifically mention additional keyword only arguments. But it says we can document our level of incompatibility. I would suggest we just document this one additional keyword argument as a potential "incompatibility", if adding a keyword is considered incompatible (even though only introspection would be able to detect the incompatibility in a test).

Resolved by data-apis/array-api#869 / data-apis/array-api#870. Conforming implementations may add additional arguments to functions.

@lucascolley
Copy link
Collaborator

closing now that we can use asarray, right?

@mdhaber
Copy link
Owner

mdhaber commented Dec 28, 2024

I think so. I think the only alternative on the table was that we could expose MArray as a class, and then you could instantiate an MArray like:

import numpy as np
import marray
x = marray.MArray([1, 2, 3])

So in theory you could use its methods after three lines instead of four. But as soon as you need to get functions from the associated namespace, you'd need:

mxp = x.__array_namespace__()

adding an ugly fourth line. I don't think this is worth it, considering the alternative is also four line long, looks cleaner, and avoids having to make the MArray class public, which I'd like to avoid unless its considered necessary for stronger reasons.

@keewis
Copy link
Collaborator Author

keewis commented Dec 28, 2024

I was thinking that this pattern:

import marray

x = marray.MArray(np.array([1, 2, 3]))
xp = x.__array_namespace__()
xp.exp(x)

was what you'd always get with the array API, unless you use the namespace directly (or at least, that was my impression when working on xarray).

Under that assumption the closest to ideal would be this:

import marray as ma

x = ma.MArray(np.array([1, 2, 3]))
ma.exp(x)

# equivalent to
xp = x.__array_namespace__()
xp.exp(x)

but I guess since adding additional args / kwargs is explicitly allowed, the current state is fine.

I guess now the only issues I have is the registering of mxp in sys.modules (since calling get_namespace multiple times would overwrite the previously registered module), and that we don't have a way of doing isinstance checks for MArray if the class is not public. The latter can easily be resolved by a class that implements __instancecheck__, so nothing major (if you prefer I can open new issues for those, though)

@mdhaber
Copy link
Owner

mdhaber commented Dec 28, 2024

if you prefer I can open new issues for those, though

Pull requests would be even better! But yes, please.

strict doesn't expose Array, right?
And asarray is the canonical way of creating arrays to begin with (in the Array API), no?

@lucascolley
Copy link
Collaborator

lucascolley commented Dec 28, 2024

Under that assumption the closest to ideal would be this:

import marray as ma

x = ma.MArray(np.array([1, 2, 3]))
ma.exp(x)

I think I disagree - this goes back to our discussion at data-apis/array-api#843 (comment). Here's my reasoning:

marray in itself cannot be a standard-compatible namespace. While it is clear what ma.exp(x) should do, since x has an underlying array namespace, it is not clear what ma.asarray([1]) should do, unless we introduce the concept of a default array backend. And like xarray (data-apis/array-api#698 (comment)), I think it would be confusing to provide a namespace which has lots of functions which are in the standard, but misses some like asarray.

For xarray, a dispatching module like xarray.ufuncs is needed, since NamedArrays are not compatible with the standard. But for libraries like marray which are compatible with the standard, and hence can provide __array_namespace__, I think using xp = x.__array_namespace__() is the idiomatic way to interface.


Something that arises from this: in data-apis/array-api#843 (comment) I discuss the idea of a pint(dask(marray(sparse))) array. I think __array_namespace__().__name__ should be based on the name of the namespace of the array backend. So something like marray(numpy) rather than just mxp. Perhaps there should be some further discussion of this on the standard-side though - I'm not sure how this should interplay with things like is_numpy_namespace from array-api-compat.

@lucascolley
Copy link
Collaborator

lucascolley commented Dec 28, 2024

To be clear about the problem I see with the current __array_namespace__().__name__:

In [1]: import numpy as np

In [2]: import array_api_strict as xp

In [3]: import marray

In [4]: mnp = marray.get_namespace(np)

In [5]: mxp = marray.get_namespace(xp)

In [6]: xp1 = mnp.arange(2).__array_namespace__()

In [7]: xp1.__name__
Out[7]: 'mxp'

In [8]: xp1.arange(2).data.__array_namespace__().__name__
Out[8]: 'numpy'

In [9]: xp2 = mxp.arange(3).__array_namespace__()

In [10]: xp2.__name__
Out[10]: 'mxp'

In [11]: xp2.arange(3).data.__array_namespace__().__name__
Out[11]: 'array_api_strict'

The modules behave differently since they have different array backends, but have the same name. I guess the same goes for __repr__.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants