Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should NamedArray be interchangeable with other array types? or Should we support the axis kwarg? #8333

Closed
Tracked by #8238
dcherian opened this issue Oct 18, 2023 · 18 comments
Labels

Comments

@dcherian
Copy link
Contributor

What is your issue?

Raising @Illviljan's comment from #8304 (comment).

@dcherian dcherian added API design topic-NamedArray Lightweight version of Variable labels Oct 18, 2023
@max-sixty
Copy link
Collaborator

Please weigh this quite low give the amount of context I have — but I would vote "no". Arguably good APIs are limiting as well as permissive, and ideally users shouldn't rely on axis order.

(I would also vote to deprecate this from xarray)

That said, if there are real use-cases or compatibility concerns, I would override purity for practicality.

@dcherian
Copy link
Contributor Author

Perhaps the real question is "Should NamedArray be interchangeable with other array types?" If so, we'd want to support all the various array protocols and the axis kwarg.

@max-sixty
Copy link
Collaborator

Perhaps the real question is "Should NamedArray be interchangeable with other array types?" If so, we'd want to support all the various array protocols and the axis kwarg.

Yes, that's very fair.

Maybe the synthesis of these is NamedArray should support it, but xarray shouldn't...?

@dcherian dcherian changed the title Should we support the axis kwarg for NamedArray reductions? Should NamedArray be interchangeable with other array types? or Should we support the axis kwarg? Oct 19, 2023
@Illviljan
Copy link
Contributor

Perhaps the real question is "Should NamedArray be interchangeable with other array types?" If so, we'd want to support all the various array protocols and the axis kwarg.

I've been thinking that's what namedarray._array_api is for. But .reduce is not a part of the array api standard so we don't have to add axis to it.

@dcherian
Copy link
Contributor Author

But .reduce is not a part of the array api standard so we don't have to add axis to it.

Fair point.


Another thought: Like numpy, we could provide both an array api interface, and the Xarray type interface

So xp.mean(named_array, axis=1) or named_array.mean(dim="time") but not named_array.mean(axis=1) or xp.mean(named_array, dim="time"). That way you're explicitly choosing between unlabeled or labeled array semantics.

cc @shoyer

@TomNicholas
Copy link
Member

TomNicholas commented Oct 20, 2023

Perhaps the real question is "Should NamedArray be interchangeable with other array types?"

I might be being dense, but what is the purpose of supporting xp.mean(named_array, axis=1)? You're treating a NamedArray like a numpy.ndarray by using axis integers, but the only difference between NamedArray and a numpy.ndarray is named dimensions (and perhaps carting around metadata). Why even bother using xarray at that point?

EDIT: I guess here xarray at least would be able to act as general wrapper for other array types (including chunked arrays)... But even then, why bother? The whole point of the array API standard is to make array types interchangeable, so NamedArray calls without using named dimensions are still adding nothing over using "bare" array-API-compliant types.

@dcherian
Copy link
Contributor Author

dcherian commented Oct 20, 2023

I might be being dense,

Not at all! This is the core question. It's whether we think there's value in allowing users to seamlessly stick in NamedArray in their existing workflows. We can also punt and just stick to providing the current API (which I really like and prefer to the array API way, which seems to optimize for a library writer).

There's some precedence here in that we expect users to call np.abs(xarray_thing) which is massively confusing. "Why does a numpy function return an Xarray object?"

@TomNicholas
Copy link
Member

TomNicholas commented Oct 20, 2023

It's whether we think there's value in allowing users to seamlessly stick in NamedArray in their existing workflows

What would the user really gain by replacing numpy.ndarray with NamedArray in their code, if none of the method / function calls changed to indicate dimension name? Has anyone actually seen a worthwhile use case for this?

@TomNicholas
Copy link
Member

IMO the entire point of named dimensions has always been to remove a whole class of possible user errors: transposition errors. Supporting axis integers anywhere means code that uses xarray becomes vulnerable to such errors again, negating the benefit of using named dimensions.

@dcherian
Copy link
Contributor Author

Has anyone actually seen a worthwhile use case for this?

There's a lot of appetite for computational functions that handle anything including DataArrays.

Example: https://github.com/NCAR/geocat-comp/blob/7b0ac8fbfe77c7a399b9a11a55076d6d02651046/geocat/comp/gradient.py#L178C26-L181

@TomNicholas
Copy link
Member

There's a lot of appetite for computational functions that handle anything

Okay fair, now you say it I have seen that pattern also in MetPy. I have to say personally I've always viewed it as an antipattern though...

including DataArrays

Propagating a DataArray through xp.mean seems more useful than propagating a NamedArray through, because the DataArray actually carries coordinates/indexes with it. But that's not what we're proposing doing here.

@dopplershift do you want to state your case? 😁

So xp.mean(named_array, axis=1) or named_array.mean(dim="time") but not named_array.mean(axis=1) or xp.mean(named_array, dim="time"). That way you're explicitly choosing between unlabeled or labeled array semantics.

If we do choose to support axis integers then I prefer this solution, because it allows us to keep support for array-API-style functions a "hidden feature", without cluttering the xarray-method-chaining-style API.

@dcherian
Copy link
Contributor Author

dcherian commented Oct 20, 2023

I have to say personally I've always viewed it as an antipattern though

Yeah we should release apply_ufunc as a library haha ;)

EDIT: Or does it just work today with all numpy inputs?

@shoyer
Copy link
Member

shoyer commented Oct 21, 2023

I don't think NamedArray should be interchangeable with other arrays types. Xarray has different semantics than other array API objects, most notably boardcasting by dimension name rather than dimension order.

There is room for another array type with named dimensions that is more minimal and intentionally doesn't change any behavior from NumPy, but that array type isn't NamedArray.

@dcherian
Copy link
Contributor Author

Just recapping our discussion at the last meeting.

There's general agreement that NamedArray should not offer an array-API compatible interface.

However currently we still rely on scalar ufuncs like np.abs(NamedArray)->NamedArray. Consensus was that we should raise this upstream, to see if the array API folks think its a good idea to only adopt scalar ufunc dispatching, and not the rest of the array API. Alternatively we can offer our own ufunc machinery as xarray used to do a long time ago.

@andersy005 can you raise an issue upstream in the array API repo?

@andersy005
Copy link
Member

andersy005 commented Oct 31, 2023

Consensus was that we should raise this upstream, to see if the array API folks think its a good idea to only adopt scalar ufunc dispatching, and not the rest of the array API.

@dcherian, a glance at the Array-API documentation under the purpose and scope sections reveals that ufuncs and gufuncs are marked as out of scope:

  1. NumPy (generalized) universal functions, i.e. ufuncs and gufuncs.

Rationale: these are NumPy-specific concepts, and are mostly just a particular way of building regular functions with a few extra methods/properties.

this piece of information seems to address our question regarding the Array API's vision on permitting partial implementations, especially centered around scalar ufuncs, doesn’t it?

@dcherian
Copy link
Contributor Author

I guess the term is "elementwise funcs" https://data-apis.org/array-api/latest/API_specification/elementwise_functions.html

@andersy005
Copy link
Member

@andersy005 can you raise an issue upstream in the array API repo?

for those interested, here's the issue:

@TomNicholas
Copy link
Member

This can be closed can't it? Wasn't the conclusion that NamedArray should not try to obey the array API standard and should not support the axis kwarg?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

No branches or pull requests

6 participants