[Proposal] Expose Variable without Pandas dependency #3981
Comments
FYI the conversation on sklearn is far from resolved, and at this point I think the added pandas dependency is not what will keep us from using xarray. I think right now we're most concerned about sparse data representations (and I was considering asking you folks if you'd support scipy.sparse ;) |
Thanks @amueller! You're not the only group that has voiced interest in an index-free labeled array, so I think it's still worth discussing.
We have recently added support for pydata/sparse. We need to document this better (#3484). This work is not 100% complete (#3213) and is part of Xarray's larger goal of supporting duck arrays. Please chime in on one of those issues so we can go a bit deeper. |
To add another use case, the NiBabel package has considered how to label axes (https://github.com/nipy/nibabel/wiki/BIAP6, nipy/nibabel#412), but it's fallen by the wayside. We considered xarray when it was still xray, but the pandas dependency has always been a sticking point. This is partially due to a desire to keep dependencies minimal, and partly due to the size of pandas causing significant overhead at import time for what is a relatively small component. If it's useful to go further into our use case, we can, but this is just to put in a vote for making pandas an optional dependency, if possible. |
Just to add - we at Nibabel are very interested in adding labelled arrays with Xarray, but for us, the Pandas dependency is a serious problem. We're a base library for reading brain imaging formats, and we sit at the bottom of several imaging stacks, so it is very important to us that we don't introduce heavy dependencies - because we pass these on to all the libraries that depend on us. We've looked enviously at Xarray for a while, but the Pandas dependency is a serious-enough problem that we've held off from using it. Just for example, the Pandas dependency of Xarray was the reason that I was working on Datarray, to see if we could use that instead (we couldn't). |
What would the options be for moving this forward?
|
Another possible outcome is that pandas becomes optional as part of the indexing refactor, and we expose Variable as public API. BUT @shoyer has been consistent against making Variable public, so maybe the best option is to fork it out to EDIT: The git log for |
I think it would be relatively straightforward to expose |
Thinking about this a little more, I would lean towards the separate package that xarray could depend upon. Perhaps we could call this |
I've heard folks referring to this proposal as |
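We were just talking about this over at https://github.com/nipy/nibabel - because we are about to commit ourselves to an array-axis-labelling API. Is xarray-lite on the near or the distant horizon? Should we wait, to make our decisions? Mentioning @effigies because he reminded me about this thread. |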
We wrote a proposal to build an "xarray-lite" as part of a recent CZI
application. We're still waiting to hear back, but if that's funded we'd
definitely love to work with you on this.
…On Fri, Jul 9, 2021 at 12:16 PM Matthew Brett ***@***.***> wrote:
We were just talking about this over at https://github.com/nipy/nibabel -
because we are about commit ourselves to an array-axis-labelling API. Is
xarray-lite on the near or the distant horizon? Should we wait, to make
our decisions? Mentioning @effigies <https://github.com/effigies> because
he reminded me about this thread.
|
@shoyer - thanks for the feedback. I guess this means that it's unlikely this will be ready in time for our own CZI grant to finish (around June 2022)? |
hi all, just wanted to check in if there's been any further thoughts/ progress here. We're having more and more need for an |
Hi @sofroniewn - This is certainly something we still want to work on (see this section of our current roadmap and a more detailed proposal that included work in this area). I actually think the relevant part of the linked proposal is the best we have for a working plan here (text copied from doc below):
We have a bi-weekly developer call on Wednesday mornings (#4001). One idea would be to devote 10-15 minutes of our next meeting to this topic. Is that something you and/or @andy-sweet would be up for joining? |
Sounds good to me. Looks like the next one is on December 7th at 13:30 UTC? If so, should I just jump on the Zoom meeting or do I need to add an item to the agenda? I'm still fairly new to xarray and have mostly poked around a little at the feasibility of using |
@andy-sweet - please do join the next call. I've added it to the meeting agenda. |
I can't make the calls next week - but maybe @andy-sweet can report back and we'll take it from there. Thanks!! |
This is something I am getting more and more interested in. We (scipp) currently have a C++ implementation (with Python bindings) of a simpler version of While I am still far from having reached a conclusion (or convincing anyone here to support this), investing in technology that is adopted and carried by the community is considered important here. In other words, we may in principle be able to help out and invest some time into this. One important precondition would be full compatibility with other custom array containers: For our applications we do not just need to add labelled axes, but also units, masks, bin edges, and ragged data support. I am currently toying with the idea of a "stack" of Python array libraries (I guess you would call them duck arrays?) that add these features one by one, selectively, but can all be used also independently --- unlike Scipp, where you get all or nothing, and lose the ability of using NumPy (or other) array libraries under the hood. Each of those libraries could be small and simple, focussing on just one specific aspect, but everything should be composable. For example, we can imagine a |
I note that |
👋🏽 everyone, i wanted to share that we (@dcherian, @scottyhq, @maxrjones, and I ) are currently working on a design document for a new project called we believe that the design involves
you can find more details in the very rough draft: Xarray-lite design document. @pydata/xarray, we would greatly appreciate any feedback, suggestions, or contributions to the design document |
hey all, I'm curious to hear any update on this. Following as many threads as I could here the trail seems to run a little cold about 6 months ago. Is there still an active push towards the xarray-lite spec (even if those involved have been temporarily sidetracked)? |
This is in-progress but is moving a bit slowly at the moment. See #8238 |
thanks for the update! yeah I saw #8238 as well. good to know it's still in progress, even if slowly 👍 |
This issue proposes exposing Xarray's Variable class as a stand-alone array class with named axes (dims) and arbitrary metadata (attrs) but without coordinates (indexes). Yes, this already exists, but the Variable class is currently inseparable from our Pandas dependency, despite not utilizing any of its functionality.

What would this entail?

The biggest change would be in making Pandas an optional dependency and isolating any imports. This change could be confined to the Variable object or could be propagated further as the Explicit Indexes work proceeds (#1603).

Why?

Within Xarray, the Variable class is a vital building block for many of our internal data structures. Recently, the utility of a simple array with named dimensions has been highlighted by a few potential user communities:

An example from the above linked SLEP as to why users may not want Pandas as a dependency in Xarray:

Since we already have a class developed that meets these applications' use cases, it seems only prudent to evaluate the feasibility of exposing the Variable as a low-level API object.

In conclusion, I'm not sure this is currently worth the effort, but it's probably worth exploring at this point.
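To make "making Pandas an optional dependency and isolating any imports" a bit more concrete, here is a minimal sketch of one common way to do it. This is not Xarray's actual implementation: the `optional_import` helper, the `LabeledArray` class, and its `to_index` method are all hypothetical names made up for illustration. The idea is only that the core object (dims, data, attrs) never touches Pandas, and the import happens inside the one method that needs it.

```python
import importlib
import importlib.util


def optional_import(name):
    """Return the top-level module `name` if it is installed, else None.

    Hypothetical helper, not part of Xarray.
    """
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


class LabeledArray:
    """Hypothetical stand-in for a Variable-like object:
    named dims and attrs, but no coordinates or indexes."""

    def __init__(self, dims, data, attrs=None):
        self.dims = tuple(dims)
        self.data = data
        self.attrs = dict(attrs or {})

    def to_index(self):
        # Pandas is imported only here, so constructing and using a
        # LabeledArray does not pay the import cost or require the package.
        pd = optional_import("pandas")
        if pd is None:
            raise ImportError("pandas is required for to_index()")
        return pd.Index(self.data, name=self.dims[0])


arr = LabeledArray(("x",), [1, 2, 3], attrs={"units": "m"})
print(arr.dims, arr.attrs)
```

With this layout, a downstream library that only needs named dimensions and metadata would never trigger the Pandas import; only callers of the index-related method would.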