Improve performance with numpy_groupies
#222
Comments
Note that (2) is worse because we always accumulate
About ml31415/numpy-groupies#3 I'm not categorically against adding multiple aggregations in one go. It's mainly, that so far I considered the setup overhead of As you mentioned
In my benchmarks this was ~25-30% of the time for nd
IMO our main bottleneck now is how `numpy_groupies` converts nD problems to a 1D problem before using `bincount`, `ufunc.at`, etc. (ml31415/numpy-groupies#46), e.g. grouping an nD array by a 1D array `time.month` and reducing along 1D `time`.

I tried to fix this but it had to be reverted because it doesn't generalize for `axis != -1`.

We could just use it in `numpy-groupies` when `axis == -1` and use the standard path for other cases (see "Use faster group_idx creation when axis == -1", ml31415/numpy-groupies#77). This would be good I think.

`flox` still has the problem that for reductions like `mean` we compute 2 reductions for dask arrays: `sum` and `count`. This means we incur the cost twice. To avoid this, `numpy-groupies` would have to support multiple reductions (which they don't want to); or we make the transformation to a 1D problem ourselves. This is annoying but doable.

PS: We could totally avoid all this by building out `numbagg`'s groupby, which IIRC is stuck on implementing a proper `fill_value` that is not the identity element for reductions.

cc @Illviljan @TomNicholas