Fix minimum/maximum over dimensions with missing values #35323
I think we should aim for a better fix: ideally, no element of the array should come into contact with an unnecessary value. For example, when reducing along row 1, no values from other rows should be relevant.
Well, the result follows that rule. The implementation is just simpler if you take the largest (resp. smallest) value of the array to initialize all slices when computing the minimum (resp. maximum), and this is not visible to the user. Actually, I have a better implementation that looks at the first non-missing value of each slice, but it takes about 50 lines (it's very similar to …)
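The "shared initial value" trick from the previous comment can be illustrated with a minimal Python sketch. This is not Julia's reducedim code. `None` stands in for `missing`, and `min_propagating` and `row_minimum_global_init` are hypothetical names. Every slice is seeded with the largest non-missing value of the whole array, while `missing` still propagates the way Julia's `min(x, missing)` does.

```python
from typing import List, Optional

# `None` plays the role of Julia's `missing` in this sketch.
Value = Optional[int]


def min_propagating(a: Value, b: Value) -> Value:
    """Like Julia's `min`: the result is missing if either argument is missing."""
    if a is None or b is None:
        return None
    return min(a, b)


def row_minimum_global_init(matrix: List[List[Value]]) -> List[Value]:
    """Minimum of each (non-empty) row, seeding every row with one shared initial value."""
    present = [x for row in matrix for x in row if x is not None]
    if not present:
        # Every element is missing, so every row reduces to missing.
        return [None for _ in matrix]
    # Every non-missing element is <= init, so min(init, row...) equals
    # min(row...) for any non-empty row; missing still propagates.
    init = max(present)
    result: List[Value] = []
    for row in matrix:
        acc: Value = init
        for x in row:
            acc = min_propagating(acc, x)
        result.append(acc)
    return result
```

The seed never changes a row's result, which is why the approach is invisible to the user, but computing it costs an extra pass over the data.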
@JeffBezanson What's your preferred approach given my last comment?
Bump.
Seems to be fixed by 76952a8.
Not completely, unfortunately. See the note there.
I agree with @JeffBezanson that we should have a better fix. I think we should not use `typemin`/`typemax`/initial values at all in the non-empty case. However, this is a bugfix, so we should merge it to have fewer bugs until someone contributes a better fix. The tests this adds already pass on master, so we should also add tests with … Also needs a rebase.
Types such as `BigInt` don't support `typemin` and `typemax` so the current method to find an initial value different from `missing` fails. Use the largest/smallest non-missing value to initialize the array instead. This is an inefficient approach. Faster alternatives would be to avoid using an initial value at all, and instead keep track of whether a value has been set in a separate mask; or to use `typemax`/`typemin` for types that support them.
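The "separate mask" alternative can be illustrated with a minimal Python sketch that needs no initial value at all. It is an assumption-laden illustration, not Julia code. Python integers are unbounded, so, like `BigInt`, they have no `typemin`/`typemax`. `None` again stands in for `missing`, and `row_minimum_masked` is a hypothetical name. The matrix is traversed column by column, mimicking Julia's column-major order, so all row accumulators are updated in an interleaved fashion and a `seen` flag records whether each one holds a real value yet.

```python
from typing import List, Optional

# `None` stands in for Julia's `missing`; Python ints are unbounded (like BigInt).
Value = Optional[int]


def row_minimum_masked(matrix: List[List[Value]]) -> List[Value]:
    """Minimum of each row of a rectangular, non-empty matrix, with no initial value."""
    if not matrix or not matrix[0]:
        raise ValueError("empty slices have no minimum")
    nrows, ncols = len(matrix), len(matrix[0])
    acc: List[Value] = [None] * nrows  # only meaningful where seen[i] is True
    seen = [False] * nrows             # the separate mask: does row i hold a value yet?
    saw_missing = [False] * nrows      # missing propagates, as with Julia's `min`
    # Column-by-column traversal interleaves updates to every row accumulator.
    for j in range(ncols):
        for i in range(nrows):
            x = matrix[i][j]
            if x is None:
                saw_missing[i] = True
            elif not seen[i] or x < acc[i]:
                acc[i] = x
                seen[i] = True
    return [None if saw_missing[i] else acc[i] for i in range(nrows)]
```

The cost is one extra boolean per output slice instead of an extra pass over the input. No element from another row ever enters a row's accumulator, which matches the rule from the first comment.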
I've rebased locally, but I'm not sure what the best thing to do is: should I keep the inefficient approach used here for all cases (it is only slow for one slice, so not that slow for large arrays), or be smarter and try to use …
IMO it is best to use …
OK, I've opted for …
@LilithHafner Good to go now?
I don't love the fallback initialization that ends up iterating over the first slice up to 3 times, or this whole system (which existed before this PR). It seems more complex than it needs to be and less performant than it could be. However, bugfix = merge.
Yeah, clearly this approach isn't great... But redesigning all this code isn't trivial. :-/
… `v0 != v0` returns `missing` for missing values. Use the largest/smallest non-missing value to initialize the array. This is an inefficient approach. Faster alternatives would be to avoid using an initial value at all, and instead keep track of whether a value has been set in a separate mask; or to use `typemax`/`typemin` for types that support them.

Fixes #35308.
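The other alternative mentioned above is to use `typemax`/`typemin` where the element type has them and fall back to scanning the data otherwise. The sketch below illustrates only that choice of initial value for a minimum. Python has no `typemax`, so a hypothetical `BOUNDS` table stands in for fixed-width types, and `minimum_init` is a hypothetical name. It does not claim to reproduce what this PR finally merged.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical (typemin, typemax) table for fixed-width integer element types.
# An unbounded type such as a BigInt-like "bigint" is deliberately absent.
BOUNDS: Dict[str, Tuple[int, int]] = {
    "int8": (-2**7, 2**7 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def minimum_init(eltype: str, values: Iterable[Optional[int]]) -> Optional[int]:
    """Pick a non-missing initial value for a missing-propagating minimum.

    Returns None only when there is no usable value at all (every element missing).
    """
    if eltype in BOUNDS:
        # typemax: no element can be larger, and no pass over the data is needed.
        return BOUNDS[eltype][1]
    # Fallback for unbounded types: the largest non-missing value (one extra pass).
    present = [v for v in values if v is not None]
    return max(present) if present else None
```

The bounded branch is O(1), while the fallback costs a full scan. This is the trade-off weighed in the comments above.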