-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10263: [C++][Compute] Improve variance kernel numerical stability #8437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Improve variance merging method to address stabiliy issue when merging short chunks with approximate mean value. Improve reference variance calculation by leveraging Kahan summation.
|
CI failure looks not related |
| this->AssertVarStdIs("[100000004, 100000007, 100000013, 100000016]", options, 30.0); | ||
| this->AssertVarStdIs("[1000000004, 1000000007, 1000000013, 1000000016]", options, 30.0); | ||
|
|
||
| #ifndef __MINGW32__ // MinGW has precision issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only the 32-bit MinGW build, i.e. it was perhaps not MinGW but x87 (perhaps you can check with a 32-bit Linux build?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test failed on mingw 32 community CI. And I see similar comments in decimal unit test.
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/decimal_test.cc#L695
I didn't tested it on my side. Maybe I can start a 32bit VM to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've checked and there is no failure on Linux i386. It does seem MinGW-related.
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will merge. Thanks a lot for doing this!
|
Are there any before/after benchmarks? It's really nice that we can have extra numerical stability, I'm just curious what's the penalty for it. |
This change is only for combing variances from multiple arrays. The time is trivial compared with computing variance for each array. |
|
Amazing, thanks! |
Improve variance merging method to address stability issue when merging short chunks with approximate mean value. Improve reference variance accuracy by leveraging Kahan summation. Closes #8437 from cyb70289/variance-stability Authored-by: Yibo Cai <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Improve variance merging method to address stability issue when merging
short chunks with approximate mean value.
Improve reference variance accuracy by leveraging Kahan summation.