-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types #7871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I can trigger a benchmark action once #7870 get merged. Below is the BM data for int types on my setup: |
9141619 to
553266e
Compare
553266e to
4995295
Compare
4995295 to
1c56802
Compare
|
|
@ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark-filter=MinMax |
Below is the results for null_percent 0.01% and 0% on https://ci.ursalabs.org/#/builders/73/builds/101 |
|
@jianxind Sorry for the delay. Could you please rebase this PR? It looks like there are some conflicts now. |
1c56802 to
c9e781d
Compare
No problem at all. Rebased now. Thanks. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. Some comments still.
|
Passing |
Below is the cmd I used, and compiler vectorise happens only on Int types. |
|
Ah, I also had |
Signed-off-by: Frank Du <frank.du@intel.com>
Signed-off-by: Frank Du <frank.du@intel.com>
Signed-off-by: Frank Du <frank.du@intel.com>
Signed-off-by: Frank Du <frank.du@intel.com>
This reverts commit 8b5b1a6aa491e76599c1988baf9c5df5a970e672.
Signed-off-by: Frank Du <frank.du@intel.com>
Signed-off-by: Frank Du <frank.du@intel.com>
9d84640 to
ab6f2be
Compare
|
Rebased. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Test failures are unrelated, will merge. |
I've came here after looking at the code and being confused. It sounds like there was never a need to instantiate templates with a I might be wrong, in that case, I would love to have a pointer to the specialized code. |
|
@felipecrv I may be misunderstanding your question, but the |
I understand that, but if no specialization exists for different SIMD levels, we don't need more than the |
The source code is usually the same for all variations, but the generated code (which matters for ODR) varies thanks to different compiler options. arrow/cpp/src/arrow/CMakeLists.txt Lines 353 to 359 in 8b634ad
We do: arrow/cpp/src/arrow/compute/function.cc Lines 133 to 150 in 8b634ad
|
OK. Now I get it. The compiler options are source file specific and not global to the entire build. |
|
Right :-) I agree it's a bit difficult to follow. |
…e same code in different compilation units (#43720) ### Rationale for this change More than once I've been confused about how the `SimdLevel` template parameters on these kernel classes affect dispatching of kernels based on SIMD support detection at runtime [1] given that nothing in the code changes based on the parameters. What matters is the compilation unit in which the templates are instantiated. Different compilation units get different compilation parameters. The SimdLevel parameters don't really affect the code that gets generated (!), they only serve as a way to avoid duplication of symbols in the compiled objects. This PR organizes the code to make this more explicit. [1] #7871 (comment) ### What changes are included in this PR? - Introduction of aggregate_basic-inl.h - Moving of the impls in `aggregate_basic-inl.h` to an anonymous namespace - Grouping of code based on the function they implement (`Sum`, `Mean`, and `MinMax`) ### Are these changes tested? By the compilation process, existing tests, and benchmarks. * GitHub Issue: #43719 Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…rom the same code in different compilation units (apache#43720) ### Rationale for this change More than once I've been confused about how the `SimdLevel` template parameters on these kernel classes affect dispatching of kernels based on SIMD support detection at runtime [1] given that nothing in the code changes based on the parameters. What matters is the compilation unit in which the templates are instantiated. Different compilation units get different compilation parameters. The SimdLevel parameters don't really affect the code that gets generated (!), they only serve as a way to avoid duplication of symbols in the compiled objects. This PR organizes the code to make this more explicit. [1] apache#7871 (comment) ### What changes are included in this PR? - Introduction of aggregate_basic-inl.h - Moving of the impls in `aggregate_basic-inl.h` to an anonymous namespace - Grouping of code based on the function they implement (`Sum`, `Mean`, and `MinMax`) ### Are these changes tested? By the compilation process, existing tests, and benchmarks. * GitHub Issue: apache#43719 Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…e same code in different compilation units (#43720) ### Rationale for this change More than once I've been confused about how the `SimdLevel` template parameters on these kernel classes affect dispatching of kernels based on SIMD support detection at runtime [1] given that nothing in the code changes based on the parameters. What matters is the compilation unit in which the templates are instantiated. Different compilation units get different compilation parameters. The SimdLevel parameters don't really affect the code that gets generated (!), they only serve as a way to avoid duplication of symbols in the compiled objects. This PR organizes the code to make this more explicit. [1] apache/arrow#7871 (comment) ### What changes are included in this PR? - Introduction of aggregate_basic-inl.h - Moving of the impls in `aggregate_basic-inl.h` to an anonymous namespace - Grouping of code based on the function they implement (`Sum`, `Mean`, and `MinMax`) ### Are these changes tested? By the compilation process, existing tests, and benchmarks. * GitHub Issue: #43719 Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Uh oh!
There was an error while loading. Please reload this page.