Skip to content

Conversation

@felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Aug 15, 2024

Rationale for this change

More than once I've been confused about how the SimdLevel template parameters on these kernel classes affect dispatching of kernels based on SIMD support detection at runtime [1] given that nothing in the code changes based on the parameters.

What matters is the compilation unit in which the templates are instantiated. Different compilation units get different compilation parameters. The SimdLevel parameters don't really affect the code that gets generated (!), they only serve as a way to avoid duplication of symbols in the compiled objects.

This PR organizes the code to make this more explicit.

[1] #7871 (comment)

What changes are included in this PR?

  • Introduction of aggregate_basic-inl.h
  • Moving of the impls in aggregate_basic-inl.h to an anonymous namespace
  • Grouping of code based on the function they implement (Sum, Mean, and MinMax)

Are these changes tested?

By the compilation process, existing tests, and benchmarks.

@felipecrv felipecrv changed the title Avx GH-43719: [C++] Clarify the way aggregation kernels are generated from the same code in different compilation units Aug 15, 2024
@github-actions
Copy link

⚠️ GitHub issue #43719 has been automatically assigned in GitHub to PR creator.

@felipecrv felipecrv changed the title GH-43719: [C++] Clarify the way aggregation kernels are generated from the same code in different compilation units GH-43719: [C++] Clarify the way SIMD-enabled agg kernels come from the same code in different compilation units Aug 15, 2024
@apache apache deleted a comment from github-actions bot Aug 15, 2024
@felipecrv
Copy link
Contributor Author

@kou where would I change CMake code to skip the installation of -inl.h headers?

@felipecrv
Copy link
Contributor Author

I checked binary size impact: arrow-acero-aggregate-benchmark only grew by 120 bytes.

@kou
Copy link
Member

kou commented Aug 15, 2024

Could you add internal to file name like aggregate_basic_common_internal.h?
See also:

if(HEADER_BASENAME MATCHES "internal")
continue()
endif()

@felipecrv
Copy link
Contributor Author

felipecrv commented Aug 16, 2024

Could you add internal to file name like aggregate_basic_common_internal.h? See also:

if(HEADER_BASENAME MATCHES "internal")
continue()
endif()

I needprefer to end it with -inl.h to follow the pattern found in many C++ codebases [1]. I can rename it to aggregate_basic_internal-inl.h.

[1] https://github.com/facebook/folly/blob/main/folly/channels/ChannelProcessor-inl.h

@felipecrv
Copy link
Contributor Author

I thought the rule only matched on internal.h but having internal anywhere in the file name will do. Thanks.

@felipecrv
Copy link
Contributor Author

@ursabot please benchmark command=cpp-micro --suite-filter=arrow-acero-aggregate-benchmark --benchmark-filter=MinMaxKernel* --iterations=3

@ursabot
Copy link

ursabot commented Aug 16, 2024

Benchmark runs are scheduled for commit ff39f8482bd772d70d337349fe0c9eb43566a337. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@mapleFU
Copy link
Member

mapleFU commented Aug 16, 2024

Thanks for the changes!

@conbench-apache-arrow
Copy link

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit ff39f8482bd772d70d337349fe0c9eb43566a337.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

@mapleFU
Copy link
Member

mapleFU commented Aug 16, 2024

Seems lint failed

@felipecrv
Copy link
Contributor Author

Seems lint failed

It's the -inl.h file prefix. I need to find a way of allowing - in this case and this case only in the linter.

@felipecrv felipecrv requested a review from pitrou August 16, 2024 15:18
@austin3dickey
Copy link
Contributor

Hey @felipecrv, I recently made changes to the @ursabot and forgot to update the help message, sorry! You don't need to add the --iterations=3 anymore (it will default to 6 repetitions now). That's why the benchmark builds were failing.

@felipecrv
Copy link
Contributor Author

@ursabot please benchmark command=cpp-micro --suite-filter=arrow-acero-aggregate-benchmark --benchmark-filter=MinMaxKernel*

@ursabot
Copy link

ursabot commented Aug 16, 2024

Benchmark runs are scheduled for commit a36a8d04f618b6ee1fe62e1756cea880e6996435. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow
Copy link

Thanks for your patience. Conbench analyzed the 3 benchmarking runs that have been run so far on PR commit a36a8d04f618b6ee1fe62e1756cea880e6996435.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

@felipecrv felipecrv force-pushed the avx branch 2 times, most recently from eb64fde to 2c9a588 Compare August 16, 2024 18:51
@felipecrv
Copy link
Contributor Author

felipecrv commented Aug 16, 2024

@austin3dickey can I run something on my machine to test these benchmark filters? I'm confused about why they don't match any benchmark here.

EDIT: I misunderstood the message. There aren't baselines to compare, but the benchmarks ran.

@austin3dickey
Copy link
Contributor

Yeah @felipecrv, it looks like they ran. I just noticed we don't run this suite on the main branch commits right now, which is why there's no baseline. I'm not exactly sure why we don't; there might have been a segfault or something at some point in the past. I can try to kick off a manual run of this suite on the default branch for you and post a comparison here.

@austin3dickey
Copy link
Contributor

can I run something on my machine to test these benchmark filters?

Yes, you can install archery and do archery benchmark run --suite-filter arrow-acero-aggregate-benchmark --benchmark-filter MinMaxKernel*

@austin3dickey
Copy link
Contributor

Okay, I ran a similar run on the latest main commit for the three machine types we ran above. Here are the comparisons:

There isn't a z-score comparison because we don't have a distribution of baseline results to compare to; just the one. But you can look at the percent change column.

@felipecrv felipecrv requested a review from pitrou August 22, 2024 22:51
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 22, 2024
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @felipecrv . This looks good to me, thanks!

@pitrou
Copy link
Member

pitrou commented Sep 3, 2024

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Sep 3, 2024

Revision: 447fbc9

Submitted crossbow builds: ursacomputing/crossbow @ actions-7e561de588

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 6ce2af7.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

khwilson pushed a commit to khwilson/arrow that referenced this pull request Sep 14, 2024
…rom the same code in different compilation units (apache#43720)

### Rationale for this change

More than once I've been confused about how the `SimdLevel` template parameters on these kernel classes affect dispatching of kernels based on SIMD support detection at runtime [1] given that nothing in the code changes based on the parameters.

What matters is the compilation unit in which the templates are instantiated. Different compilation units get different compilation parameters. The SimdLevel parameters don't really affect the code that gets generated (!), they only serve as a way to avoid duplication of symbols in the compiled objects.

This PR organizes the code to make this more explicit.

[1] apache#7871 (comment)

### What changes are included in this PR?

 - Introduction of aggregate_basic-inl.h
 - Moving of the impls in `aggregate_basic-inl.h` to an anonymous namespace
 - Grouping of code based on the function they implement (`Sum`, `Mean`, and `MinMax`)

### Are these changes tested?

By the compilation process, existing tests, and benchmarks.

* GitHub Issue: apache#43719

Lead-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants