@alamb
Contributor

@alamb alamb commented Nov 19, 2025

Note to self:

  • Apply the same change to the buffer_bit_ops benchmarks

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

  • Closes #NNN.

Rationale for this change

The boolean and bitwise kernel benchmarks are currently very noisy and sometimes produce nonsensical results (e.g. #8854 (comment)). I think this is because the kernels are so fast. For example, here is the output from a recent run (note that each benchmark takes only nanoseconds):

group         alamb_bitwise_ops                      main
-----         -----------------                      ----
and           1.00    272.4±1.45ns        ? ?/sec    1.00    273.1±1.36ns        ? ?/sec
and_sliced    1.00   1096.0±1.60ns        ? ?/sec    1.00   1095.1±2.77ns        ? ?/sec
not           1.00    213.8±0.29ns        ? ?/sec    1.00    214.0±0.40ns        ? ?/sec
not_sliced    1.00    965.6±9.77ns        ? ?/sec    1.00    961.8±5.75ns        ? ?/sec
or            1.00    254.1±0.66ns        ? ?/sec    1.01    255.6±0.41ns        ? ?/sec
or_sliced     1.00   1225.5±2.12ns        ? ?/sec    1.00   1226.9±7.43ns        ? ?/sec

What changes are included in this PR?

  1. Change the array size to 8192 elements to better match the sizes typically used
  2. Run each kernel 100 times per criterion iteration to reduce noise
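
A minimal sketch of the second change, using only the standard library so it stands alone. It is not the code in this PR and does not use criterion or the arrow-rs kernels: `and_words`, `time_batch`, `make_inputs`, and `CALLS_PER_ITERATION` are hypothetical names, and the AND over `u64` words is only a stand-in for a similarly cheap kernel. The idea is to time a batch of calls as one measurement so that timer overhead and jitter are spread over many calls.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Number of kernel calls batched into one timed measurement
/// (the PR description uses 100).
const CALLS_PER_ITERATION: usize = 100;

/// Stand-in kernel: bitwise AND over bit-packed words.
/// This is NOT the arrow-rs kernel, only something similarly cheap.
fn and_words(left: &[u64], right: &[u64]) -> Vec<u64> {
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}

/// Time one batch of `CALLS_PER_ITERATION` calls and return the total.
fn time_batch(left: &[u64], right: &[u64]) -> Duration {
    let start = Instant::now();
    for _ in 0..CALLS_PER_ITERATION {
        // black_box keeps the optimizer from eliding or hoisting the call.
        black_box(and_words(black_box(left), black_box(right)));
    }
    start.elapsed()
}

/// Build two bit-packed inputs of `num_bits` bits each
/// (8192 bits is 128 u64 words). The bit patterns are arbitrary.
fn make_inputs(num_bits: usize) -> (Vec<u64>, Vec<u64>) {
    let words = num_bits / 64;
    let left = (0..words as u64)
        .map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15))
        .collect();
    let right = vec![0xAAAA_AAAA_AAAA_AAAA_u64; words];
    (left, right)
}
```

Dividing the result of `time_batch` by `CALLS_PER_ITERATION` gives an approximate per-call time. In a criterion benchmark the same loop would presumably sit inside the closure passed to the bencher, which is why the reported times grow by roughly the batch factor.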

Are these changes tested?

I will benchmark them.

Are there any user-facing changes?

No, these are benchmarks.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 19, 2025
fn bitwise_array_benchmark(c: &mut Criterion) {
let size = 64 * 1024_usize;

We often run these kernels on 8k elements, not 64k elements, so adjust the benchmark to reflect that.

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/benchmarks (471d991) to 08dcc0b diff
BENCH_NAME=bitwise_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench bitwise_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_benchmarks
Results will be posted here when complete

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/benchmarks (471d991) to 08dcc0b diff
BENCH_NAME=boolean_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_benchmarks
Results will be posted here when complete

@apache apache deleted a comment from alamb-ghbot Dec 12, 2025
@alamb
Contributor Author

alamb commented Dec 12, 2025

run benchmark bitwise_kernel

@alamb-ghbot

🤖: Benchmark completed

Details

group         alamb_benchmarks                       main
-----         ----------------                       ----
and           32.82     9.0±0.10µs        ? ?/sec    1.00    275.4±3.66ns        ? ?/sec
and_sliced    28.02    34.5±0.34µs        ? ?/sec    1.00  1232.3±23.71ns        ? ?/sec
not           39.97     8.7±0.08µs        ? ?/sec    1.00    216.7±1.65ns        ? ?/sec
not_sliced    30.35    21.3±0.21µs        ? ?/sec    1.00    703.0±7.25ns        ? ?/sec
or            36.19     9.0±0.12µs        ? ?/sec    1.00    248.6±0.90ns        ? ?/sec
or_sliced     28.69    31.4±0.54µs        ? ?/sec    1.00  1093.3±10.24ns        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/benchmarks (471d991) to 08dcc0b diff
BENCH_NAME=bitwise_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench bitwise_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_benchmarks
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                  alamb_benchmarks                       main
-----                                                                  ----------------                       ----
bench bitwise array scalar: and/bitwise array and, 20% nulls           10.82   185.5±1.07µs        ? ?/sec    1.00     17.1±0.83µs        ? ?/sec
bench bitwise array scalar: and/bitwise array scalar and, no nulls     13.73   225.4±2.66µs        ? ?/sec    1.00     16.4±0.35µs        ? ?/sec
bench bitwise array scalar: or/bitwise array scalar or, 20% nulls      13.76   229.1±2.50µs        ? ?/sec    1.00     16.6±0.66µs        ? ?/sec
bench bitwise array scalar: or/bitwise array scalar or, no nulls       11.10   181.4±0.48µs        ? ?/sec    1.00     16.3±0.38µs        ? ?/sec
bench bitwise array scalar: xor/bitwise array scalar xor, 20% nulls    11.06   185.4±2.32µs        ? ?/sec    1.00     16.8±0.60µs        ? ?/sec
bench bitwise array scalar: xor/bitwise array scalar xor, no nulls     11.03   181.4±0.55µs        ? ?/sec    1.00     16.4±0.39µs        ? ?/sec
bench bitwise array: and/bitwise array and, 20% nulls                  9.28    353.9±2.12µs        ? ?/sec    1.00     38.1±4.84µs        ? ?/sec
bench bitwise array: and/bitwise array and, no nulls                   9.75    374.4±4.04µs        ? ?/sec    1.00     38.4±4.06µs        ? ?/sec
bench bitwise: not/bitwise array not, 20% nulls                        10.17   182.9±1.53µs        ? ?/sec    1.00     18.0±0.54µs        ? ?/sec
bench bitwise: not/bitwise array not, no nulls                         9.55    189.1±1.82µs        ? ?/sec    1.00     19.8±0.46µs        ? ?/sec
bench bitwise: or/bitwise array or, 20% nulls                          9.23    348.0±5.97µs        ? ?/sec    1.00     37.7±4.61µs        ? ?/sec
bench bitwise: or/bitwise array or, no nulls                           10.14   368.8±2.57µs        ? ?/sec    1.00     36.4±4.36µs        ? ?/sec
bench bitwise: xor/bitwise array xor, 20% nulls                        9.40    348.8±3.00µs        ? ?/sec    1.00     37.1±4.52µs        ? ?/sec
bench bitwise: xor/bitwise array xor, no nulls                         9.67    369.0±2.87µs        ? ?/sec    1.00     38.2±5.49µs        ? ?/sec

@alamb alamb marked this pull request as ready for review December 12, 2025 16:06
@alamb
Contributor Author

alamb commented Dec 12, 2025

Roughly speaking, this shows what I expect: the benchmarks take longer.

@alamb alamb requested a review from Dandandan December 12, 2025 21:19
@alamb alamb changed the title from "Run boolean and bitwise kernels for longer to reduce noise" to "Run boolean and bitwise kernels for longer to reduce benchmark noise" Dec 12, 2025
@alamb
Contributor Author

alamb commented Dec 22, 2025

I am not convinced this is the right approach. I think a better approach would be to run the kernels on several different sizes and allocations. I suspect alignment is causing benchmark artifacts.
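
One way to picture that alternative is the standard-library sketch below. It enumerates a grid of input sizes and start offsets, slices each input out of a single backing buffer, and records how far each slice starts from a 64-byte boundary. Everything here is an assumption for illustration: `Case`, `cases`, `misalignment`, `not_bytes`, and `run_cases` are hypothetical names, the byte-wise NOT is a stand-in rather than an arrow-rs kernel, and the 64-byte boundary is only a plausible cache-line size.

```rust
use std::hint::black_box;

/// One benchmark configuration: how many bytes to process and where the
/// slice starts inside a larger backing allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Case {
    len: usize,
    offset: usize,
}

/// Every combination of the given sizes and start offsets.
fn cases(sizes: &[usize], offsets: &[usize]) -> Vec<Case> {
    sizes
        .iter()
        .flat_map(|&len| offsets.iter().map(move |&offset| Case { len, offset }))
        .collect()
}

/// Distance of a slice's first byte from the previous 64-byte boundary.
fn misalignment(data: &[u8]) -> usize {
    data.as_ptr() as usize % 64
}

/// Stand-in kernel: bitwise NOT over bytes (not the arrow-rs kernel).
fn not_bytes(input: &[u8]) -> Vec<u8> {
    input.iter().map(|&b| !b).collect()
}

/// Run the stand-in kernel once per case and report each case's
/// misalignment. A real benchmark would time each case separately.
fn run_cases(backing: &[u8], cases: &[Case]) -> Vec<(Case, usize)> {
    cases
        .iter()
        .map(|&case| {
            let input = &backing[case.offset..case.offset + case.len];
            black_box(not_bytes(black_box(input)));
            (case, misalignment(input))
        })
        .collect()
}
```

For example, `cases(&[1024, 8192], &[0, 1, 7])` yields six configurations. Reporting the misalignment next to each timing would make it possible to check whether the noise tracks alignment, which is only a suspicion in the comment above.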

@alamb alamb closed this Dec 22, 2025