perf: improve performance of encoding GenericByteArray by 8% #9054

Merged
alamb merged 3 commits into apache:main from rluvaton:improve-row-conversion-for-generic-byte-array on Dec 30, 2025

@rluvaton
Member

Which issue does this PR close?

N/A

Rationale for this change

Make row conversion faster

What changes are included in this PR?

Created a "manual" iterator over the byte array's values and offsets, with a fast path for arrays that contain no nulls.
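
As a rough illustration of the idea described above, here is a minimal sketch that walks the offsets in pairs and only consults validity when the array has nulls. It is not the code from this PR: `for_each_value`, the `i32` offsets, and the `&[bool]` validity slice are assumptions chosen so the example stands alone without the arrow crate.

```rust
/// Hypothetical helper (not from the PR): visits every value of a
/// variable-length byte column, passing `None` for null slots.
///
/// `offsets` has one more entry than there are values, and value `i`
/// occupies `bytes[offsets[i]..offsets[i + 1]]`.
fn for_each_value<'a>(
    bytes: &'a [u8],
    offsets: &[i32],
    validity: Option<&[bool]>,
    mut f: impl FnMut(Option<&'a [u8]>),
) {
    match validity {
        // Fast path: no nulls, so there is no per-value validity check.
        None => {
            for w in offsets.windows(2) {
                f(Some(&bytes[w[0] as usize..w[1] as usize]));
            }
        }
        // Slow path: pair each offset window with its validity flag.
        Some(valid) => {
            for (w, &is_valid) in offsets.windows(2).zip(valid) {
                if is_valid {
                    f(Some(&bytes[w[0] as usize..w[1] as usize]));
                } else {
                    f(None);
                }
            }
        }
    }
}
```

Splitting on the presence of nulls once, outside the loop, keeps the branch out of the hot path for fully valid arrays, which is presumably where the gain comes from.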

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow (Changes to the arrow crate) label on Dec 28, 2025
@rluvaton rluvaton changed the title from "perf: improve performance of encoding GenericByteArray by 15%-20%" to "perf: improve performance of encoding GenericByteArray by 8%" on Dec 28, 2025
@rluvaton
Member Author

Waiting for the row format benchmark to be run

@alamb
Contributor

alamb commented Dec 29, 2025

run benchmark row_format

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-row-conversion-for-generic-byte-array (06e180b) to 814ee42 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-row-conversion-for-generic-byte-array
Results will be posted here when complete

    .zip(null_buffer.iter())
    .map(|(start_end, is_valid)| {
        if is_valid {
            Some(&bytes[start_end[0].as_usize()..start_end[1].as_usize()])
Contributor

It might also be worth trying bytes.get_unchecked(...) here to skip the bounds checks, if it helps.

The input array has already been validated, so it is safe to assume the offsets are in range.
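
For readers unfamiliar with the suggestion, the sketch below shows what unchecked slicing could look like on plain slices. `value_unchecked` is a hypothetical helper, not part of the PR or the arrow crate; `get_unchecked` itself is the standard-library slice method, and the safety contract written in the doc comment is an assumption standing in for the validation the array has already gone through.

```rust
/// Hypothetical helper: returns value `i` of a byte column without
/// bounds checks on either the offsets or the value bytes.
///
/// # Safety
/// The caller must guarantee that `i + 1 < offsets.len()`, that the
/// offsets are non-negative and non-decreasing, and that the last
/// offset is at most `bytes.len()`.
unsafe fn value_unchecked<'a>(bytes: &'a [u8], offsets: &[i32], i: usize) -> &'a [u8] {
    unsafe {
        // Both reads skip the index check a plain `offsets[i]` would do.
        let start = *offsets.get_unchecked(i) as usize;
        let end = *offsets.get_unchecked(i + 1) as usize;
        // Skips the range check a plain `&bytes[start..end]` would do.
        bytes.get_unchecked(start..end)
    }
}
```

Whether this helps in practice depends on whether the compiler was already eliding the checks, so it is something to measure rather than assume.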

Contributor

@alamb alamb left a comment

Makes sense to me -- thank you @rluvaton

Let's wait for the performance results, but I think the idea and code look great to me

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                                                         improve-row-conversion-for-generic-byte-array    main
-----                                                                                                                         ---------------------------------------------    ----
append_rows 10 large_list(0) of u64(0)                                                                                        1.04    649.4±6.83ns        ? ?/sec              1.00    623.9±5.06ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                                                                              1.02    682.7±7.21ns        ? ?/sec              1.00    669.1±4.74ns        ? ?/sec
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.01   379.1±12.52µs        ? ?/sec              1.00    375.1±5.93µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.41     12.1±0.14µs        ? ?/sec              1.00      8.6±0.18µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     14.7±0.15µs        ? ?/sec              1.15     17.0±0.16µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.00      7.7±0.13µs        ? ?/sec              1.01      7.8±0.13µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.13     15.9±0.40µs        ? ?/sec              1.00     14.1±0.08µs        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                                                                                      1.08    173.7±2.16µs        ? ?/sec              1.00    160.6±1.80µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                         1.03   945.4±32.57ns        ? ?/sec              1.00   920.4±13.54ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                                                                            1.08    165.5±1.96µs        ? ?/sec              1.00    153.0±0.82µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                                                                               1.07   1050.3±7.80ns        ? ?/sec              1.00   986.1±25.45ns        ? ?/sec
append_rows 4096 string view(1..100, 0)                                                                                       1.00    113.3±1.24µs        ? ?/sec              1.05    118.7±0.51µs        ? ?/sec
append_rows 4096 string view(1..100, 0.5)                                                                                     1.00    101.8±0.94µs        ? ?/sec              1.05    107.1±1.20µs        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.11     51.4±1.11µs        ? ?/sec              1.00     46.4±1.27µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.00     75.3±0.68µs        ? ?/sec              1.03     77.6±1.09µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.00     85.0±1.44µs        ? ?/sec              1.03     87.9±0.75µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.06     53.6±0.30µs        ? ?/sec              1.00     50.5±0.17µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.04     47.8±0.52µs        ? ?/sec              1.00     46.0±0.15µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.00     70.5±0.83µs        ? ?/sec              1.06     74.7±0.92µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     75.6±0.30µs        ? ?/sec              1.19     89.8±0.35µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.00    221.7±2.28µs        ? ?/sec              1.07    237.6±6.49µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.01     49.5±0.49µs        ? ?/sec              1.00     48.8±0.67µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.05     76.6±0.80µs        ? ?/sec              1.00     73.1±1.03µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.00    144.7±2.67µs        ? ?/sec              1.02    147.0±2.26µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.01    110.7±0.66µs        ? ?/sec              1.00    109.4±0.85µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.02     77.9±0.87µs        ? ?/sec              1.00     76.1±0.71µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.03     28.1±0.20µs        ? ?/sec              1.00     27.2±0.15µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.01     46.2±0.51µs        ? ?/sec              1.00     45.6±0.36µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.03     28.1±0.33µs        ? ?/sec              1.00     27.2±0.27µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.00      7.7±0.19µs        ? ?/sec              1.00      7.7±0.11µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     13.8±0.18µs        ? ?/sec              1.08     14.9±0.29µs        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                                                                                    1.02   917.6±18.01ns        ? ?/sec              1.00    897.9±5.18ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                                                                          1.02   962.8±31.91ns        ? ?/sec              1.00   947.6±55.30ns        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.00    375.2±2.28µs        ? ?/sec              1.01    378.8±2.31µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.38     12.4±0.19µs        ? ?/sec              1.00      9.0±0.28µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     15.0±0.13µs        ? ?/sec              1.15     17.3±0.67µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.00      7.9±0.13µs        ? ?/sec              1.01      8.0±0.19µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.14     16.3±0.74µs        ? ?/sec              1.00     14.3±0.09µs        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                                                                                  1.08    173.8±0.91µs        ? ?/sec              1.00    161.7±1.99µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)                                                                     1.01  1210.9±30.64ns        ? ?/sec              1.00  1202.4±11.83ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                                                                        1.09    167.1±6.72µs        ? ?/sec              1.00    153.8±1.10µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                                                                           1.04  1335.8±57.21ns        ? ?/sec              1.00  1290.3±26.69ns        ? ?/sec
convert_columns 4096 string view(1..100, 0)                                                                                   1.00    113.8±0.92µs        ? ?/sec              1.05    119.2±0.70µs        ? ?/sec
convert_columns 4096 string view(1..100, 0.5)                                                                                 1.00    103.5±1.73µs        ? ?/sec              1.04    107.1±0.82µs        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.14     52.6±1.15µs        ? ?/sec              1.00     46.1±0.89µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.00     76.0±0.72µs        ? ?/sec              1.04     78.9±0.90µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     86.2±1.88µs        ? ?/sec              1.03     88.4±0.30µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.08     54.9±0.53µs        ? ?/sec              1.00     50.8±0.25µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.06     49.1±1.14µs        ? ?/sec              1.00     46.4±0.37µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.00     70.4±0.88µs        ? ?/sec              1.08     75.8±1.52µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     76.0±0.36µs        ? ?/sec              1.19     90.2±0.93µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.00    221.8±1.05µs        ? ?/sec              1.06    235.9±1.67µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.02     49.8±0.35µs        ? ?/sec              1.00     49.0±0.57µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.06     78.8±0.38µs        ? ?/sec              1.00     74.2±0.64µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.00    146.6±2.68µs        ? ?/sec              1.02    149.1±1.66µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.02    112.6±4.17µs        ? ?/sec              1.00    110.5±1.34µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.03     79.3±2.39µs        ? ?/sec              1.00     77.2±3.52µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.04     29.2±0.31µs        ? ?/sec              1.00     28.1±0.96µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.03     47.7±0.46µs        ? ?/sec              1.00     46.5±0.40µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.04     29.2±0.33µs        ? ?/sec              1.00     27.9±0.14µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.00      7.8±0.12µs        ? ?/sec              1.01      7.9±0.11µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     14.0±0.15µs        ? ?/sec              1.08     15.1±0.34µs        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                                                                           1.02    699.5±7.65ns        ? ?/sec              1.00    684.8±5.16ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                                                                                 1.02    741.6±4.41ns        ? ?/sec              1.00    728.0±4.27ns        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    371.4±4.60µs        ? ?/sec              1.01    375.6±4.12µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.41     12.3±0.11µs        ? ?/sec              1.00      8.7±0.19µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     15.0±0.36µs        ? ?/sec              1.15     17.2±0.17µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.00      7.7±0.05µs        ? ?/sec              1.03      7.9±0.09µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.12     16.0±0.17µs        ? ?/sec              1.00     14.2±0.10µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                                                                         1.08    173.7±1.37µs        ? ?/sec              1.00    161.0±0.75µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)                                                            1.01   1007.0±4.14ns        ? ?/sec              1.00   1000.5±5.11ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                                                                               1.08    166.1±2.50µs        ? ?/sec              1.00    153.6±2.19µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)                                                                  1.02   1105.1±5.47ns        ? ?/sec              1.00   1085.7±4.66ns        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0)                                                                          1.00    113.6±1.13µs        ? ?/sec              1.05    119.0±0.52µs        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0.5)                                                                        1.00    103.6±2.08µs        ? ?/sec              1.03    106.9±0.48µs        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.13     51.9±0.92µs        ? ?/sec              1.00     45.8±1.10µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.00     75.6±1.16µs        ? ?/sec              1.05     79.0±3.29µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     85.8±0.40µs        ? ?/sec              1.03     88.0±0.38µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.09     55.3±1.07µs        ? ?/sec              1.00     50.7±1.13µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.05     48.6±0.80µs        ? ?/sec              1.00     46.1±0.10µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     70.8±1.46µs        ? ?/sec              1.06     75.2±0.39µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.00     76.1±0.81µs        ? ?/sec              1.18     90.0±0.31µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.00    223.0±2.71µs        ? ?/sec              1.07    238.4±3.93µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.02     49.8±0.20µs        ? ?/sec              1.00     48.7±0.16µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.05     77.1±0.87µs        ? ?/sec              1.00     73.8±0.52µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    144.5±1.99µs        ? ?/sec              1.02    147.7±1.44µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.00    110.6±1.79µs        ? ?/sec              1.00    110.3±1.12µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.01     77.9±0.26µs        ? ?/sec              1.00     76.9±0.65µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.04     28.3±0.24µs        ? ?/sec              1.00     27.3±0.12µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.03     46.6±0.40µs        ? ?/sec              1.00     45.4±0.72µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.04     28.3±0.11µs        ? ?/sec              1.00     27.2±0.47µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.8±0.10µs        ? ?/sec              1.01      7.8±0.08µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     13.9±0.29µs        ? ?/sec              1.08     15.0±0.15µs        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                                                                                       1.08   1651.8±6.67ns        ? ?/sec              1.00  1530.7±11.29ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                                                                             1.06   1732.7±6.24ns        ? ?/sec              1.00  1632.0±25.68ns        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    305.8±2.78µs        ? ?/sec              1.04    319.3±6.35µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.05     17.3±0.33µs        ? ?/sec              1.00     16.4±0.06µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.05     17.2±0.27µs        ? ?/sec              1.00     16.4±0.05µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.00     33.2±0.47µs        ? ?/sec              1.06     35.2±1.98µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.00     33.2±0.26µs        ? ?/sec              1.05     34.8±0.28µs        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                                                                                     1.02    275.8±1.03µs        ? ?/sec              1.00    270.1±2.49µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                        1.00  1998.9±13.15ns        ? ?/sec              1.05      2.1±0.02µs        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                                                                           1.02    277.3±2.26µs        ? ?/sec              1.00    271.7±5.41µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                                                                              1.00      2.2±0.01µs        ? ?/sec              1.01      2.2±0.03µs        ? ?/sec
convert_rows 4096 string view(1..100, 0)                                                                                      1.02    171.9±1.43µs        ? ?/sec              1.00    168.4±0.62µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)                                                                                    1.01    137.2±0.72µs        ? ?/sec              1.00    135.9±0.47µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.06     80.8±0.41µs        ? ?/sec              1.00     76.3±1.53µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.03    126.8±3.13µs        ? ?/sec              1.00    123.6±0.77µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.02    115.6±1.03µs        ? ?/sec              1.00    113.1±2.44µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.05     91.0±1.69µs        ? ?/sec              1.00     86.3±0.83µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.00     63.6±0.35µs        ? ?/sec              1.03     65.8±2.30µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.00    112.5±3.04µs        ? ?/sec              1.02    114.8±0.96µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.00    106.9±1.85µs        ? ?/sec              1.03    110.3±0.93µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    303.0±1.82µs        ? ?/sec              1.05    318.0±2.91µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.00     75.8±0.90µs        ? ?/sec              1.04     78.9±0.87µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.00     63.7±0.21µs        ? ?/sec              1.04     66.0±0.59µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.00    112.5±1.08µs        ? ?/sec              1.03    115.6±1.19µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.00    106.8±0.77µs        ? ?/sec              1.04    110.9±1.88µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.00     75.9±0.69µs        ? ?/sec              1.05     79.4±1.43µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.00     63.7±0.45µs        ? ?/sec              1.04     66.1±0.38µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    112.0±1.44µs        ? ?/sec              1.03    115.6±0.93µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.00     75.8±0.88µs        ? ?/sec              1.04     79.2±0.42µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.00     32.0±0.31µs        ? ?/sec              1.02     32.7±1.33µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.00     32.2±1.13µs        ? ?/sec              1.02     32.7±0.59µs        ? ?/sec
iterate rows                                                                                                                  1.00      3.3±0.02µs        ? ?/sec              1.00      3.3±0.04µs        ? ?/sec
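
A note on reading the table above: in each row, the leading number in each column appears to be that side's time divided by the faster side's time, so 1.00 marks the faster side. The sketch below reproduces two rows under that assumption; `relative_factors` is a hypothetical helper, not part of the benchmark tooling.

```rust
/// Hypothetical helper: given the mean times of the branch and of main
/// (in the same unit), returns each one relative to the faster of the two.
fn relative_factors(branch: f64, main: f64) -> (f64, f64) {
    let fastest = branch.min(main);
    (branch / fastest, main / fastest)
}
```

For `append_rows 4096 string(100, 0.5)`, `relative_factors(75.6, 89.8)` gives roughly (1.00, 1.19), so the branch is faster there. For `append_rows 4096 bool(0, 0.5)`, `relative_factors(12.1, 8.6)` gives roughly (1.41, 1.00), so main is faster.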

… chance for skipping the encode null for non nullable
@rluvaton
Member Author

run benchmark row_format

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-row-conversion-for-generic-byte-array (7e943df) to 814ee42 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-row-conversion-for-generic-byte-array
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                                                         improve-row-conversion-for-generic-byte-array    main
-----                                                                                                                         ---------------------------------------------    ----
append_rows 10 large_list(0) of u64(0)                                                                                        1.00    623.7±9.58ns        ? ?/sec              1.01    631.0±7.44ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                                                                              1.02    678.7±4.48ns        ? ?/sec              1.00    663.3±2.66ns        ? ?/sec
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.01    373.1±4.41µs        ? ?/sec              1.00    370.9±2.47µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.41     12.1±0.20µs        ? ?/sec              1.00      8.6±0.04µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     14.8±0.34µs        ? ?/sec              1.15     17.1±0.37µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.01      7.7±0.23µs        ? ?/sec              1.00      7.6±0.07µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.05     14.9±0.15µs        ? ?/sec              1.00     14.2±0.08µs        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                                                                                      1.01    162.2±1.14µs        ? ?/sec              1.00    161.0±1.56µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                         1.02   912.8±14.58ns        ? ?/sec              1.00    896.3±3.49ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                                                                            1.08    165.4±1.01µs        ? ?/sec              1.00    153.2±1.14µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                                                                               1.05   1023.9±5.39ns        ? ?/sec              1.00    978.1±7.21ns        ? ?/sec
append_rows 4096 string view(1..100, 0)                                                                                       1.00    113.9±0.50µs        ? ?/sec              1.05    119.2±1.64µs        ? ?/sec
append_rows 4096 string view(1..100, 0.5)                                                                                     1.00    104.4±1.33µs        ? ?/sec              1.02    106.7±2.36µs        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.09     49.4±0.35µs        ? ?/sec              1.00     45.5±0.18µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.00     76.4±1.09µs        ? ?/sec              1.02     77.7±0.66µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.00     83.9±0.47µs        ? ?/sec              1.05     88.1±0.84µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.04     52.5±1.02µs        ? ?/sec              1.00     50.7±0.83µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.04     48.1±0.42µs        ? ?/sec              1.00     46.1±0.76µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.00     71.1±0.71µs        ? ?/sec              1.07     75.8±3.08µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     75.5±0.59µs        ? ?/sec              1.19     89.8±0.32µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.00    223.2±3.74µs        ? ?/sec              1.05    233.6±2.56µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.02     49.5±1.10µs        ? ?/sec              1.00     48.6±0.35µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.05     77.0±1.54µs        ? ?/sec              1.00     73.0±0.41µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.00    144.6±3.73µs        ? ?/sec              1.02    147.0±1.40µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.01    110.2±0.59µs        ? ?/sec              1.00    109.2±0.67µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.02     77.8±1.00µs        ? ?/sec              1.00     75.9±0.33µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.04     28.1±0.44µs        ? ?/sec              1.00     27.0±0.10µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.01     46.5±0.46µs        ? ?/sec              1.00     46.0±0.37µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.04     27.9±0.10µs        ? ?/sec              1.00     26.9±0.06µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.00      7.6±0.12µs        ? ?/sec              1.00      7.6±0.12µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     13.8±0.13µs        ? ?/sec              1.08     14.9±0.13µs        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                                                                                    1.00    892.8±9.27ns        ? ?/sec              1.01   904.1±14.38ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                                                                          1.01    941.1±4.23ns        ? ?/sec              1.00    935.5±4.47ns        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.00    375.3±3.94µs        ? ?/sec              1.00    376.2±7.03µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.39     12.4±0.27µs        ? ?/sec              1.00      8.9±0.07µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     14.9±0.11µs        ? ?/sec              1.16     17.3±0.26µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.00      8.0±0.14µs        ? ?/sec              1.00      8.0±0.19µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.06     15.2±0.16µs        ? ?/sec              1.00     14.4±0.13µs        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                                                                                  1.02   164.5±12.57µs        ? ?/sec              1.00    161.6±3.83µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)                                                                     1.00   1189.2±4.94ns        ? ?/sec              1.00   1187.0±6.49ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                                                                        1.08    165.9±1.83µs        ? ?/sec              1.00    153.4±0.70µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                                                                           1.01  1295.8±16.96ns        ? ?/sec              1.00  1286.3±19.34ns        ? ?/sec
convert_columns 4096 string view(1..100, 0)                                                                                   1.00    114.7±0.78µs        ? ?/sec              1.04    119.3±1.67µs        ? ?/sec
convert_columns 4096 string view(1..100, 0.5)                                                                                 1.00    104.8±2.14µs        ? ?/sec              1.02    106.9±0.62µs        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.09     50.1±1.04µs        ? ?/sec              1.00     45.9±0.44µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.00     76.7±2.05µs        ? ?/sec              1.03     79.2±2.61µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     84.8±1.67µs        ? ?/sec              1.04     88.2±0.76µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.04     52.8±0.15µs        ? ?/sec              1.00     50.8±0.55µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.06     49.0±0.28µs        ? ?/sec              1.00     46.3±0.12µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.00     70.9±0.99µs        ? ?/sec              1.07     75.6±1.45µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     76.2±0.74µs        ? ?/sec              1.19     90.3±0.92µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.00    223.2±3.65µs        ? ?/sec              1.05    234.9±2.80µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.02     50.2±2.37µs        ? ?/sec              1.00     49.0±0.48µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.06     78.2±1.07µs        ? ?/sec              1.00     74.0±0.59µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.00    145.8±1.00µs        ? ?/sec              1.02    148.5±1.47µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.00    111.3±2.22µs        ? ?/sec              1.00    111.0±1.40µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.02     78.4±0.49µs        ? ?/sec              1.00     76.6±0.96µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.03     29.2±0.75µs        ? ?/sec              1.00     28.3±0.41µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.04     48.1±0.72µs        ? ?/sec              1.00     46.3±0.34µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.05     29.2±0.77µs        ? ?/sec              1.00     27.8±0.34µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.00      7.9±0.16µs        ? ?/sec              1.01      8.0±0.05µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     14.0±0.11µs        ? ?/sec              1.08     15.1±0.13µs        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                                                                           1.00    683.4±3.69ns        ? ?/sec              1.01    692.6±9.30ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                                                                                 1.01    736.5±8.65ns        ? ?/sec              1.00    727.0±5.06ns        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    374.2±3.99µs        ? ?/sec              1.00    375.4±6.79µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.41     12.3±0.16µs        ? ?/sec              1.00      8.7±0.14µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     14.9±0.46µs        ? ?/sec              1.15     17.1±0.20µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.00      7.8±0.14µs        ? ?/sec              1.01      7.9±0.33µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.06     15.1±0.12µs        ? ?/sec              1.00     14.3±0.11µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                                                                         1.01    162.1±0.75µs        ? ?/sec              1.00    160.9±0.73µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)                                                            1.01    997.1±8.67ns        ? ?/sec              1.00    990.5±9.47ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                                                                               1.08    165.6±1.34µs        ? ?/sec              1.00    153.1±1.79µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)                                                                  1.03   1121.3±5.01ns        ? ?/sec              1.00  1086.1±29.40ns        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0)                                                                          1.00    114.5±1.94µs        ? ?/sec              1.04    118.9±0.33µs        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0.5)                                                                        1.00    104.4±0.45µs        ? ?/sec              1.03    107.1±1.37µs        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.10     50.3±3.67µs        ? ?/sec              1.00     45.9±0.30µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.00     76.4±0.88µs        ? ?/sec              1.02     77.9±0.51µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     84.3±1.50µs        ? ?/sec              1.05     88.2±0.98µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.04     52.7±0.65µs        ? ?/sec              1.00     50.7±0.59µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.05     48.4±0.45µs        ? ?/sec              1.00     46.2±0.17µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     71.3±0.64µs        ? ?/sec              1.06     75.9±0.75µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.00     76.0±0.92µs        ? ?/sec              1.19     90.1±0.48µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.00    224.3±1.77µs        ? ?/sec              1.04    233.2±1.80µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.02     49.8±0.51µs        ? ?/sec              1.00     48.8±0.60µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.05     77.2±2.23µs        ? ?/sec              1.00     73.7±0.49µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    143.0±1.34µs        ? ?/sec              1.03    147.6±2.68µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.01    110.9±1.65µs        ? ?/sec              1.00    109.6±0.51µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.01     77.9±0.68µs        ? ?/sec              1.00     77.0±1.32µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.04     28.2±0.15µs        ? ?/sec              1.00     27.1±0.06µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.03     47.1±0.52µs        ? ?/sec              1.00     45.5±0.17µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.04     28.1±0.44µs        ? ?/sec              1.00     27.1±0.36µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.7±0.11µs        ? ?/sec              1.00      7.7±0.12µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     13.9±0.26µs        ? ?/sec              1.07     15.0±0.08µs        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                                                                                       1.08  1658.2±27.21ns        ? ?/sec              1.00   1531.7±8.98ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                                                                             1.07  1731.5±26.28ns        ? ?/sec              1.00   1617.2±8.24ns        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    300.2±2.46µs        ? ?/sec              1.06    316.8±8.37µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.06     17.3±0.57µs        ? ?/sec              1.00     16.4±0.06µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.04     17.2±0.17µs        ? ?/sec              1.00     16.5±0.40µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.00     33.2±0.24µs        ? ?/sec              1.05     34.8±0.50µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.00     33.2±0.28µs        ? ?/sec              1.05     34.8±0.82µs        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                                                                                     1.02    276.1±2.39µs        ? ?/sec              1.00    270.4±5.92µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                        1.00  1981.0±34.75ns        ? ?/sec              1.06      2.1±0.02µs        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                                                                           1.03   277.7±19.55µs        ? ?/sec              1.00    270.9±1.77µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                                                                              1.00      2.2±0.04µs        ? ?/sec              1.01      2.2±0.04µs        ? ?/sec
convert_rows 4096 string view(1..100, 0)                                                                                      1.05    177.3±4.03µs        ? ?/sec              1.00    168.4±2.26µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)                                                                                    1.04    141.8±1.18µs        ? ?/sec              1.00    135.9±0.73µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.12     84.7±0.89µs        ? ?/sec              1.00     75.9±1.13µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.06    129.6±2.01µs        ? ?/sec              1.00    122.8±1.57µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.05    119.1±0.93µs        ? ?/sec              1.00    112.9±1.67µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.11     95.9±0.95µs        ? ?/sec              1.00     86.1±0.82µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.00     60.9±1.15µs        ? ?/sec              1.08     65.6±0.26µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.00    111.8±7.37µs        ? ?/sec              1.03    115.1±3.71µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.00    104.3±0.89µs        ? ?/sec              1.05    109.8±0.71µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    304.1±4.05µs        ? ?/sec              1.04    317.0±5.10µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.00     73.0±0.71µs        ? ?/sec              1.08     79.2±2.63µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.00     60.7±0.28µs        ? ?/sec              1.11     67.1±7.84µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.00    110.5±0.99µs        ? ?/sec              1.04    115.2±1.19µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.00    104.0±1.55µs        ? ?/sec              1.06    110.2±0.36µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.00     73.0±0.45µs        ? ?/sec              1.08     79.0±0.45µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.00     61.1±2.24µs        ? ?/sec              1.08     66.0±0.37µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    111.7±6.14µs        ? ?/sec              1.03    115.4±2.41µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.00     73.1±0.39µs        ? ?/sec              1.08     79.3±2.10µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.00     32.0±0.21µs        ? ?/sec              1.02     32.6±0.45µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.00     32.1±0.33µs        ? ?/sec              1.02     32.6±0.21µs        ? ?/sec
iterate rows                                                                                                                  1.00      3.3±0.01µs        ? ?/sec              1.00      3.3±0.01µs        ? ?/sec
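
For anyone reading the table above: each row appears to carry, per side, a relative factor followed by the measured time, where the faster side shows 1.00 and the other side shows its time divided by the faster one. That reading is an assumption based on the numbers themselves, not something stated in this thread. The helper name `relative_factors` is hypothetical, and the sketch below only rechecks two rows. Because the printed times are rounded, recomputed factors for other rows may differ in the last digit.

```rust
/// Relative factors for one benchmark row: each side's time divided by
/// the faster side's time, so the faster side is always 1.0.
/// Assumes both times are positive and expressed in the same unit.
fn relative_factors(left: f64, right: f64) -> (f64, f64) {
    let fastest = left.min(right);
    (left / fastest, right / fastest)
}

fn main() {
    // `append_rows 4096 bool(0, 0.5)`: 12.1µs (left) vs 8.6µs (right)
    let (l, r) = relative_factors(12.1, 8.6);
    println!("bool(0, 0.5):     {:.2} {:.2}", l, r);

    // `append_rows 4096 string(100, 0.5)`: 75.5µs (left) vs 89.8µs (right)
    let (l, r) = relative_factors(75.5, 89.8);
    println!("string(100, 0.5): {:.2} {:.2}", l, r);
}
```

Both rows reproduce the factors printed in the table (1.41 / 1.00 and 1.00 / 1.19) when rounded to two decimals.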

@alamb
Contributor

alamb commented Dec 30, 2025

Looks like an improvement to me when converting strings (though the benchmarks are noisy)
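
One rough way to separate signal from noise in a table like this is to compare the gap between the two times against the sum of their `±` values. This is only a heuristic sketch, not a statistical test. The thread does not say what the `±` figure represents, so treating it as a spread is an assumption, and the `Measurement` type and `clearly_differ` function are hypothetical names.

```rust
/// One side of a benchmark row: a time and its reported `±` value,
/// both in the same unit (microseconds in the rows used below).
#[derive(Clone, Copy)]
struct Measurement {
    time: f64,
    spread: f64,
}

/// Heuristic: treat two measurements as clearly different only when the
/// gap between their times exceeds the sum of their `±` values.
fn clearly_differ(a: Measurement, b: Measurement) -> bool {
    (a.time - b.time).abs() > a.spread + b.spread
}

fn main() {
    // `append_rows 4096 string(100, 0.5)`: 75.5±0.59µs vs 89.8±0.32µs
    let a = Measurement { time: 75.5, spread: 0.59 };
    let b = Measurement { time: 89.8, spread: 0.32 };
    println!("string(100, 0.5) append_rows: {}", clearly_differ(a, b));

    // `convert_rows 4096 string(100, 0)`: 111.8±7.37µs vs 115.1±3.71µs
    let c = Measurement { time: 111.8, spread: 7.37 };
    let d = Measurement { time: 115.1, spread: 3.71 };
    println!("string(100, 0) convert_rows:  {}", clearly_differ(c, d));
}
```

Under this heuristic the first row is a clear difference, while the second falls within the reported spread.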

@alamb alamb merged commit 9213ffd into apache:main Dec 30, 2025
13 checks passed
@alamb
Contributor

alamb commented Dec 30, 2025

Thanks @rluvaton

@rluvaton rluvaton deleted the improve-row-conversion-for-generic-byte-array branch December 30, 2025 12:40
Labels

arrow Changes to the arrow crate performance

3 participants