
Conversation

Member

@klion26 klion26 commented Jul 24, 2025

This commit reuses the parent buffer for ListBuilder, so that it doesn't need to copy the buffer when finishing the builder.

Which issue does this PR close?

Closes #7977 ([Variant] Avoid extra allocation in list builder).

Rationale for this change

This PR avoids the extra buffer allocation in ListBuilder.

What changes are included in this PR?

  • Reuse the parent's buffer when creating a ListBuilder; all contents are written directly to the parent's buffer
  • When ListBuilder::finish is called, fill in the header for the current list in the parent's buffer
  • In drop, roll back the bytes that have been written into the parent's buffer if ListBuilder::finish has not been called (see the sketch after this list)
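
A minimal sketch of the rollback idea (the field names parent_value_offset_base and has_been_finished come from the PR; parent_buffer and the truncate call are illustrative stand-ins for the real parent-state plumbing, which also rolls back the metadata buffer):

struct ListBuilder<'a> {
    // Illustrative stand-in for the real parent-state/buffer plumbing.
    parent_buffer: &'a mut Vec<u8>,
    // Offset in the parent's value buffer where this list started.
    parent_value_offset_base: usize,
    // Set by `finish`; while false, `drop` undoes any partial write.
    has_been_finished: bool,
}

impl Drop for ListBuilder<'_> {
    fn drop(&mut self) {
        if !self.has_been_finished {
            // Roll back everything this unfinished list wrote into the parent.
            self.parent_buffer.truncate(self.parent_value_offset_base);
        }
    }
}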

Are these changes tested?

The change is covered by existing tests, mainly test_nested_list_with_heterogeneous_fields_for_buffer_reuse.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jul 24, 2025
Member Author

klion26 commented Jul 24, 2025

@alamb @scovich @viirya, please help review this when you're free, thanks.

I've created benchmarks for various implementations. The current implementation is the winner, and the alternatives are

  1. Current implementation with PackedU32Iterator (a sketch of this iterator follows the list)
  2. Splice with Iterator (code here)
  3. Collect the header with iterator before splice (code here)
  4. Splice with actual header bytes (code here)
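
For context, the PackedU32Iterator in alternative 1 packs each u32 down to its low offset_size little-endian bytes. A minimal sketch of what such an iterator could look like (the PR's actual implementation may differ):

struct PackedU32Iterator<I: Iterator<Item = [u8; 4]>> {
    nbytes: usize,          // how many low-order bytes of each u32 to emit
    inner: I,               // little-endian byte arrays, one per u32
    current_item: [u8; 4],
    pos: usize,             // next byte index; == nbytes means exhausted
}

impl<I: Iterator<Item = [u8; 4]>> PackedU32Iterator<I> {
    fn new(nbytes: usize, inner: I) -> Self {
        assert!((1..=4).contains(&nbytes));
        Self { nbytes, inner, current_item: [0; 4], pos: nbytes }
    }
}

impl<I: Iterator<Item = [u8; 4]>> Iterator for PackedU32Iterator<I> {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        if self.pos == self.nbytes {
            // Fetch the LE bytes of the next u32.
            let next_item = self.inner.next()?;
            self.current_item = next_item;
            self.pos = 0;
        }
        let byte = self.current_item[self.pos];
        self.pos += 1;
        Some(byte)
    }
}

Note that a hand-rolled iterator like this reports the default size_hint of (0, None) unless it overrides it, which becomes relevant to the splice-based alternatives discussed below.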

The benchmark comparison results from my laptop.

The steps are:

  1. Created all four branches with modifications
  2. Ran cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels -- --save-baseline $BRANCH_NAME on each branch
  3. Ran critcmp main $BRANCH_NAME to get the comparison result

1 PackedU32 Iterator

group                                                                7977_packedu32_iterator                main
-----                                                                -----------------------                ----
batch_json_string_to_variant json_list 8k string                     1.00     41.7±5.53ms        ? ?/sec    1.22     51.0±7.14ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00   414.0±41.45ms        ? ?/sec    1.11   458.7±48.08ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00     15.7±2.04ms        ? ?/sec    1.01     15.9±1.67ms        ? ?/sec
variant_get_primitive                                                1.09      2.7±0.34ms        ? ?/sec    1.00      2.5±0.28ms        ? ?/sec

2 Splice with Iterator

group                                                                7977_avoid_allocation_for_list_builder    main
-----                                                                --------------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.00     46.7±6.23ms        ? ?/sec       1.09     51.0±7.14ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00   418.0±42.38ms        ? ?/sec       1.10   458.7±48.08ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00     15.9±1.97ms        ? ?/sec       1.00     15.9±1.67ms        ? ?/sec
variant_get_primitive                                                1.01      2.5±0.28ms        ? ?/sec       1.00      2.5±0.28ms        ? ?/sec

3 Collect the header with the iterator before splice

group                                                                7977_collect_before_splice             main
-----                                                                --------------------------             ----
batch_json_string_to_variant json_list 8k string                     1.00     46.4±4.60ms        ? ?/sec    1.10     51.0±7.14ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00   424.5±43.27ms        ? ?/sec    1.08   458.7±48.08ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00     15.9±1.83ms        ? ?/sec    1.00     15.9±1.67ms        ? ?/sec
variant_get_primitive                                                1.02      2.5±0.31ms        ? ?/sec    1.00      2.5±0.28ms        ? ?/sec

4 Splice with actual header bytes

group                                                                7977_fill_before_splice                main
-----                                                                -----------------------                ----
batch_json_string_to_variant json_list 8k string                     1.00     45.1±2.68ms        ? ?/sec    1.13     51.0±7.14ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00   419.6±40.92ms        ? ?/sec    1.09   458.7±48.08ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.04     16.5±1.20ms        ? ?/sec    1.00     15.9±1.67ms        ? ?/sec
variant_get_primitive                                                1.12      2.8±0.26ms        ? ?/sec    1.00      2.5±0.28ms        ? ?/sec

Member Author

Verified that the drop for ListBuilder was covered with cargo llvm-cov --html test -p parquet-variant


@klion26 klion26 force-pushed the 7977_avoid_extra_buffer_with_packedu32_iterator branch from c209728 to 51e3fa9 on July 24, 2025 05:04
Member Author

Added clone() here to make the compiler happy; otherwise the compiler throws cannot move out of type ListBuilder<'_>, which implements the Drop trait.

Contributor

Just do self.offsets.iter().map(|&offset| ...), relying on the fact that u32 is Copy -- instead of cloning the whole vec?

Contributor

Or,

let offsets = std::mem::take(&mut self.offsets).into_iter();
let offsets = offsets.map(|offset| (offset as u32).to_le_bytes());
let offsets = PackedU32Iterator::new(offset_size as usize, offsets);

@klion26 klion26 force-pushed the 7977_avoid_extra_buffer_with_packedu32_iterator branch from 51e3fa9 to e4603c1 on July 24, 2025 05:18
Contributor

@scovich scovich left a comment


Very nice!

I've created benchmarks for various implementations.

Do the benchmarks cover different offset sizes, is_large true/false, etc? Or are they always the same offset size?

The current implementation is the winner, and the alternatives are

  1. Current implementation with PackedU32Iterator

This one unnecessarily clones the offsets array; based on the other benchmark results, I would expect removing that to speed up the runs by ~4ms.

  2. Splice with Iterator (code here)

This one will perform poorly because the chained iterator doesn't infer an accurate lower bound, so Vec::splice has to shift bytes twice (once to fit the lower bound, and again to fix the remainder).
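
For reference, a standalone illustration of the loose lower bound (not PR code): any iterator that doesn't override size_hint reports (0, None), so splice cannot pre-open a gap of the right size in a single pass.

struct Zeros(usize);

impl Iterator for Zeros {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        if self.0 == 0 {
            return None;
        }
        self.0 -= 1;
        Some(0)
    }
    // No size_hint override, so the default (0, None) applies.
}

fn main() {
    assert_eq!(Zeros(16).size_hint(), (0, None));
}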

  3. Collect the header with iterator before splice (code here)

No reason to expect this would be faster than 2/, because it allocates and immediately consumes an extra Vec

  4. Splice with actual header bytes (code here)

This is still iterator-based like 1/, but with all the unsafety of indexing into a pre-allocated temp buffer (and the overhead of allocating said temp buffer).

A fifth approach would be to use the packed u32 iterator from 1/, and splice in a pre-populated temp buffer like 4/, but to populate the temp buffer by push+extend calls instead of chain+collect:

let mut bytes_to_splice = vec![header];
  ...
bytes_to_splice.extend(num_elements_bytes);
  ...
bytes_to_splice.extend(offsets);
  ...
bytes_to_splice.extend(data_size_bytes);
buffer
    .inner_mut()
    .splice(starting_offset..starting_offset, bytes_to_splice);

I would expect that to outperform 4/ and possibly match 1/, but not necessarily outperform clone-free 1/.

A sixth approach would also use a pre-populated temp buffer, but ditch the packed u32 iterator from 1/ and just directly append the bytes:

fn append_packed_u32(dest: &mut Vec<u8>, value: u32, value_bytes: usize) {
    let n = dest.len() + value_bytes;
    dest.extend(value.to_le_bytes());
    dest.truncate(n);
}

// Calculated header size becomes a hint; being wrong only risks extra allocations.
// Make sure to reserve enough capacity to handle the extra bytes we'll truncate.
let mut bytes_to_splice = Vec::with_capacity(header_size + 3);
bytes_to_splice.push(header);

append_packed_u32(&mut bytes_to_splice, num_elements, if is_large { 4 } else { 1 });

for offset in std::mem::take(&mut self.offsets) {
    append_packed_u32(&mut bytes_to_splice, offset as u32, offset_size as usize);
}

append_packed_u32(&mut bytes_to_splice, data_size as u32, offset_size as usize);

buffer
    .inner_mut()
    .splice(starting_offset..starting_offset, bytes_to_splice);

This one should be a lot faster than a chained iterator (and works equally well regardless of how many bytes we pack to), but pays for the extra temp buffer allocation. I suspect it will be faster than even optimized 1/, but the extra allocation may prove too expensive.
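
A quick sanity check of the truncation trick in append_packed_u32 (standalone snippet, assuming the definition above):

fn main() {
    let mut dest = Vec::new();
    // 0x0403_0201 is [0x01, 0x02, 0x03, 0x04] in little-endian order.
    append_packed_u32(&mut dest, 0x0403_0201, 3);
    assert_eq!(dest, [0x01, 0x02, 0x03]); // only the low 3 bytes survive
}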

Comment on lines 1115 to 1116
let next_item = self.iterator.next()?;
self.current_item = next_item;
Contributor

Split into two statements due to lifetime issues, I suppose?

Member Author

klion26 commented Jul 25, 2025

@scovich thanks for the detailed review and suggestion.

Do the benchmarks cover different offset sizes, is_large true/false, etc? Or are they always the same offset size?

The benchmarks contain lists of varying lengths, but all lengths are less than 255 -- is_large is always false.

I've run the following benchmarks

  1. Previous approach 1, clone-free
  2. Fifth approach
  3. Sixth approach

The results show that the sixth is better. I've updated the implementation to the sixth approach.

The steps to generate the results:

  1. Change the previous approach 1 to be clone-free
  2. Implement the fifth approach
  3. Implement the sixth approach
  4. Run cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels -- --save-baseline $BRANCH_NAME for the three approaches and the main branch, one by one
  5. Run critcmp main ${BRANCH_NAME} for the three approaches

previous 1 with clone-free

group                                                                7977_avoid_extra_buffer_with_packedu32_iterator    main
-----                                                                -----------------------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.00     33.5±1.17ms        ? ?/sec                1.10     36.8±1.49ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    338.9±7.71ms        ? ?/sec                1.10    373.5±8.07ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.02     13.6±0.45ms        ? ?/sec                1.00     13.3±0.47ms        ? ?/sec
variant_get_primitive                                                1.00      2.1±0.07ms        ? ?/sec                1.01      2.1±0.07ms        ? ?/sec

fifth approach

group                                                                7977_pre_populate_with_push_exten      main
-----                                                                ---------------------------------      ----
batch_json_string_to_variant json_list 8k string                     1.00     35.8±1.52ms        ? ?/sec    1.03     36.8±1.49ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    342.9±9.65ms        ? ?/sec    1.09    373.5±8.07ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00     13.1±0.46ms        ? ?/sec    1.01     13.3±0.47ms        ? ?/sec
variant_get_primitive                                                1.00      2.1±0.07ms        ? ?/sec    1.00      2.1±0.07ms        ? ?/sec
code for fifth approach
    let header = array_header(is_large, offset_size);

    let mut bytes_to_splice = vec![header];
    let num_elements_bytes =
        num_elements
            .to_le_bytes()
            .into_iter()
            .take(if is_large { 4 } else { 1 });
    bytes_to_splice.extend(num_elements_bytes);
    let offsets = PackedU32Iterator::new(
        offset_size as usize,
        self.offsets
            .iter()
            .map(|&offset| (offset as u32).to_le_bytes()),
    );
    bytes_to_splice.extend(offsets);
    let data_size_bytes = data_size
        .to_le_bytes()
        .into_iter()
        .take(offset_size as usize);
    bytes_to_splice.extend(data_size_bytes);

    buffer
        .inner_mut()
        .splice(starting_offset..starting_offset, bytes_to_splice);

sixth approach

group                                                                7977_pre_populate_with_directly_append_bytes    main
-----                                                                --------------------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.00     33.7±1.21ms        ? ?/sec             1.09     36.8±1.49ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    333.4±7.89ms        ? ?/sec             1.12    373.5±8.07ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00     13.2±0.46ms        ? ?/sec             1.01     13.3±0.47ms        ? ?/sec
variant_get_primitive                                                1.00      2.1±0.08ms        ? ?/sec             1.00      2.1±0.07ms        ? ?/sec

Contributor

alamb commented Jul 25, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing 7977_avoid_extra_buffer_with_packedu32_iterator (bec3ba8) to ec81db3 diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=7977_avoid_extra_buffer_with_packedu32_iterator
Results will be posted here when complete

Contributor

@alamb alamb left a comment


Thanks @klion26 -- this is looking very nice.

I had a question about the use of splice vs just shifting the vec over and appending the bytes. However, I think this PR is already an improvement over what is on main, so we could also merge it as is and revisit the allocations.

I also kicked off the benchmarks and hopefully we'll see some good results

parent_value_offset_base: usize,
/// The starting offset in the parent's metadata buffer where this list starts
/// used to truncate the written fields in `drop` if the current list has not been finished
parent_metadata_offset_base: usize,
Contributor

this is a good idea

let data_size = self.buffer.offset();
let buffer = self.parent_state.buffer();

let data_size = buffer.offset() - self.parent_value_offset_base;
Contributor

should we do a checked sub here to avoid underflow? An underflow would only happen with a bug in the implementation so this is probably fine

Member Author

fixed
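
The fix presumably looks something like the following (a hedged sketch; the exact expression and panic message are assumptions, not the merged code):

let data_size = buffer
    .offset()
    .checked_sub(self.parent_value_offset_base)
    .expect("parent buffer offset moved backwards");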

let metadata = VariantMetadata::try_new(&metadata).unwrap();
assert_eq!(metadata.len(), 1);
assert_eq!(&metadata[0], "name"); // not rolled back
assert!(metadata.is_empty()); // rolled back
Contributor

nice!

Contributor

alamb commented Jul 25, 2025

🤖: Benchmark completed


group                                                                7977_avoid_extra_buffer_with_packedu32_iterator    main
-----                                                                -----------------------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.00     28.3±0.13ms        ? ?/sec                1.02     28.8±0.10ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    335.7±1.21ms        ? ?/sec                1.09    366.4±4.37ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00      8.2±0.02ms        ? ?/sec                1.00      8.2±0.02ms        ? ?/sec
variant_get_primitive                                                1.00   1359.2±2.98µs        ? ?/sec                1.03   1396.4±3.18µs        ? ?/sec

Contributor

@scovich scovich left a comment


Looks great!

The results show that the sixth is better. I've updated the implementation to the sixth approach.

Maybe we should update the PR description as well?

I had a question about the use of splice vs just shifting the vec over and appending the bytes

What do you mean by "shifting" and "appending" sorry? The buffer already contains the value bytes by the time we know the header info, so AFAIK we only have three choices:

  1. Guess (correctly!) beforehand how many header bytes are needed, and allocate space for them before appending the value bytes. Error-prone unless splice is used to replace the pre-allocated space with the actual header bytes.
  2. Directly splice in the header bytes (what this PR does). Safe, but still has to shift bytes over.
  3. Splice in a zero-byte region of the correct size to shift the bytes, and then loop back over in order to populate the region. Error-prone but doesn't need a temp vector.

Were you referring to one of the above? Or something else?


let header_size = 1 + // header
if is_large { 4 } else { 1 } + // is_large
(self.offsets.len() + 1) * offset_size as usize; // offsets and data size
Contributor

Suggested change
(self.offsets.len() + 1) * offset_size as usize; // offsets and data size
(num_elements + 1) * offset_size as usize; // offsets and data size

Member Author

Fixed

Comment on lines 1224 to 1225
let header_size = 1 + // header
if is_large { 4 } else { 1 } + // is_large
Contributor

Suggested change
let header_size = 1 + // header
if is_large { 4 } else { 1 } + // is_large
let num_elements_size = if is_large { 4 } else { 1 };
let header_size = 1 + // header
num_elements_size + // num_elements

(and then can reuse num_elements_size below)

Member Author

fixed

append_packed_u32(
&mut bytes_to_splice,
num_elements as u32,
if is_large { 4 } else { 1 },
Contributor

Suggested change
if is_large { 4 } else { 1 },
num_elements_size,

Member Author

fixed

Comment on lines 1120 to 1122
parent_value_offset_base: offset_base,
has_been_finished: false,
parent_metadata_offset_base: meta_offset_base,
Contributor

If we're writing out explicit field: value pairs anyway, why not just fold in the logic directly?

Suggested change
parent_value_offset_base: offset_base,
has_been_finished: false,
parent_metadata_offset_base: meta_offset_base,
parent_value_offset_base: parent_state.buffer_current_offset(),
has_been_finished: false,
parent_metadata_offset_base: parent_state.metadata_current_offset(),

Alternatively, the let above could give the correct name from the start, so it can just be passed directly:

Suggested change
parent_value_offset_base: offset_base,
has_been_finished: false,
parent_metadata_offset_base: meta_offset_base,
parent_value_offset_base,
has_been_finished: false,
parent_metadata_offset_base,

Member Author

Changed the local variable name. The current implementation aims to make the compiler happy, as parent_state has been moved earlier (it is the first parameter).

buf[start_pos..start_pos + nbytes as usize].copy_from_slice(&bytes[..nbytes as usize]);
}

/// Append `value_bytes` of given `value` into `dest`.
Member

value_bytes is the byte width of the value?

Suggested change
/// Append `value_bytes` of given `value` into `dest`.
/// Append `value_bytes` bytes of given `value` into `dest`.

Contributor

Or we could just call it value_size like most of the other parts of the code do?

Member Author

fixed

Comment on lines +1228 to +1229
// Calculated header size becomes a hint; being wrong only risks extra allocations.
// Make sure to reserve enough capacity to handle the extra bytes we'll truncate.
Member

Hmm, can we rephrase the comment? I don't quite get what it means. Do you mean that header_size is just a hint and we will allocate extra space?

Member

When will header_size be incorrect?

Contributor

When will header_size be incorrect?

The size is calculated separately, and then the actual bytes are appended. That opens up a bug surface -- any time the two disagree, header_size will be wrong. If the code directly relied on the size being correct, e.g. because we allocate that many bytes and then index them, we could produce a bad variant value (either because there's an extra run of inserted bytes, or because of a buffer overflow while indexing). But because the calculated size is only a capacity hint for the vec, the cost of being wrong is very low.

let starting_offset = self.parent_value_offset_base;

let header_size = 1 + // header
if is_large { 4 } else { 1 } + // is_large
Member

Suggested change
if is_large { 4 } else { 1 } + // is_large
if is_large { 4 } else { 1 } + // is_large: 4 bytes, else 1 byte.

Member Author

fixed

let starting_offset = parent_buffer.offset();
let starting_offset = self.parent_value_offset_base;

let header_size = 1 + // header
Member

Suggested change
let header_size = 1 + // header
let header_size = 1 + // header (i.e., `array_header`)

Member Author

fixed

Contributor

alamb commented Jul 26, 2025

What do you mean by "shifting" and "appending" sorry? The buffer already contains the value bytes by the time we know the header info, so AFAIK we only have three choices:

  1. Guess (correctly!) beforehand how many header bytes are needed, and allocate space for them before appending the value bytes. Error-prone unless splice is used to replace the pre-allocated space with the actual header bytes.
  2. Directly splice in the header bytes (what this PR does). Safe, but still has to shift bytes over.
  3. Splice in a zero-byte region of the correct size to shift the bytes, and then loop back over in order to populate the region. Error-prone but doesn't need a temp vector.

Were you referring to one of the above? Or something else?

I meant 3. specifically, https://github.com/apache/arrow-rs/pull/7987/files#diff-19c7b0b0d73ef11489af7932f49046a19ec7790896a8960add5a3ded21d5657aR1230 ( I thought I left a specific comment about this but I can't find it now 🤔 )

Basically rather than allocating a new temporary vector to create the header and then splicing those bytes in like

        let mut bytes_to_splice = Vec::with_capacity(header_size + 3);
        // .... build header
        // splice
        buffer
            .inner_mut()
            .splice(starting_offset..starting_offset, bytes_to_splice);

I meant avoiding that allocation by shifting the bytes over in one go and then writing directly into the output buffer:

        // insert header_size bytes of zeros into the output in one go, shifting existing bytes down
        buffer.splice(starting_offset..starting_offset, std::iter::repeat(0u8).take(header_size));
        // write header directly into buffer[starting_offset], buffer[starting_offset+1], etc

This looks somewhat similar to what @klion26 did in

Splice with Iterator (code here)

Though in that example the header is created during the insertion

My suggestion is to merge this PR as is and then we can fiddle around with potential other optimizations as a follow-on PR

Contributor

alamb commented Jul 26, 2025

@klion26 it looks like there are some good suggestions from @viirya and @scovich -- so I will wait to merge this PR until you have a chance to review them. I think it would be fine to either

  1. merge this PR as is and address suggestions as a follow-on
  2. address the suggestions directly before merging

Please let us know what you prefer

Contributor

scovich commented Jul 26, 2025

I meant avoiding that allocation by shifting the bytes over in one go and then writing directly into the output buffer:

        // insert header_size bytes of zeros into the output in one go, shifting existing bytes down
        buffer.splice(starting_offset..starting_offset, std::iter::repeat(0u8).take(header_size));
        // write header directly into buffer[starting_offset], buffer[starting_offset+1], etc

Yeah, IIRC that was the original approach, but I had cautioned that calculating the splice size incorrectly would cause subsequent indexing to corrupt the variant (either by leaving unused zeros or by overflowing the spliced region). And since the original approach was anyway using vec![0u8; header_length] as the source of zeros, I suggested populating the vec directly instead. Safe and not more expensive.

It could be that not allocating the temp buffer at all does improve performance even further, tho it would come with the risk of corruption if the spliced region were ever the wrong size.

Contributor

alamb commented Jul 27, 2025

It could be that not allocating the temp buffer at all does improve performance even further, tho it would come with the risk of corruption if the spliced region were ever the wrong size.

I agree

So I think we should proceed with the approach in this PR, and then we can go with the even-fewer-allocations approach if we can get some benchmarks that show it makes a measurable difference.

Member Author

@klion26 klion26 left a comment


@alamb @scovich @viirya Thanks for the review. I've addressed the comments. Sorry for the late response—I was out yesterday and am just back today.

@alamb alamb merged commit 73c3e97 into apache:main Jul 28, 2025
12 checks passed
Contributor

alamb commented Jul 28, 2025

Thanks again @klion26 @scovich @viirya 🚀

@klion26 klion26 deleted the 7977_avoid_extra_buffer_with_packedu32_iterator branch July 29, 2025 06:48
Member Author

klion26 commented Jul 29, 2025

@alamb @scovich @viirya thanks very much for the review and merging!

@klion26 klion26 restored the 7977_avoid_extra_buffer_with_packedu32_iterator branch July 29, 2025 09:38

Labels

parquet Changes to the parquet crate


Development

Successfully merging this pull request may close these issues.

[Variant] Avoid extra allocation in list builder
