Skip to content

Minor: avoid clone in RunArray row decoding via buffer stealing#9052

Merged
alamb merged 1 commit intoapache:mainfrom
lyang24:decode_buffer_steal
Dec 30, 2025
Merged

Minor: avoid clone in RunArray row decoding via buffer stealing#9052
alamb merged 1 commit intoapache:mainfrom
lyang24:decode_buffer_steal

Conversation

@lyang24
Copy link
Contributor

@lyang24 lyang24 commented Dec 28, 2025

Which issue does this PR close?

its a nitpick to replace
"allocation + memcpy" with "allocation only".

Rationale for this change

remove the value clone in decode path
decoded_values.push(decoded_data.clone())
and taking from decoded_data directly

What changes are included in this PR?

Are these changes tested?

i think the current testing suite will do

Are there any user-facing changes?

no

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 28, 2025
Signed-off-by: lyang24 <lanqingy93@gmail.com>
@lyang24 lyang24 force-pushed the decode_buffer_steal branch from 345d392 to b0a7068 Compare December 28, 2025 11:52
@lyang24 lyang24 changed the title nit: avoid clone in RunArray row decoding via buffer stealing Minor: avoid clone in RunArray row decoding via buffer stealing Dec 28, 2025
Comment on lines +137 to +141
let capacity = decoded_data.capacity();
decoded_values.push(std::mem::replace(
&mut decoded_data,
Vec::with_capacity(capacity),
));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we save here, if we replace a clone with vec allocation anyway?

Copy link
Contributor Author

@lyang24 lyang24 Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key is we're NOT replacing "clone with allocation" - we're replacing
"allocation + memcpy" with "allocation only".

Both approaches allocate:

  • .clone() allocates a new Vec AND copies all the data (O(n) in data size)
  • Vec::with_capacity() allocates a new Vec (O(1), just metadata)

What we eliminate is the memcpy operation - specifically sum(unique_values × avg_value_size)
bytes of memory traffic

talk is cheap i will show the code :) i wrote a quick bench and cargo bench it in apple m3 chip

use criterion::*;

/// Benchmark simulating the RunArray decode pattern with .clone()
fn decode_with_clone(data: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut decoded_data = Vec::new();
    let mut decoded_values = Vec::new();

    for bytes in data {
        decoded_data.clear();
        decoded_data.extend_from_slice(bytes);

        // Simulate checking if it's a new unique value
        let is_new = decoded_values.is_empty()
            || decoded_data != decoded_values[decoded_values.len() - 1];

        if is_new {
            decoded_values.push(decoded_data.clone());
        }
    }

    decoded_values
}

/// Benchmark simulating the RunArray decode pattern with mem::replace
fn decode_with_replace(data: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut decoded_data = Vec::new();
    let mut decoded_values = Vec::new();

    for bytes in data {
        decoded_data.clear();
        decoded_data.extend_from_slice(bytes);

        // Simulate checking if it's a new unique value
        let is_new = decoded_values.is_empty()
            || decoded_data != decoded_values[decoded_values.len() - 1];

        if is_new {
            let capacity = decoded_data.capacity();
            decoded_values.push(std::mem::replace(&mut decoded_data, Vec::with_capacity(capacity)));
        }
    }

    decoded_values
}

fn criterion_benchmark(c: &mut Criterion) {
    // Test with various data sizes to show impact scales with data size
    for size in [64, 256, 1024, 4096] {
        let mut group = c.benchmark_group(format!("run_decode_{}_bytes", size));

        // Generate test data: 1000 rows with 100 unique values (10% uniqueness, typical for RLE)
        let mut data = Vec::new();
        for i in 0..1000 {
            let unique_id = i / 10; // 100 unique values, each repeated 10 times
            let bytes = vec![unique_id as u8; size];
            data.push(bytes);
        }

        group.bench_function("clone", |b| {
            b.iter(|| black_box(decode_with_clone(&data)));
        });

        group.bench_function("mem_replace", |b| {
            b.iter(|| black_box(decode_with_replace(&data)));
        });

        group.finish();
    }
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

result

Data Size Clone (µs) mem::replace (µs) Improvement Time Saved
64 bytes 5.80 5.37 7.4% 0.43 µs
256 bytes 13.66 13.29 2.7% 0.37 µs
1024 bytes 36.49 35.44 2.9% 1.05 µs
4096 bytes 152.54 145.62 4.5% 6.92 µs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great explanation

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lyang24 and @Jefffrey

@alamb alamb merged commit 5ddddbd into apache:main Dec 30, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants