Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: reuse Rows buffer in GroupValuesRows #10980

Merged
merged 1 commit into from
Jun 18, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 13 additions & 3 deletions datafusion/physical-plan/src/aggregates/group_values/row.rs
Original file line number Diff line number Diff line change
Expand Up @@ -59,9 +59,12 @@ pub struct GroupValuesRows {
/// [`Row`]: arrow::row::Row
group_values: Option<Rows>,

// buffer to be reused to store hashes
/// reused buffer to store hashes
hashes_buffer: Vec<u64>,

/// reused buffer to store rows
rows_buffer: Rows,

/// Random state for creating hashes
random_state: RandomState,
}
Expand All @@ -78,13 +81,18 @@ impl GroupValuesRows {

let map = RawTable::with_capacity(0);

let starting_rows_capacity = 1000;
let starting_data_capacity = 64 * starting_rows_capacity;
let rows_buffer =
row_converter.empty_rows(starting_rows_capacity, starting_data_capacity);
Ok(Self {
schema,
row_converter,
map,
map_size: 0,
group_values: None,
hashes_buffer: Default::default(),
rows_buffer,
random_state: Default::default(),
})
}
Expand All @@ -93,8 +101,9 @@ impl GroupValuesRows {
impl GroupValues for GroupValuesRows {
fn intern(&mut self, cols: &[ArrayRef], groups: &mut Vec<usize>) -> Result<()> {
// Convert the group keys into the row format
// Avoid reallocation when https://github.com/apache/arrow-rs/issues/4479 is available
let group_rows = self.row_converter.convert_columns(cols)?;
let group_rows = &mut self.rows_buffer;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that this avoids an allocation per batch

group_rows.clear();
self.row_converter.append(group_rows, cols)?;
let n_rows = group_rows.num_rows();

let mut group_values = match self.group_values.take() {
Expand Down Expand Up @@ -150,6 +159,7 @@ impl GroupValues for GroupValuesRows {
self.row_converter.size()
+ group_values_size
+ self.map_size
+ self.rows_buffer.size()
+ self.hashes_buffer.allocated_size()
}

Expand Down
Loading