Merged
1 change: 1 addition & 0 deletions parquet/src/column/writer/mod.rs
@@ -1073,6 +1073,7 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
 if let Some(ref mut cmpr) = self.compressor {
     let mut compressed_buf = Vec::with_capacity(uncompressed_size);
     cmpr.compress(&buffer[..], &mut compressed_buf)?;
+    compressed_buf.shrink_to_fit();
Contributor

I wonder if we should apply the same optimization to V2 path below 🤔

Also, @mapleFU recently updated the compression check for V2 pages to use the uncompressed values if the compression didn't actually reduce the space. Maybe we should apply that to V1 pages too

Contributor Author

The cost of the copy is pretty insignificant, because memcpy runs at around 10000 MB/s while compression runs at around 600 MB/s. Under the hood, the vector uses the allocator's shrink method: https://doc.rust-lang.org/alloc/alloc/trait.Allocator.html#method.shrink. With the standard malloc, the threshold for switching to mmap allocation is 128k, and when shrinking such an allocation the system only unmaps pages, so no memory copy is needed.
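To make the effect of `shrink_to_fit` concrete, here is a minimal sketch of the same reserve, fill, shrink pattern. `fake_compress` and `shrink_demo` are hypothetical names used only for illustration (they are not part of the parquet crate), and the "compressor" simply copies a quarter of the input.

```rust
/// Hypothetical stand-in for a compressor: pretends the input
/// compresses to a quarter of its size.
fn fake_compress(input: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&input[..input.len() / 4]);
}

/// Reserves `uncompressed_size` bytes, "compresses" into the buffer,
/// then shrinks it.
/// Returns (capacity before shrink, capacity after shrink, length).
fn shrink_demo(uncompressed_size: usize) -> (usize, usize, usize) {
    let input = vec![0u8; uncompressed_size];
    // Same pattern as the diff: reserve the worst case up front.
    let mut compressed_buf = Vec::with_capacity(uncompressed_size);
    fake_compress(&input, &mut compressed_buf);
    let before = compressed_buf.capacity();
    // Give the unused tail of the allocation back to the allocator.
    compressed_buf.shrink_to_fit();
    let after = compressed_buf.capacity();
    (before, after, compressed_buf.len())
}
```

Note that `Vec::shrink_to_fit` only guarantees that the capacity stays at least as large as the length; whether the allocator shrinks in place or moves the data depends on the allocator and the allocation size.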

In the V2 path, the page buffer is not reserved in advance.

As for leaving the page uncompressed when compression does not help, it could be a good idea to apply that to V1 as well.
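The fallback discussed here (keep the uncompressed bytes when compression does not reduce the size) could look roughly like the sketch below. `choose_page_bytes` is a hypothetical helper, not the writer's actual code, and how the writer would record which form was stored in the page is deliberately left out.

```rust
/// Hypothetical helper: keep the compressed bytes only if they are
/// strictly smaller than the uncompressed ones.
/// Returns the chosen buffer and whether it holds compressed data.
fn choose_page_bytes(uncompressed: Vec<u8>, compressed: Vec<u8>) -> (Vec<u8>, bool) {
    if compressed.len() < uncompressed.len() {
        (compressed, true)
    } else {
        // Compression did not save space: fall back to the original bytes.
        (uncompressed, false)
    }
}
```

Taking both buffers by value means the losing buffer is dropped as soon as the choice is made, so only one copy of the page stays alive.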

     buffer = compressed_buf;
 }
