[Parquet] perf: preallocate capacity for ArrayReaderBuilder #9093
Conversation
From my experiment: pre-allocation overhead may offset the savings from avoiding incremental growth, so I am turning this into a draft.
Thank you @lyang24 -- I will look at this more carefully shortly
I suspect we will not be able to detect the difference in an end-to-end test, given how small the allocation overhead is compared to running the rest of the query.

@lyang24 -- I am not sure that the… Maybe we could pass in the batch size to the reader... Edit: one way we could find out would be to put a println to print out the number of rows that was reserved 🤔
Update: I made a small test program (below) and printed out the capacities:

```diff
diff --git a/parquet/src/arrow/buffer/view_buffer.rs b/parquet/src/arrow/buffer/view_buffer.rs
index 0343047da6..d87b494b46 100644
--- a/parquet/src/arrow/buffer/view_buffer.rs
+++ b/parquet/src/arrow/buffer/view_buffer.rs
@@ -35,6 +35,7 @@ pub struct ViewBuffer {
 impl ViewBuffer {
     /// Create a new ViewBuffer with capacity for the specified number of views
     pub fn with_capacity(capacity: usize) -> Self {
+        println!("Creating ViewBuffer with capacity {}", capacity);
         Self {
             views: Vec::with_capacity(capacity),
             buffers: Vec::new(),
```

Here is what they are: I think those are the number of rows in the dictionary (not the views themselves).

I also then printed out the actual capacities:

```diff
diff --git a/parquet/src/arrow/array_reader/byte_view_array.rs b/parquet/src/arrow/array_reader/byte_view_array.rs
index 8e690c574d..3ea6f08a29 100644
--- a/parquet/src/arrow/array_reader/byte_view_array.rs
+++ b/parquet/src/arrow/array_reader/byte_view_array.rs
@@ -259,6 +259,8 @@ impl ByteViewArrayDecoder {
         len: usize,
         dict: Option<&ViewBuffer>,
     ) -> Result<usize> {
+        println!("ByteViewArrayDecoder::read called with len {}, current views capacity: {}", len, out.views.capacity());
+
         match self {
             ByteViewArrayDecoder::Plain(d) => d.read(out, len),
             ByteViewArrayDecoder::Dictionary(d) => {
```

You can actually see most of the reads have an empty buffer. I tracked it down in a debugger and the default buffer is being created here:

Whole test program:

```rust
use std::fs::File;
use std::io::{BufReader, Read};
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, FieldRef, Schema};
use bytes::Bytes;
use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};
use parquet::file::metadata::ParquetMetaDataReader;

fn main() {
    let file_name = "/Users/andrewlamb/Downloads/hits/hits_0.parquet";
    println!("Opening file: {file_name}");
    let mut file = File::open(file_name).unwrap();
    let mut bytes = Vec::new();
    file.read_to_end(&mut bytes).unwrap();
    let bytes = Bytes::from(bytes);

    let schema = string_to_view_types(
        ParquetRecordBatchReaderBuilder::try_new(bytes.clone())
            .unwrap()
            .schema(),
    );
    //println!("Schema: {:?}", schema);
    let options = ArrowReaderOptions::new().with_schema(schema);
    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(bytes, options)
        .unwrap()
        .with_batch_size(8192)
        .build()
        .unwrap();
    for batch in reader {
        let batch = batch.unwrap();
        println!(
            "Read batch with {} rows and {} columns",
            batch.num_rows(),
            batch.num_columns()
        );
    }
    println!("Done");
}

// Hack because the clickbench files were written with the wrong logical type for strings
fn string_to_view_types(schema: &Arc<Schema>) -> Arc<Schema> {
    let fields: Vec<FieldRef> = schema
        .fields()
        .iter()
        .map(|field| {
            let existing_type = field.data_type();
            if existing_type == &DataType::Utf8 || existing_type == &DataType::Binary {
                Arc::new(Field::new(
                    field.name(),
                    DataType::Utf8View,
                    field.is_nullable(),
                ))
            } else {
                Arc::clone(field)
            }
        })
        .collect();
    Arc::new(Schema::new(fields))
}
```
So, TL;DR: my analysis is that we aren't properly sizing the allocations. The good news is we can fix this. The bad news is it may be tricky. I will update the original ticket too.
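The failure mode can be illustrated in isolation. The sketch below uses a simplified stand-in struct (`Views` is hypothetical, not the real parquet `ViewBuffer`) to show why a `Default`-constructed buffer defeats pre-sizing:

```rust
// Standalone illustration (hypothetical type, not the real parquet ViewBuffer):
// a buffer created via `Default` starts with zero capacity, so every read
// grows it incrementally even when the total row count is known up front.
#[derive(Default)]
struct Views {
    views: Vec<u128>,
}

fn main() {
    // This is what the readers were effectively doing: Default => capacity 0
    let via_default = Views::default();
    assert_eq!(via_default.views.capacity(), 0);

    // What a capacity hint buys: one allocation sized for the whole batch
    let mut sized = Views::default();
    sized.views.reserve(8192);
    assert!(sized.views.capacity() >= 8192);

    println!("ok");
}
```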
Thanks for the deep dive. I think passing down a capacity hint from ArrayReaderBuilder is the right way to go. I made an implementation attempt, and the results look promising.

With predicate pushdown the SQL runs so fast (7ms) that the difference becomes hard to tell.
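The capacity-hint plumbing can be sketched roughly as follows. These are simplified stand-ins, not the actual parquet-rs types or signatures: the point is only that the builder threads its batch size down into the buffer constructor.

```rust
// Simplified sketch of threading a batch-size capacity hint down from the
// builder to a values buffer (hypothetical types, not the real parquet API).
struct ViewBuffer {
    views: Vec<u128>,
}

impl ViewBuffer {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            views: Vec::with_capacity(capacity),
        }
    }
}

struct ArrayReaderBuilder {
    batch_size: usize,
}

impl ArrayReaderBuilder {
    fn build_view_buffer(&self) -> ViewBuffer {
        // Reserve room for one full batch up front instead of growing incrementally
        ViewBuffer::with_capacity(self.batch_size)
    }
}

fn main() {
    let builder = ArrayReaderBuilder { batch_size: 8192 };
    let buf = builder.build_view_buffer();
    assert!(buf.views.capacity() >= 8192);
    assert_eq!(buf.views.len(), 0);
    println!("capacity: {}", buf.views.capacity());
}
```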
Nice -- checking it out
run benchmark arrow_reader arrow_reader_row_filter arrow_reader_clickbench
alamb left a comment:
Thanks @lyang24 -- this looks very nice. I kicked off some automated benchmarks and hopefully we can see the benefits reflected there (though the benchmarks are sometimes pretty noisy)
If this does work out, I would like to spend a bit more time trying to get this mechanism to work by removing Default from ValuesBuffer, to ensure we cover all the cases and that any future array readers work the same.
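In miniature, the "remove Default" idea might look like this (hypothetical types, not the real `ValuesBuffer` trait): with no `Default` impl, the compiler forces every construction site to supply an explicit capacity, so no code path can silently fall back to an empty buffer.

```rust
// Hypothetical sketch: a buffer type with no Default impl.
// The only constructor takes explicit capacities, so a zero-capacity
// buffer can never be created by accident.
pub struct OffsetBuffer {
    pub offsets: Vec<i32>,
    pub values: Vec<u8>,
}

impl OffsetBuffer {
    // Callers must state the expected row count and value-byte budget
    pub fn with_capacity(rows: usize, value_bytes: usize) -> Self {
        Self {
            // offsets need one extra slot for the final end offset
            offsets: Vec::with_capacity(rows + 1),
            values: Vec::with_capacity(value_bytes),
        }
    }
}

fn main() {
    let buf = OffsetBuffer::with_capacity(8192, 64 * 1024);
    assert!(buf.offsets.capacity() >= 8193);
    assert!(buf.values.capacity() >= 65536);
    println!("ok");
}
```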
🤖: Benchmark completed Details

🤖: Benchmark completed Details

🤖: Benchmark completed Details

run benchmark arrow_reader arrow_reader_row_filter arrow_reader_clickbench

Rerunning the benchmarks to see if we can see a consistent pattern
🤖: Benchmark completed Details
parquet/benches/arrow_reader.rs (outdated)

```rust
use half::f16;
use num_bigint::BigInt;
use num_traits::FromPrimitive;
use parquet::arrow::arrow_reader::DEFAULT_BATCH_SIZE;
```
Note: I made the bench using the 1024 default batch size.
🤖: Benchmark completed Details

🤖: Benchmark completed Details
…er. Ensure internal buffers are pre-allocated. API change: batch size is now required for ArrayReader and buffers.
It looks like it's doing well with large scans (full table scan queries); there are some regressions with high selectivity. I am guessing preallocating large blocks messes up the CPU cache? Maybe we need to do this conditionally - do not preallocate for Plain numeric (
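The conditional idea could be sketched as below. This is a hypothetical helper, not code from this PR: it only preallocates for variable-length view columns and skips the hint for plain numeric columns, where incremental growth is already cheap.

```rust
// Hypothetical sketch of conditional preallocation (not code from this PR):
// choose an initial capacity based on the column's decoder kind.
enum ColumnKind {
    PlainNumeric,
    ByteView,
}

fn initial_capacity(kind: &ColumnKind, batch_size: usize) -> usize {
    match kind {
        // Plain numeric: let the Vec grow; preallocating large blocks
        // may hurt cache locality for highly selective reads
        ColumnKind::PlainNumeric => 0,
        // View arrays: reserve a full batch of views up front
        ColumnKind::ByteView => batch_size,
    }
}

fn main() {
    assert_eq!(initial_capacity(&ColumnKind::PlainNumeric, 8192), 0);
    assert_eq!(initial_capacity(&ColumnKind::ByteView, 8192), 8192);
    println!("ok");
}
```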
Which issue does this PR close?
Rationale for this change
Reduce the allocation cost mentioned in #9059. From experiment: pre-allocation overhead may offset the savings from avoiding incremental growth.
What changes are included in this PR?
Pre-allocate view vectors with the known size.
Are these changes tested?
I did a small benchmark with clickbench query 10 and a smaller set (1%) of the clickbench data:
download the data and create query 10
run the benchmark to get a baseline
patch datafusion with a local arrow checkout
run the benchmark again
Results: I think there are no regressions at least - I was hoping someone with a large internet pipe could bench this on the full hits table.
baseline.json
patched.json
Update: with the full hits table (no regression)
baseline_full.json
patched_full.json
Due to the data size, the query file only contains one copy of the query 10 query.
baseline
patched
Are there any user-facing changes?
no