ARROW-10015: [Rust] Simd aggregate kernels #8370
This is really exciting!! Just to check my understanding, we are talking I will start by reviewing the other branch first. fyi @andygrove @alamb , this may have "some" impact on the query benchmarks of DataFusion.
alamb
left a comment
I personally think this looks really nice. I didn't examine the entire implementation in detail but I can if that is needed. Nice work @jhorstmann
rust/arrow/src/array/array.rs
I am curious (for my own future edification) whether this change actually improves performance or whether it was just a code cleanup.
If I remember correctly, there might have been an issue a bit further down where it previously used the len (in bytes) of the null buffer but after the refactoring needs the length of the array. I don't remember if I did this for consistency or if it caused a test failure.
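To make the bytes-versus-elements distinction concrete, here is a minimal sketch. It is not the code from this PR: `bitmap_len_in_bytes` and `count_valid` are hypothetical helpers, and the only assumption taken from Arrow is that a validity bitmap stores one bit per element, least-significant bit first.

```rust
/// Hypothetical helper: number of bytes needed for a validity bitmap that
/// covers `len` array elements (one bit per element, rounded up to a byte).
fn bitmap_len_in_bytes(len: usize) -> usize {
    (len + 7) / 8
}

/// Hypothetical helper: counts the set (valid) bits among the first `len`
/// elements of `bitmap`. Bits are numbered least-significant-bit first
/// within each byte.
fn count_valid(bitmap: &[u8], len: usize) -> usize {
    (0..len)
        .filter(|&i| bitmap[i / 8] & (1 << (i % 8)) != 0)
        .count()
}
```

For a 10-element array the bitmap occupies 2 bytes. Passing the array length (10) to `count_valid` inspects exactly the bits that belong to the array, whereas deriving a bit count from the byte length (2 * 8 = 16) would also read the 6 padding bits.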
rust/arrow/src/buffer.rs
💯 for improved naming
I wonder too, now that you know a lot about what bitwise_bin_op_helper does, whether you might be able to add a summary in the comments, like:
/// This function creates a new Buffer view aligned on ....
fn bitwise_bin_op_helper<F>(
a6c40a2 to b3ad057
Rebased after merge of ARROW-10040 (#8262)
I'm planning on running TPC-H benchmarks later today with and without this patch.
andygrove
left a comment
cargo run --release --bin tpch -- --debug --iterations 3 --path /mnt/tpch/s100/parquet --format parquet --query 1 --batch-size 4096 --concurrency 24
Master branch:
Query 1 iteration 0 took 14574 ms
Query 1 iteration 1 took 14623 ms
Query 1 iteration 2 took 14985 ms
This PR:
Query 1 iteration 0 took 14253 ms
Query 1 iteration 1 took 14112 ms
Query 1 iteration 2 took 14290 ms
I also verified that the query produced the same results, so LGTM.
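To put the timings above into a single number, the following sketch averages each run and computes the relative improvement. The two arrays are copied from the output above; `mean_ms` and `relative_improvement` are helper names made up for this illustration.

```rust
/// Timings in milliseconds, copied from the benchmark output above.
const MASTER_MS: [u64; 3] = [14574, 14623, 14985];
const PR_MS: [u64; 3] = [14253, 14112, 14290];

/// Arithmetic mean of the samples, in milliseconds.
fn mean_ms(samples: &[u64]) -> f64 {
    samples.iter().sum::<u64>() as f64 / samples.len() as f64
}

/// Fraction by which the candidate's mean is lower than the baseline's mean.
fn relative_improvement(baseline_ms: &[u64], candidate_ms: &[u64]) -> f64 {
    let baseline = mean_ms(baseline_ms);
    (baseline - mean_ms(candidate_ms)) / baseline
}
```

With these inputs the means are roughly 14727 ms and 14218 ms, so `relative_improvement(&MASTER_MS, &PR_MS)` comes out at about 0.035, i.e. around 3.5% faster. With only three iterations per branch this is indicative at best.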
@andygrove it seems there is a lot else going on in that query, and the sum aggregation is just a small part. I'll try to set up the benchmarks myself and have a look. It would be interesting to compare a similar query without the grouping, filtering and sorting.
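For readers unfamiliar with what a sum aggregation over a slice can look like, here is a rough sketch. It is not the kernel from this PR and uses no explicit SIMD types: it only assumes plain stable Rust, a made-up `LANES` constant, and no null values. Keeping several independent accumulators gives the compiler the chance to vectorize a floating-point sum, which it cannot do for a single sequential accumulator without reordering the additions.

```rust
/// Number of independent accumulators; an arbitrary choice for this sketch.
const LANES: usize = 8;

/// Sums `values` using `LANES` partial sums over fixed-size chunks,
/// then adds the partial sums and any leftover tail elements.
/// Note: the additions happen in a different order than a sequential sum,
/// so floating-point results can differ slightly in the last bits.
fn sum_chunked(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk.
    let remainder = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }
    acc.iter().sum::<f32>() + remainder.iter().sum::<f32>()
}
```

Calling `sum_chunked` on the values 1.0 through 100.0 returns 5050.0, since all intermediate sums are exactly representable in `f32`. A null-aware version would additionally have to consult the validity bitmap per chunk, which is omitted here.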
Built on top of ARROW-10040 (#8262)
Benchmarks (run on a Ryzen 3700U laptop with some thermal problems)
Current master without simd:
This branch without simd (speedup probably due to accessing the data via a slice):
This branch with simd: