ARROW-10058: [C++] Improve repeated levels conversion without BMI2 #8320
See the JIRA issue for benchmarks. It would be nice to have benchmarks on other machines. @emkornfield
I also notice that we call |
@pitrou I'm devoting most of my bandwidth to finishing up the Parquet read component this week. Is it OK if I take a closer look next week (hopefully with enough time before an RC is cut)?
Yeah, it isn't ideal. There may be a better factoring in there, but it seemed hard to do while isolating the BMI2-specific instructions. I guess if this isn't too much slower than BMI2 on Intel we could potentially collapse everything, but I would not expect that to be the case.
No problem.
Right. The emulation is probably much slower.
Use a lookup table to emulate PEXT 5 bits at a time. Remove the slow scalar path.
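As a rough illustration of the idea in the commit message above, here is a minimal sketch of a table-driven PEXT emulation that consumes 5 mask bits per step. This is not the code from the PR: the names `PextScalar`, `PextTable`, and `PextTableLookup` are hypothetical, and the table layout (a 32x32 byte table plus a 32-entry popcount table) is an assumption chosen for clarity.

```cpp
#include <cstdint>

// Hypothetical constants: process the mask in chunks of 5 bits.
constexpr int kChunkBits = 5;
constexpr int kChunkSize = 1 << kChunkBits;  // 32 possible chunk values

// Bit-by-bit reference PEXT: gathers the bits of `value` selected by
// `mask` into the low bits of the result. Used here to fill the table.
inline uint64_t PextScalar(uint64_t value, uint64_t mask) {
  uint64_t result = 0;
  int out = 0;
  for (; mask != 0; mask &= mask - 1) {      // clear lowest set bit each pass
    const uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
    if (value & lowest) result |= uint64_t{1} << out;
    ++out;
  }
  return result;
}

// Precomputed results for every (5-bit mask, 5-bit value) pair, plus the
// number of set bits in each 5-bit mask (how far the output advances).
struct PextTable {
  uint8_t extracted[kChunkSize][kChunkSize];
  uint8_t popcount[kChunkSize];

  PextTable() {
    for (int mask = 0; mask < kChunkSize; ++mask) {
      popcount[mask] = 0;
      for (int m = mask; m != 0; m &= m - 1) ++popcount[mask];
      for (int value = 0; value < kChunkSize; ++value) {
        extracted[mask][value] = static_cast<uint8_t>(
            PextScalar(static_cast<uint64_t>(value), static_cast<uint64_t>(mask)));
      }
    }
  }
};

// Table-driven PEXT: one lookup per 5-bit chunk of the mask.
inline uint64_t PextTableLookup(uint64_t value, uint64_t mask) {
  static const PextTable table;
  uint64_t result = 0;
  int out_bits = 0;  // number of result bits produced so far
  while (mask != 0) {
    const unsigned m = static_cast<unsigned>(mask & (kChunkSize - 1));
    const unsigned v = static_cast<unsigned>(value & (kChunkSize - 1));
    result |= static_cast<uint64_t>(table.extracted[m][v]) << out_bits;
    out_bits += table.popcount[m];
    mask >>= kChunkBits;
    value >>= kChunkBits;
  }
  return result;
}
```

A 64-bit mask needs at most 13 lookups this way, versus up to 64 iterations for the bit-by-bit loop. On hardware with a fast BMI2 implementation, the `_pext_u64` intrinsic would still be the natural choice.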
Updated benchmarks on AMD Ryzen: |
cd01f19 to 482797c
Sorry, some personal issues came up. I hope to have time tonight to review this and other Parquet-related CLs.
For the record, if I profile
And |
+1. Thanks.