[SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement #22551
https://github.com/lz4/lz4/releases/tag/v1.8.3 is already out due to a data corruption bug in v1.8.2.
Could you elaborate in the PR description on how this avoids using v1.8.2?
Also, why do you use FilterPushdownBenchmark in this PR? It's mainly based on snappy rather than lz4. So, did you change it to use LZ4 to get the above benchmark?
Test build #96602 has finished for PR 22551 at commit
Thanks @dongjoon-hyun, I updated the PR description.
retest this please
Thank you for the update, but could you update one more thing, @wangyum?
Test build #96632 has finished for PR 22551 at commit
IMO, since
WDYT @wangyum? I think this can go in 3.0
@maropu
@maropu as far as I can tell, 1.4 was based on lz4 'r128', a couple of years old.
Merged to master
## What changes were proposed in this pull request?

This PR upgrades `lz4-java` to 1.5.0 to get speed improvements.

**General speed improvements**

LZ4 decompression speed has always been a strong point. In v1.8.2, this gets even better, as it improves decompression speed by about 10%, thanks in large part to a suggestion from svpv.

For example, on a Mac OS-X laptop with an Intel Core i7-5557U CPU @ 3.10GHz, running `lz4 -bsilesia.tar` compiled with default compiler llvm v9.1.0:

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
Decompression speed | 2490 MB/s | 2770 MB/s | +11%

Compression speeds also receive a welcome boost, though the improvement is not evenly distributed, with higher levels benefiting quite a lot more.

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
lz4 -1 | 504 MB/s | 516 MB/s | +2%
lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10%
lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170%

More details: https://github.com/lz4/lz4/releases/tag/v1.8.3

**Below is my benchmark result**

Set `spark.sql.parquet.compression.codec` to `lz4`, disable the ORC benchmark, then run `FilterPushdownBenchmark`.

lz4-java 1.5.0:
```
[success] Total time: 5585 s, completed Sep 26, 2018 5:22:16 PM
```

lz4-java 1.4.0:
```
[success] Total time: 5591 s, completed Sep 26, 2018 5:22:24 PM
```

Some benchmark results:
```
lz4-java 1.5.0
Select 1 row with 500 filters:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1953 / 1980          0.0  1952502908.0       1.0X
Parquet Vectorized (Pushdown)                 2541 / 2585          0.0  2541019869.0       0.8X

lz4-java 1.4.0
Select 1 row with 500 filters:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1979 / 2103          0.0  1979328144.0       1.0X
Parquet Vectorized (Pushdown)                 2596 / 2909          0.0  2596222118.0       0.8X
```

Complete benchmark results:
https://issues.apache.org/jira/secure/attachment/12941360/FilterPushdownBenchmark-lz4-java-140-results.txt
https://issues.apache.org/jira/secure/attachment/12941361/FilterPushdownBenchmark-lz4-java-150-results.txt

## How was this patch tested?

Manual tests.

Closes apache#22551 from wangyum/SPARK-25539.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
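For anyone who wants to double-check the percentages quoted above, here is a minimal sketch of the arithmetic. It only uses numbers that appear in the description (the lz4 release-note speeds and the `FilterPushdownBenchmark` timings). The helper names `percent_gain` and `percent_time_saved` are made up for this illustration and are not part of Spark or lz4-java.

```python
# Hypothetical helpers for illustration only; not part of Spark or lz4-java.

def percent_gain(old, new):
    """Throughput gain of `new` over `old`, in percent (higher speed is better)."""
    return (new - old) / old * 100


def percent_time_saved(old, new):
    """Time saved by `new` relative to `old`, in percent (lower time is better)."""
    return (old - new) / old * 100


# Native lz4 speeds in MB/s, copied from the release-note tables: (v1.8.1, v1.8.2).
lz4_speeds = {
    "decompression": (2490, 2770),
    "lz4 -1": (504, 516),
    "lz4 -9": (23.2, 25.6),
    "lz4 -12": (3.5, 9.5),
}

# Spark-side timings from the benchmark runs: (lz4-java 1.4.0, lz4-java 1.5.0).
spark_timings = {
    "total run time (s)": (5591, 5585),
    "Parquet Vectorized, best (ms)": (1979, 1953),
    "Parquet Vectorized (Pushdown), best (ms)": (2596, 2541),
}

for name, (old, new) in lz4_speeds.items():
    print(f"{name}: {percent_gain(old, new):+.1f}% throughput")

for name, (old, new) in spark_timings.items():
    print(f"{name}: {percent_time_saved(old, new):.2f}% less time")
```

The two helpers are kept separate because the release notes report throughput (higher is better) while the Spark benchmark reports elapsed time (lower is better). On these figures the total Spark run time differs by only about 0.1% between the two lz4-java versions.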