[SPARK-52078][TEST] Add ZStandardTPCDSDataBenchmark #50857
    # To prevent spark.test.home not being set. See more detail in SPARK-36007.
    SPARK_HOME: ${{ github.workspace }}
    SPARK_TPCDS_DATA: ${{ github.workspace }}/tpcds-sf-1
    SPARK_TPCDS_DATA_TEXT: ${{ github.workspace }}/tpcds-sf-1-text
The output data size:
total 1.2G
-rw-rw-r-- 1 chengpan chengpan 1.9K May 12 13:58 call_center.dat
-rw-rw-r-- 1 chengpan chengpan 1.6M May 12 13:58 catalog_page.dat
-rw-rw-r-- 1 chengpan chengpan 21M May 12 13:58 catalog_returns.dat
-rw-rw-r-- 1 chengpan chengpan 283M May 12 13:58 catalog_sales.dat -- used for benchmark
-rw-rw-r-- 1 chengpan chengpan 13M May 12 13:58 customer.dat
-rw-rw-r-- 1 chengpan chengpan 5.3M May 12 13:58 customer_address.dat
-rw-rw-r-- 1 chengpan chengpan 77M May 12 13:58 customer_demographics.dat
-rw-rw-r-- 1 chengpan chengpan 9.9M May 12 13:58 date_dim.dat
-rw-rw-r-- 1 chengpan chengpan 67 May 12 13:58 dbgen_version.dat
-rw-rw-r-- 1 chengpan chengpan 149K May 12 13:58 household_demographics.dat
-rw-rw-r-- 1 chengpan chengpan 328 May 12 13:58 income_band.dat
-rw-rw-r-- 1 chengpan chengpan 226M May 12 13:58 inventory.dat
-rw-rw-r-- 1 chengpan chengpan 4.9M May 12 13:58 item.dat
-rw-rw-r-- 1 chengpan chengpan 37K May 12 13:58 promotion.dat
-rw-rw-r-- 1 chengpan chengpan 1.4K May 12 13:58 reason.dat
-rw-rw-r-- 1 chengpan chengpan 1.1K May 12 13:58 ship_mode.dat
-rw-rw-r-- 1 chengpan chengpan 3.1K May 12 13:58 store.dat
-rw-rw-r-- 1 chengpan chengpan 32M May 12 13:58 store_returns.dat
-rw-rw-r-- 1 chengpan chengpan 371M May 12 13:58 store_sales.dat
-rw-rw-r-- 1 chengpan chengpan 4.9M May 12 13:58 time_dim.dat
-rw-rw-r-- 1 chengpan chengpan 585 May 12 13:58 warehouse.dat
-rw-rw-r-- 1 chengpan chengpan 5.7K May 12 13:58 web_page.dat
-rw-rw-r-- 1 chengpan chengpan 9.4M May 12 13:58 web_returns.dat
-rw-rw-r-- 1 chengpan chengpan 141M May 12 13:58 web_sales.dat
-rw-rw-r-- 1 chengpan chengpan 8.6K May 12 13:58 web_site.dat
Hi @luben, in this round, I'm trying to use TPCDS-generated data for the zstd compression benchmark.
The data can be generated by the following steps:
- Follow https://github.com/databricks/tpcds-kit to build dsdgen.
- mkdir -p tpcds-sf-1-text
- tpcds-kit/tools/dsdgen -DISTRIBUTIONS tpcds-kit/tools/tpcds.idx -SCALE 1 -DIR tpcds-sf-1-text
My local test shows that zstd-jni 1.5.6 and 1.5.7 perform at basically the same level, with 1.5.7 slightly faster in some cases.
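To put rough numbers on the "same level" comparison, here is a minimal Python sketch that computes the percentage change in best time between the two versions. The timings are copied from the benchmark output in the PR description; the helper names `pct_change` and `summarize`, and the choice of which rows to sample, are illustrative assumptions and not part of the PR.

```python
# Best times in ms, copied from the benchmark output in the PR description.
# Each value is (zstd-jni 1.5.6-10, zstd-jni 1.5.7-3). The row selection is
# an arbitrary sample chosen for illustration.
BEST_MS = {
    "Compression level 1, no buffer pool": (2737, 2700),
    "Compression level 3, no buffer pool": (5660, 5660),
    "Decompression level 2, with buffer pool": (1142, 1249),
    "Parallel compression level 9, 0 workers": (9555, 9365),
}


def pct_change(old_ms: float, new_ms: float) -> float:
    """Relative change in percent; negative means the newer version is faster."""
    return (new_ms - old_ms) / old_ms * 100.0


def summarize(rows=BEST_MS):
    """Map each row label to its percentage change, rounded to one decimal."""
    return {label: round(pct_change(old, new), 1) for label, (old, new) in rows.items()}


if __name__ == "__main__":
    for label, delta in summarize().items():
        print(f"{label}: {delta:+.1f}%")
```

The sampled rows move in both directions, so a single local run like this is better read as "no clear regression" than as a precise ranking of the two versions.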
LGTM! Yes, using real-world data should also give us more meaningful results if there is an actual regression.
    path: ./tpcds-sf-1
    path: |
      ./tpcds-sf-1
      ./tpcds-sf-1-text
I agree with the new testing approach, but is it possible to produce tpcds-sf-1-text only when conducting tests on ZStandardBenchmark?
I would prefer to keep it, to keep the workflow definition simple, as it only takes a few seconds to generate the dataset:
$ time tpcds-kit/tools/dsdgen -DISTRIBUTIONS tpcds-kit/tools/tpcds.idx -SCALE 1 -DIR ~/tpcds-sf-1-text
dsdgen Population Generator (Version 2.13.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2020
Warning: This scale factor is valid for QUALIFICATION ONLY
tpcds-kit/tools/dsdgen -DISTRIBUTIONS tpcds-kit/tools/tpcds.idx -SCALE 1 -DIR 7.86s user 0.57s system 99% cpu 8.467 total
Hi, @pan3793 and @luben. As the author of ZStandardBenchmark.scala, I'd like to suggest creating a new test suite instead of erasing the existing benchmark. We can use both benchmarks.
We found some unreasonable benchmark results during upgrading zstd-jni from 1.5.6-10 to 1.5.7-x in #50057, and the author suggests using real-world data for zstd compression benchmark.
@dongjoon-hyun, what value will the existing benchmark provide if we already know that it is flawed and doesn't represent real-world performance?
To @luben, we are very conservative. The benchmark file and its results have value for comparison with older Spark versions, because I don't think the whole benchmark is lying about the numbers. There is no reason to destroy community assets when we want to add new contributions in a new way. In the same way, I believe the proposed new benchmark is also supposed to be evaluated over multiple generations of the ZSTD library and Spark distributions. I don't believe in the idea of perfect (flawless) code. Do you?
@dongjoon-hyun Okay, I will follow your suggestion to create a new benchmark.
Thank you. I guess you can simply rename this PR's file to a new name first and restore the old file, @pan3793.
dongjoon-hyun
left a comment
Thanks, @pan3793.
Merged into master. Thanks @pan3793, @dongjoon-hyun, and @luben.
### What changes were proposed in this pull request?

We found some unreasonable benchmark results during upgrading zstd-jni from 1.5.6-10 to 1.5.7-x in apache#50057, and the author suggests using real-world data for zstd compression benchmark.

### Why are the changes needed?

Add a new benchmark for zstd with more reasonable data.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested on a local machine, Ubuntu 24.04, Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz

zstd-jni:1.5.6-10

```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool            2737           2742           6          0.0   684299199.3       1.0X
Compression 4 times at level 2 without buffer pool            4217           4218           2          0.0  1054165072.5       0.6X
Compression 4 times at level 3 without buffer pool            5660           5661           2          0.0  1414928809.8       0.5X
Compression 4 times at level 1 with buffer pool               2739           2743           6          0.0   684719746.2       1.0X
Compression 4 times at level 2 with buffer pool               4186           4191           8          0.0  1046477235.5       0.7X
Compression 4 times at level 3 with buffer pool               5663           5667           5          0.0  1415762083.2       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool             943            950          10          0.0   235749387.0       1.0X
Decompression 4 times from level 2 without buffer pool            1239           1244           6          0.0   309753079.0       0.8X
Decompression 4 times from level 3 without buffer pool            1468           1484          23          0.0   366946390.8       0.6X
Decompression 4 times from level 1 with buffer pool                933            942           9          0.0   233286880.8       1.0X
Decompression 4 times from level 2 with buffer pool               1142           1171          40          0.0   285605190.0       0.8X
Decompression 4 times from level 3 with buffer pool               1394           1404          13          0.0   348546518.3       0.7X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 3:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                 1889           1899          14          0.0   472156817.0       1.0X
Parallel Compression with 1 workers                 1715           1717           2          0.0   428826617.0       1.1X
Parallel Compression with 2 workers                  904            906           2          0.0   225890052.0       2.1X
Parallel Compression with 4 workers                  539            548           8          0.0   134735732.5       3.5X
Parallel Compression with 8 workers                  540            548           9          0.0   134889447.5       3.5X
Parallel Compression with 16 workers                 577            589          23          0.0   144182540.7       3.3X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 9:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                 9555           9567          18          0.0  2388642623.3       1.0X
Parallel Compression with 1 workers                 7973           8006          47          0.0  1993145509.0       1.2X
Parallel Compression with 2 workers                 5070           5071           1          0.0  1267405763.3       1.9X
Parallel Compression with 4 workers                 4420           4421           1          0.0  1104977620.3       2.2X
Parallel Compression with 8 workers                 4790           4800          15          0.0  1197417939.0       2.0X
Parallel Compression with 16 workers                5000           5003           5          0.0  1249965510.5       1.9X
```

zstd-jni:1.5.7-3

```
================================================================================================
Benchmark ZStandardCompressionCodec
================================================================================================

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
Compression 4 times at level 1 without buffer pool            2700           2709          13          0.0   674967564.0       1.0X
Compression 4 times at level 2 without buffer pool            4148           4149           0          0.0  1037124857.0       0.7X
Compression 4 times at level 3 without buffer pool            5660           5682          31          0.0  1414968620.0       0.5X
Compression 4 times at level 1 with buffer pool               2718           2728          14          0.0   679514554.3       1.0X
Compression 4 times at level 2 with buffer pool               4130           4131           2          0.0  1032476406.2       0.7X
Compression 4 times at level 3 with buffer pool               5571           5576           6          0.0  1392871057.5       0.5X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Benchmark ZStandardCompressionCodec:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------------------
Decompression 4 times from level 1 without buffer pool             942            951           9          0.0   235523684.5       1.0X
Decompression 4 times from level 2 without buffer pool            1248           1270          31          0.0   311906360.5       0.8X
Decompression 4 times from level 3 without buffer pool            1472           1475           4          0.0   368071680.5       0.6X
Decompression 4 times from level 1 with buffer pool                939            956          18          0.0   234631810.0       1.0X
Decompression 4 times from level 2 with buffer pool               1249           1261          16          0.0   312318610.5       0.8X
Decompression 4 times from level 3 with buffer pool               1475           1475           0          0.0   368765939.3       0.6X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 3:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                 1865           1873          11          0.0   466278397.5       1.0X
Parallel Compression with 1 workers                 1785           1793          10          0.0   446359936.8       1.0X
Parallel Compression with 2 workers                  945            953          10          0.0   236142005.8       2.0X
Parallel Compression with 4 workers                  559            577          29          0.0   139754505.5       3.3X
Parallel Compression with 8 workers                  537            555          13          0.0   134328778.3       3.5X
Parallel Compression with 16 workers                 587            614          23          0.0   146784965.5       3.2X

OpenJDK 64-Bit Server VM 17.0.15+6-LTS on Linux 6.12.10-76061203-generic
Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
Parallel Compression at level 9:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------
Parallel Compression with 0 workers                 9365           9375          14          0.0  2341247379.0       1.0X
Parallel Compression with 1 workers                 8022           8022           0          0.0  2005448255.8       1.2X
Parallel Compression with 2 workers                 5054           5069          22          0.0  1263445148.8       1.9X
Parallel Compression with 4 workers                 4372           4394          31          0.0  1092926980.8       2.1X
Parallel Compression with 8 workers                 4785           4805          28          0.0  1196282275.0       2.0X
Parallel Compression with 16 workers                5012           5028          23          0.0  1252925049.5       1.9X
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50857 from pan3793/SPARK-52078.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
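For readers unfamiliar with the table layout, the following Python sketch reproduces the `Relative` column from the `Best Time(ms)` column. It assumes that `Relative` is the first row's best time divided by each row's best time, rounded to one decimal. That relationship is inferred from the reported numbers, not taken from Spark's benchmark source, and the function name `relative_column` is hypothetical.

```python
def relative_column(best_ms):
    """Speed of each row relative to the first row, rounded to one decimal.

    Assumption: the report's Relative value is baseline_best / row_best.
    """
    baseline = best_ms[0]
    return [round(baseline / t, 1) for t in best_ms]


# Best times (ms) from "Parallel Compression at level 3" for zstd-jni 1.5.6-10,
# in worker order 0, 1, 2, 4, 8, 16, copied from the commit message.
parallel_level3 = [1889, 1715, 904, 539, 540, 577]

if __name__ == "__main__":
    print(relative_column(parallel_level3))
```

Under that assumption the computed values agree with the reported 1.0X, 1.1X, 2.1X, 3.5X, 3.5X, and 3.3X, which also shows the speedup flattening beyond 4 workers on this machine.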