Closed
30 commits
ed16a1a
[SPARK-35345][SQL] Add BloomFilter Benchmark test for Parquet
huaxingao May 8, 2021
fa02810
enable ParquetInputFormat bloom filter to true
huaxingao May 8, 2021
1bb6675
fix lint scala
huaxingao May 8, 2021
2a3f5eb
update benchmark test result
huaxingao May 8, 2021
10d7a97
set parquet block size
huaxingao May 9, 2021
c8375d6
update benchmark result'
huaxingao May 9, 2021
34d0511
set parquet.block.size for withoutBF
huaxingao May 9, 2021
21cc2ac
update results
huaxingao May 9, 2021
47f70b7
adding measurements for 16, 64 and 128M
huaxingao May 12, 2021
05b20af
update test result
huaxingao May 12, 2021
df044b3
add benchmark test for IN set with 3000 predicates
huaxingao May 27, 2021
e582c26
change the num of predicates in IN set to 300 for now
huaxingao May 27, 2021
d4d39d3
comment out parquet IN set test for now
huaxingao May 28, 2021
c0e9d97
change num of predicate to 30
huaxingao May 28, 2021
5fc105f
[SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite
cloud-fan May 28, 2021
e5ee38c
[SPARK-35535][SQL] New data source V2 API: LocalScan
gengliangwang May 27, 2021
cf8e6d6
[SPARK-35194][SQL] Refactor nested column aliasing for readability
karenfeng May 28, 2021
0a2edad
[SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA
sarutak May 28, 2021
b9e4faf
[SPARK-35098][PYTHON] Re-enable pandas-on-Spark test cases
xinrong-meng May 27, 2021
89eb722
change # of predicate
huaxingao Jun 9, 2021
2d089ff
Revert "change # of predicate"
huaxingao Jun 9, 2021
0220023
Revert "[SPARK-35098][PYTHON] Re-enable pandas-on-Spark test cases"
huaxingao Jun 9, 2021
62d8518
Revert "[SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2…
huaxingao Jun 9, 2021
b7176e3
Revert "[SPARK-35194][SQL] Refactor nested column aliasing for readab…
huaxingao Jun 9, 2021
9a9f0ff
Revert "[SPARK-35535][SQL] New data source V2 API: LocalScan"
huaxingao Jun 9, 2021
324fedb
Revert "[SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite"
huaxingao Jun 9, 2021
d6e320c
reduce # of predicate
huaxingao Jun 9, 2021
d481ec1
update test results
huaxingao Jun 9, 2021
63edef3
address comment
huaxingao Jun 11, 2021
82e1e8e
fix error
huaxingao Jun 11, 2021
172 changes: 164 additions & 8 deletions sql/core/benchmarks/BloomFilterBenchmark-jdk11-results.txt
@@ -2,23 +2,179 @@
ORC Write
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Write 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter 19503 19621 166 5.1 195.0 1.0X
With bloom filter 22472 22710 335 4.4 224.7 0.9X
Without bloom filter 13568 13645 109 7.4 135.7 1.0X
With bloom filter 16116 16238 172 6.2 161.2 0.8X


================================================================================================
ORC Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter 1981 2040 82 50.5 19.8 1.0X
With bloom filter 1428 1467 54 70.0 14.3 1.4X
Without bloom filter 1572 1605 47 63.6 15.7 1.0X
With bloom filter 1343 1359 23 74.5 13.4 1.2X


================================================================================================
ORC Read for IN set
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 1M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter 51 63 15 19.6 51.1 1.0X
With bloom filter 54 88 23 18.5 54.0 0.9X


================================================================================================
Parquet Write
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Write 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter 13679 13954 389 7.3 136.8 1.0X
With bloom filter 18260 18284 33 5.5 182.6 0.7X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 2097152 954 984 49 104.8 9.5 1.0X
With bloom filter, blocksize: 2097152 285 307 21 350.4 2.9 3.3X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 3145728 788 831 40 126.9 7.9 1.0X
With bloom filter, blocksize: 3145728 192 262 47 521.4 1.9 4.1X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 4194304 787 847 75 127.0 7.9 1.0X
With bloom filter, blocksize: 4194304 201 224 18 496.4 2.0 3.9X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 5242880 854 872 18 117.1 8.5 1.0X
With bloom filter, blocksize: 5242880 172 222 37 582.7 1.7 5.0X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 6291456 785 813 27 127.4 7.9 1.0X
With bloom filter, blocksize: 6291456 167 188 14 598.0 1.7 4.7X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 8388608 806 834 42 124.1 8.1 1.0X
With bloom filter, blocksize: 8388608 360 383 29 277.8 3.6 2.2X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 16777216 812 846 42 123.2 8.1 1.0X
With bloom filter, blocksize: 16777216 780 807 27 128.2 7.8 1.0X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 33554432 852 862 10 117.4 8.5 1.0X
With bloom filter, blocksize: 33554432 820 865 59 121.9 8.2 1.0X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 67108864 844 911 58 118.5 8.4 1.0X
With bloom filter, blocksize: 67108864 851 853 2 117.5 8.5 1.0X


================================================================================================
Parquet Read
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
--------------------------------------------------------------------------------------------------------------------------
Without bloom filter, blocksize: 134217728 839 887 53 119.3 8.4 1.0X
With bloom filter, blocksize: 134217728 872 881 9 114.6 8.7 1.0X


================================================================================================
Parquet Read for IN set
================================================================================================

OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 1M rows: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
Without bloom filter 70 76 6 14.2 70.2 1.0X
With bloom filter 73 103 22 13.8 72.6 1.0X
Comment on lines +177 to +178
Member

Bloom filter is slower. Is it due to the IN predicate problem?
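
The Rate, Per Row and Relative columns in these tables appear to be derived from Best Time and the row count. A minimal Python sketch, assuming that relationship (the helper name `derive_columns` is hypothetical and is not part of the Spark benchmark code), recomputes them for the two Parquet IN set rows under discussion:

```python
def derive_columns(best_ms, rows, baseline_best_ms):
    """Recompute derived benchmark columns from a best time in milliseconds.

    Assumed relationships (inferred from the tables, not from Spark's code):
      Rate(M/s)   = rows / best time in seconds / 1e6
      Per Row(ns) = best time in nanoseconds / rows
      Relative    = baseline best time / this best time
    """
    rate_m_per_s = rows / (best_ms / 1000.0) / 1e6
    per_row_ns = best_ms * 1e6 / rows
    relative = baseline_best_ms / best_ms
    return rate_m_per_s, per_row_ns, relative


# Parquet "Read for IN set" rows: 1M rows, best times 70 ms and 73 ms.
without_bf = derive_columns(70, 1_000_000, 70)
with_bf = derive_columns(73, 1_000_000, 70)
```

Because the printed best times are rounded to whole milliseconds, the recomputed values only approximate the table (for example 70.0 ns per row against the reported 70.2). The relative speed for the bloom filter row comes out at about 0.96, which rounds to the reported 1.0X, so the Relative column alone hides the small slowdown.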

Member

For JDK8, bloom filter seems a bit faster.

Contributor Author

It is not due to the IN predicate problem, because ORC also seems a bit slower with bloom filter. I think the data is too small. Let me increase the data size and try again.
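
One way to read the "data is too small" point is to compare the gap between the two best times with the reported standard deviations. The sketch below is a rough heuristic only, not a statistical test, and the helper name `within_noise` is hypothetical; the numbers are copied from the result tables in this diff.

```python
def within_noise(best_a_ms, stdev_a_ms, best_b_ms, stdev_b_ms):
    """Return True when the gap between two best times is no larger than
    the larger of the two reported standard deviations.

    A rough reading aid for the benchmark tables, not a significance test.
    """
    return abs(best_a_ms - best_b_ms) <= max(stdev_a_ms, stdev_b_ms)


# (best, stdev) pairs from the two "Read for IN set" tables (1M rows).
orc_in_set = within_noise(51, 15, 54, 23)
parquet_in_set = within_noise(70, 6, 73, 22)

# For contrast: Parquet Read at blocksize 2097152 (100M rows).
parquet_read_2mb = within_noise(954, 49, 285, 21)
```

Under this heuristic both IN set comparisons fall within run-to-run variation (a 3 ms gap against standard deviations of 22 to 23 ms), while the 100M-row Parquet read at the 2 MB block size does not, which is consistent with rerunning the IN set tests on a larger data set before drawing a conclusion.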