[SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark #32473
Closed
30 commits
ed16a1a [SPARK-35345][SQL] Add BloomFilter Benchmark test for Parquet (huaxingao)
fa02810 enable ParquetInputFormat bloom filter to true (huaxingao)
1bb6675 fix lint scala (huaxingao)
2a3f5eb update benchmark test result (huaxingao)
10d7a97 set parquet block size (huaxingao)
c8375d6 update benchmark result' (huaxingao)
34d0511 set parquet.block.size for withoutBF (huaxingao)
21cc2ac update results (huaxingao)
47f70b7 adding measurements for 16, 64 and 128M (huaxingao)
05b20af update test result (huaxingao)
df044b3 add benchmark test for IN set with 3000 predicates (huaxingao)
e582c26 change the num of predicates in IN set to 300 for now (huaxingao)
d4d39d3 comment out parquet IN set test for now (huaxingao)
c0e9d97 change num of predicate to 30 (huaxingao)
5fc105f [SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite (cloud-fan)
e5ee38c [SPARK-35535][SQL] New data source V2 API: LocalScan (gengliangwang)
cf8e6d6 [SPARK-35194][SQL] Refactor nested column aliasing for readability (karenfeng)
0a2edad [SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA (sarutak)
b9e4faf [SPARK-35098][PYTHON] Re-enable pandas-on-Spark test cases (xinrong-meng)
89eb722 change # of predicate (huaxingao)
2d089ff Revert "change # of predicate" (huaxingao)
0220023 Revert "[SPARK-35098][PYTHON] Re-enable pandas-on-Spark test cases" (huaxingao)
62d8518 Revert "[SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2… (huaxingao)
b7176e3 Revert "[SPARK-35194][SQL] Refactor nested column aliasing for readab… (huaxingao)
9a9f0ff Revert "[SPARK-35535][SQL] New data source V2 API: LocalScan" (huaxingao)
324fedb Revert "[SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite" (huaxingao)
d6e320c reduce # of predicate (huaxingao)
d481ec1 update test results (huaxingao)
63edef3 address comment (huaxingao)
82e1e8e fix error (huaxingao)
The bloom filter case is slower. Is that due to the IN predicate problem?
For JDK8, the bloom filter seems a bit faster.
It is not due to the IN predicate problem, because ORC also seems a bit slower with the bloom filter. I think the data is too small. Let me increase the data size and try again.