[SPARK-24692][TESTS] Improvement FilterPushdownBenchmark #21677
|
cc @maropu |
|
Test build #92491 has finished for PR 21677 at commit
|
|
Test build #92493 has finished for PR 21677 at commit
|
|
Test build #92498 has finished for PR 21677 at commit
|
|
Test build #92504 has finished for PR 21677 at commit
|
|
@HyukjinKwon Can you merge this to master first? I would like to update the benchmark results in several other pushdown-related PRs. |
|
retest this please |
|
Test build #92633 has finished for PR 21677 at commit
|
| @@ -0,0 +1,556 @@ | |||
| ############################[ Pushdown for many distinct value case ]############################ | |||
@wangyum, can we mimic an existing format? When I do this kind of thing, I usually copy the format from another benchmark. For example, how about one of those below (which I am kind of used to)?
Pushdown for many distinct value case:
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 0 string row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized 7928 / 8019 2.0 504.0 1.0X
========================================================================
Pushdown for many distinct value case
========================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 0 string row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized 7928 / 8019 2.0 504.0 1.0X
Pushdown for many distinct value case ...
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 0 string row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized 7928 / 8019 2.0 504.0 1.0X
and a double blank line between each "Pushdown for many distinct value case" block.
How about this?
...
Select all int rows (value != -1): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized 1140 / 1165 0.9 1087.4 1.0X
Parquet Vectorized (Pushdown) 1140 / 1172 0.9 1086.8 1.0X
Native ORC Vectorized 1158 / 1206 0.9 1104.7 1.0X
Native ORC Vectorized (Pushdown) 1151 / 1220 0.9 1098.1 1.0X
================================================================================================
Pushdown for few distinct value case (use dictionary encoding)
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 0 distinct string row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized 512 / 565 2.0 488.6 1.0X
Parquet Vectorized (Pushdown) 27 / 33 39.3 25.5 19.2X
Native ORC Vectorized 509 / 546 2.1 485.0 1.0X
Native ORC Vectorized (Pushdown) 79 / 91 13.2 75.5 6.5X
...
Yup, that looks better. Let's go this way and correct it if any other committers have other preferences later.
BTW that format was from our testing script as you might already know :-).
OK, I rebuilt the benchmark results using this format.
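The section-header layout agreed on above can be sketched roughly as follows. This is only an illustration, not code from the PR: `ResultFormatSketch`, `sectionHeader`, and the default width of 96 are hypothetical names and values chosen for the example.

```scala
// Hypothetical helper (not part of Spark): renders a benchmark section header
// in the "rule / title / rule" style discussed in this thread.
object ResultFormatSketch {
  // Assumed separator width; adjust it to match the width of the result table.
  val DefaultWidth = 96

  // Builds three lines: a row of '=', the section title, another row of '='.
  def sectionHeader(title: String, width: Int = DefaultWidth): String = {
    val rule = "=" * width
    Seq(rule, title, rule).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    println(sectionHeader("Pushdown for few distinct value case (use dictionary encoding)"))
    println() // blank line before the JVM/CPU info and the result table
  }
}
```

Keeping the header in one helper means every section uses the same separator width, so the results file stays easy to diff between runs.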
| } | ||
|
|
||
| override def afterAll() { | ||
| super.afterAll() |
nit:
try {
out.close()
} finally {
super.afterAll()
}
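A small standalone sketch of why the `try`/`finally` ordering matters: if closing the output stream throws, the parent cleanup still runs. The classes here (`BaseSuite`, `FailingStream`, `BenchmarkSuite`) are hypothetical stand-ins written for this example and do not use ScalaTest or Spark.

```scala
import java.io.{IOException, OutputStream}
import scala.collection.mutable.ArrayBuffer

object CleanupOrderSketch {
  // Records the order in which cleanup steps run.
  val events = ArrayBuffer.empty[String]

  // Stand-in for the parent suite whose afterAll() must always run.
  class BaseSuite {
    def afterAll(): Unit = { events += "super.afterAll" }
  }

  // Stand-in stream whose close() fails, to exercise the finally branch.
  class FailingStream extends OutputStream {
    override def write(b: Int): Unit = ()
    override def close(): Unit = {
      events += "out.close"
      throw new IOException("close failed")
    }
  }

  class BenchmarkSuite(out: OutputStream) extends BaseSuite {
    override def afterAll(): Unit = {
      try {
        out.close()
      } finally {
        super.afterAll() // runs even when close() throws
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val suite = new BenchmarkSuite(new FailingStream)
    try suite.afterAll()
    catch { case _: IOException => events += "caught" }
    println(events.mkString(", ")) // prints "out.close, super.afterAll, caught"
  }
}
```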
| } | ||
| } | ||
|
|
||
| ignore("Pushdown benchmark for StringStartsWith") { |
So those four below are the newly added benchmarks?
Yes
HyukjinKwon
left a comment
LGTM otherwise.
|
Test build #92644 has finished for PR 21677 at commit
|
|
I am merging this so that the other benchmark results can show up in @wangyum's PRs. |
|
Merged to master. |
|
Hi, @cloud-fan . Following your comment #22336 (comment), I've come to this PR. In short, this PR changed the original @cloud-fan . Could you confirm once more in order to revert this change? Also, do we want to change |
|
I need more context here. What's the benefit of the test suite style benchmark? I've committed benchmark code several times and I always use the main-method style. One benefit of the main-method style is, the benchmark probably can be more precise, without the overhead of the scalatest framework. |
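For readers unfamiliar with the distinction, a main-method style benchmark is just an object with a `main` entry point that times the code directly, with no test framework in between. The sketch below is a minimal illustration under that assumption; `MainMethodBenchmarkSketch` and `measure` are hypothetical names, and it does not use Spark's own `Benchmark` utility.

```scala
object MainMethodBenchmarkSketch {
  // Runs `body` `iters` times and returns (best, average) wall-clock time in ms.
  def measure(iters: Int)(body: => Unit): (Double, Double) = {
    require(iters > 0, "iters must be positive")
    val timesMs = (1 to iters).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1e6
    }
    (timesMs.min, timesMs.sum / timesMs.size)
  }

  def main(args: Array[String]): Unit = {
    // Toy workload standing in for a real query.
    val data = Array.tabulate(1000000)(identity)
    val (best, avg) = measure(5) { data.count(_ % 2 == 0) }
    println(f"count even: $best%.1f / $avg%.1f ms (best / avg)")
  }
}
```

A real benchmark would also need warm-up iterations before measuring; they are left out here to keep the sketch short.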
|
Me, too. I have always used the main-method style for the same reason, and many other BMs are main-method style. According to this PR description, the reasons seemed to be
@wangyum . Do you want to add more? If there are no other reasons, I'll start to roll back this one and convert |
|
@wangyum . If you are still interested in reverting your PR as you mentioned in SPARK-25339, please comment here about your thoughts and let us know. I believe that you are the best person to revert this. |
|
It seems we manually write the benchmark result to a file, which can also be done with the main-method style. |
|
Yes, @cloud-fan . We can apply that concept to all the other main-method style benchmarks. Previously, we manually copied and pasted the result next to the corresponding BM code, which is not easy to automate. With that specific contribution from @wangyum , we can automate all benchmarks. Possibly, we can use that in the release process, too. So, are you heading |
Yes. So it's not reverting this PR, since writing BM result to a file is good. But we should update these BMs to use main-method style. |
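Writing results to a file does not depend on the test-suite style. A minimal sketch of the same idea in main-method style is shown below; `FileOutputBenchmarkSketch`, `withResultFile`, and the default file name are hypothetical and chosen only for this example.

```scala
import java.io.{File, FileOutputStream, PrintStream}

object FileOutputBenchmarkSketch {
  // Opens a PrintStream on `path`, passes it to `report`, and always closes it.
  def withResultFile(path: File)(report: PrintStream => Unit): Unit = {
    val out = new PrintStream(new FileOutputStream(path))
    try report(out)
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical output location; pass a different path as the first argument.
    val file = new File(args.headOption.getOrElse("benchmark-results.txt"))
    withResultFile(file) { out =>
      // A real benchmark would print its timing table here instead.
      out.println("benchmark output goes here")
    }
  }
}
```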
|
Thank you for the confirmation, @cloud-fan. Yes, right. It's not reverting this PR. So, @wangyum, are you available for these changes (starting from this suite)? |
What changes were proposed in this pull request?
Refer to the WideSchemaBenchmark update FilterPushdownBenchmark:
benchmarks/FilterPushdownBenchmark-results.txt for easy maintenance.
StringStartsWith, Decimal, InSet -> InFilters and tinyint.
How was this patch tested?
manual tests