[SPARK-24692][TESTS] Improvement FilterPushdownBenchmark #21677

wangyum · 2018-06-30T00:31:28Z

What changes were proposed in this pull request?

Update FilterPushdownBenchmark, following the approach of WideSchemaBenchmark:

Write the results to benchmarks/FilterPushdownBenchmark-results.txt for easier maintenance.
Add more benchmark cases: StringStartsWith, Decimal, InSet -> InFilters, and tinyint.

How was this patch tested?

manual tests

wangyum · 2018-06-30T00:36:35Z

cc @maropu

SparkQA · 2018-06-30T04:14:27Z

Test build #92491 has finished for PR 21677 at commit ccdd21c.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class FilterPushdownBenchmark extends SparkFunSuite with BenchmarkBeforeAndAfterEachTest
trait BenchmarkBeforeAndAfterEachTest extends BeforeAndAfterEachTestData

SparkQA · 2018-06-30T07:05:01Z

Test build #92493 has finished for PR 21677 at commit 616933e.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-06-30T13:22:01Z

Test build #92498 has finished for PR 21677 at commit be5d219.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-06-30T23:02:30Z

Test build #92504 has finished for PR 21677 at commit ec62e13.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2018-07-05T02:39:16Z

@HyukjinKwon Can you merge this to master first? I would like to update the benchmark results in several other pushdown-related PRs afterwards.

HyukjinKwon · 2018-07-05T02:42:07Z

retest this please

SparkQA · 2018-07-05T06:16:31Z

Test build #92633 has finished for PR 21677 at commit ec62e13.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-07-05T06:31:34Z

sql/core/benchmarks/FilterPushdownBenchmark-results.txt

@@ -0,0 +1,556 @@
+############################[ Pushdown for many distinct value case ]############################


@wangyum, can we mimic an existing format? When I do this kind of thing, I usually copy the format from somewhere else. For example, how about one of those below (which I am kind of used to)?

Pushdown for many distinct value case:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7928 / 8019          2.0         504.0       1.0X

or

========================================================================
Pushdown for many distinct value case
========================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7928 / 8019          2.0         504.0       1.0X

or

Pushdown for many distinct value case
...

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7928 / 8019          2.0         504.0       1.0X

and put a double space between each "Pushdown for many distinct value case" block.

How about this?

...

Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1140 / 1165          0.9        1087.4       1.0X
Parquet Vectorized (Pushdown)                 1140 / 1172          0.9        1086.8       1.0X
Native ORC Vectorized                         1158 / 1206          0.9        1104.7       1.0X
Native ORC Vectorized (Pushdown)              1151 / 1220          0.9        1098.1       1.0X


================================================================================================
Pushdown for few distinct value case (use dictionary encoding)
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 distinct string row (value IS NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                             512 /  565          2.0         488.6       1.0X
Parquet Vectorized (Pushdown)                   27 /   33         39.3          25.5      19.2X
Native ORC Vectorized                          509 /  546          2.1         485.0       1.0X
Native ORC Vectorized (Pushdown)                79 /   91         13.2          75.5       6.5X

...

Yup, that looks better. Let's go this way and correct it if any other committers have other preferences later.

BTW that format was from our testing script as you might already know :-).

OK, I'll rebuild the benchmark results using this format.
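For readers of the result tables quoted in this thread, the derived columns can be related to the best time with a little arithmetic. The sketch below is a hypothetical helper, not Spark's own benchmark code: the object name, the method names, and the row count of 1024 * 1024 are assumptions (the row count is inferred from the sample figures and is not stated in the thread). It treats Rate(M/s) as rows per microsecond, Per Row(ns) as best time divided by row count, and Relative as the first row's best time divided by the current row's best time.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object BenchmarkColumns {
  final case class Row(name: String, bestMs: Double)

  // Rate in millions of rows per second.
  def rateMPerSec(numRows: Long, bestMs: Double): Double =
    numRows / (bestMs / 1000.0) / 1e6

  // Time spent per row, in nanoseconds.
  def perRowNs(numRows: Long, bestMs: Double): Double =
    bestMs * 1e6 / numRows

  // Speed-up relative to the first (baseline) row of the table.
  def relative(baselineBestMs: Double, bestMs: Double): Double =
    baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    // Assumption: row count inferred from the sample, not stated in the thread.
    val numRows = 1024L * 1024L
    val rows = Seq(
      Row("Parquet Vectorized", 512.0),
      Row("Parquet Vectorized (Pushdown)", 27.0))
    val baseline = rows.head.bestMs
    rows.foreach { r =>
      val rate = rateMPerSec(numRows, r.bestMs)
      val perRow = perRowNs(numRows, r.bestMs)
      val rel = relative(baseline, r.bestMs)
      println(f"${r.name}%-32s $rate%8.1f $perRow%10.1f $rel%6.1fX")
    }
  }
}
```

Because the best times printed in the table are rounded to whole milliseconds, values recomputed this way can differ slightly from the published columns.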

HyukjinKwon · 2018-07-05T06:34:30Z

sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala

+  }
+
+  override def afterAll() {
+    super.afterAll()


nit:

try {
  out.close()
} finally {
  super.afterAll()
}
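The point of the nit above is that the parent cleanup should still run even if closing the output stream throws. A minimal, self-contained sketch of that behavior follows; `BaseSuite`, `BenchmarkSuiteSketch`, and `AfterAllDemo` are hypothetical stand-ins and not the classes used in the PR.

```scala
import java.io.{IOException, OutputStream}

// Stand-in for the suite's parent class (assumption: the real suite extends Spark test traits).
class BaseSuite {
  var baseCleanedUp = false
  def afterAll(): Unit = { baseCleanedUp = true }
}

class BenchmarkSuiteSketch(out: OutputStream) extends BaseSuite {
  override def afterAll(): Unit = {
    try {
      out.close() // may throw
    } finally {
      super.afterAll() // runs whether or not close() threw
    }
  }
}

object AfterAllDemo {
  // An output stream whose close() always fails, to exercise the finally branch.
  def failingStream(): OutputStream = new OutputStream {
    override def write(b: Int): Unit = ()
    override def close(): Unit = throw new IOException("close failed")
  }

  def main(args: Array[String]): Unit = {
    val suite = new BenchmarkSuiteSketch(failingStream())
    val threw =
      try { suite.afterAll(); false }
      catch { case _: IOException => true }
    // prints "close threw: true, base cleanup ran: true"
    println(s"close threw: $threw, base cleanup ran: ${suite.baseCleanedUp}")
  }
}
```

Without the `try`/`finally`, an exception from `out.close()` would skip `super.afterAll()` and leave the parent's resources uncleaned.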

HyukjinKwon · 2018-07-05T06:37:19Z

sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala

+    }
+  }
+
+  ignore("Pushdown benchmark for StringStartsWith") {


So are the four below the newly added benchmarks?

HyukjinKwon

LGTM otherwise.

SparkQA · 2018-07-05T15:12:02Z

Test build #92644 has finished for PR 21677 at commit 021f096.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-07-06T03:13:26Z

I am merging this so that the other benchmark results can show up in @wangyum's PRs.

HyukjinKwon · 2018-07-06T03:13:44Z

Merged to master.

dongjoon-hyun · 2018-09-14T08:58:55Z

Hi, @cloud-fan .

Following your comment in #22336 (comment), I've come to this PR.

In short, this PR intentionally changed the original FilterPushdownBenchmark to the current test-suite style in order to follow the WideSchemaBenchmark style. Now I understand why @wangyum filed SPARK-25339, although he hasn't been able to start on it. :)

@cloud-fan . Could you confirm once more that we should revert this change? Also, do we want to change WideSchemaBenchmark at the same time, so that all benchmark suites are consistent?

cloud-fan · 2018-09-14T14:58:44Z

I need more context here. What's the benefit of the test suite style benchmark? I've committed benchmark code several times and I always use the main-method style. One benefit of the main-method style is, the benchmark probably can be more precise, without the overhead of the scalatest framework.

dongjoon-hyun · 2018-09-14T16:53:19Z

Me, too. I have always used the main-method style for the same reason, and many other BMs are main-method style. According to this PR description, the reasons seem to be:

There was a previous benchmark in that style; WideSchemaBenchmark
The result is recorded in the text file outside the code.

@wangyum . Do you want to add more?

If there are no other reasons, I'll start to roll back this one and convert WideSchemaBenchmark together with it, in order to avoid future confusion in the dev community.

dongjoon-hyun · 2018-09-14T22:49:38Z

@wangyum . If you are still interested in reverting your PR as you mentioned in SPARK-25339, please comment here about your thoughts and let us know. I believe that you are the best person to revert this.

cloud-fan · 2018-09-16T14:52:09Z

Seems we manually write benchmark result to a file, which can also be done with the main-method style.

dongjoon-hyun · 2018-09-16T18:35:47Z

Yes. @cloud-fan . We can apply that concept to all the other main-method style benchmarks. Previously, we manually copied and pasted the result next to the corresponding BM code, which is not easy to automate.

With that specific contribution from @wangyum, we can automate all benchmarks. Possibly, we can use that in the release process, too. So, are you leaning toward main-method style with separate BM output files? For me, +1.

cloud-fan · 2018-09-17T03:00:06Z

So, are you leaning toward main-method style with separate BM output files?

Yes. So it's not reverting this PR, since writing the BM result to a file is good. But we should update these BMs to use main-method style.
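To make the agreed direction concrete, here is a rough sketch of a main-method style benchmark that writes its result to a separate file instead of running inside a test suite. Everything in it is an assumption for illustration: the object name, the `bestTimeMs` helper, the toy workload, and the default file name `toy-benchmark-results.txt` are hypothetical, and it does not use Spark's benchmark utilities.

```scala
import java.io.{File, FileOutputStream, PrintStream}

// Hypothetical sketch; not the benchmark code from this PR.
object MainMethodBenchmarkSketch {
  // Runs `body` `iters` times and returns the best wall-clock time in milliseconds.
  def bestTimeMs(iters: Int)(body: => Unit): Double = {
    require(iters > 0, "need at least one iteration")
    val bestNanos = (1 to iters).map { _ =>
      val start = System.nanoTime()
      body
      System.nanoTime() - start
    }.min
    bestNanos / 1e6
  }

  // Runs a toy case and writes the result to `resultFile`.
  def run(resultFile: File): Unit = {
    val out = new PrintStream(new FileOutputStream(resultFile))
    try {
      val data = Array.tabulate(1 << 20)(i => i.toLong)
      var sink = 0L // accumulates results so the work is not optimized away
      val best = bestTimeMs(5) { sink += data.filter(_ % 2 == 0).sum }
      out.println("Toy case: sum of even values")
      out.println(f"filter + sum: best $best%.1f ms (checksum $sink%d)")
    } finally {
      out.close() // always release the file handle
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the output path is passed as the first argument, with a placeholder default.
    val path = args.headOption.getOrElse("toy-benchmark-results.txt")
    run(new File(path))
  }
}
```

Keeping the entry point a plain `main` avoids test-framework overhead around the timed code, while the separate output file keeps the recorded results easy to regenerate and diff.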

dongjoon-hyun · 2018-09-17T08:07:29Z

Thank you for the confirmation, @cloud-fan. Yes, right. It's not reverting this PR. From now on, main-method style with separate BM output files will eventually be the standard for all Spark benchmark code.

So, @wangyum, are you available for these changes (starting from this suite)?

Improvement FilterPushdownBenchmark

ccdd21c

Remove duplicate val mid = numRows / 2

616933e

wangyum added 2 commits June 30, 2018 17:20

Merge remote-tracking branch 'upstream/master' into SPARK-24692

b1e9581

SPARK-24638 already merged to master, update the StringStartsWith result

be5d219

Add tinyint benchmark.

ec62e13

HyukjinKwon reviewed Jul 5, 2018

HyukjinKwon approved these changes Jul 5, 2018

wangyum added 2 commits July 5, 2018 14:47

Merge remote-tracking branch 'upstream/master' into SPARK-24692

206d6af

update format

021f096

asfgit closed this in bf67f70 Jul 6, 2018
