Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Jun 30, 2018

What changes were proposed in this pull request?

Refer to the WideSchemaBenchmark update FilterPushdownBenchmark:

  1. Write the result to benchmarks/FilterPushdownBenchmark-results.txt for easy maintenance.
  2. Add more benchmark case: StringStartsWith, Decimal, InSet -> InFilters and tinyint.

How was this patch tested?

manual tests

@wangyum
Copy link
Member Author

wangyum commented Jun 30, 2018

cc @maropu

@SparkQA
Copy link

SparkQA commented Jun 30, 2018

Test build #92491 has finished for PR 21677 at commit ccdd21c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class FilterPushdownBenchmark extends SparkFunSuite with BenchmarkBeforeAndAfterEachTest
  • trait BenchmarkBeforeAndAfterEachTest extends BeforeAndAfterEachTestData

@SparkQA
Copy link

SparkQA commented Jun 30, 2018

Test build #92493 has finished for PR 21677 at commit 616933e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 30, 2018

Test build #92498 has finished for PR 21677 at commit be5d219.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 30, 2018

Test build #92504 has finished for PR 21677 at commit ec62e13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Jul 5, 2018

@HyukjinKwon Can you merge this to master first? I would like to update the Benchmark results of several other pushdown related PRs to the corresponding PR.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 5, 2018

Test build #92633 has finished for PR 21677 at commit ec62e13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -0,0 +1,556 @@
############################[ Pushdown for many distinct value case ]############################
Copy link
Member

@HyukjinKwon HyukjinKwon Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangyum, can we mimic any other format? For example, when I do such thing, I usually copy a format from another. For example, how about those below (which I am kind of used to).

Pushdown for many distinct value case:

  Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
  Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

  Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
  ------------------------------------------------------------------------------------------------
  Parquet Vectorized                            7928 / 8019          2.0         504.0       1.0X
========================================================================
Pushdown for many distinct value case
========================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7928 / 8019          2.0         504.0       1.0X
Pushdown for many distinct value case ...

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7928 / 8019          2.0         504.0       1.0X

and double space between each "Pushdown for many distinct value case"s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

...
Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1140 / 1165          0.9        1087.4       1.0X
Parquet Vectorized (Pushdown)                 1140 / 1172          0.9        1086.8       1.0X
Native ORC Vectorized                         1158 / 1206          0.9        1104.7       1.0X
Native ORC Vectorized (Pushdown)              1151 / 1220          0.9        1098.1       1.0X


================================================================================================
Pushdown for few distinct value case (use dictionary encoding)
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 0 distinct string row (value IS NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                             512 /  565          2.0         488.6       1.0X
Parquet Vectorized (Pushdown)                   27 /   33         39.3          25.5      19.2X
Native ORC Vectorized                          509 /  546          2.1         485.0       1.0X
Native ORC Vectorized (Pushdown)                79 /   91         13.2          75.5       6.5X
...

Copy link
Member

@HyukjinKwon HyukjinKwon Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that looks better. Let's go this way and correct it if any other committers have other preferences later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW that format was from our testing script as you might already know :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I rebuild the benchmark use this format.

}

override def afterAll() {
super.afterAll()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

try {
  out.close()
} finally {
  super.afterAll()
}

}
}

ignore("Pushdown benchmark for StringStartsWith") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So those four below is the newly added benchmarks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise.

@SparkQA
Copy link

SparkQA commented Jul 5, 2018

Test build #92644 has finished for PR 21677 at commit 021f096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

I am merging this to show up other benchmark results in @wangyum's PRs.

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in bf67f70 Jul 6, 2018
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Sep 14, 2018

Hi, @cloud-fan .

As your comment #22336 (comment), I've came to this PR.

To be short, this PR changed the original FilterPushdownBenchmark to the current test-suite style one intentionally in order to follow WideSchemaBenchmark style. Now, I understand the reason why @wangyum filed SPARK-25339, but he couldn't start for it. :)

@cloud-fan . Could you confirm once more in order to revert this change? Also, do we want to change WideSchemaBenchmark together to be consistent for all benchmark suite?

@cloud-fan
Copy link
Contributor

I need more context here. What's the benefit of the test suite style benchmark? I've committed benchmark code several times and I always use the main-method style. One benefit of the main-method style is, the benchmark probably can be more precise, without the overhead of the scalatest framework.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Sep 14, 2018

Me, too. I always used main-method style with the same reason. And many other BMs are main-method style. According to this PR description, the reasons seemed to be

  1. There was a previous benchmark in that style; WideSchemaBenchmark
  2. The result is recorded in the text file outside the code.

@wangyum . Do you want to add more?

If there is no other reasons, I'll start to rollback this one and convert WideSchemaBenchmark together in order to remove future confusions in dev community.

@dongjoon-hyun
Copy link
Member

@wangyum . If you are still interested in reverting your PR as you mentioned in SPARK-25339, please comment here about your thoughts and let us know. I believe that you are the best person to revert this.

@cloud-fan
Copy link
Contributor

Seems we manually write benchmark result to a file, which can also be done with the main-method style.

@dongjoon-hyun
Copy link
Member

Yes. @cloud-fan . We can embrace that concept to all the other main-method style benchmark. Previously, we do the manual copy&paste to put the result into the nearest place to the corresponding BM code. It's not an easy way for automation.

With @wangyum 's that specific contribution, we can automate all benchmarks. Possibly, we can use that in the release process, too. So, are you heading main-method style with separate BM output files? For me, +1.

@cloud-fan
Copy link
Contributor

So, are you heading main-method style with separate BM output files?

Yes. So it's not reverting this PR, since writing BM result to a file is good. But we should update these BMs to use main-method style.

@dongjoon-hyun
Copy link
Member

Thank you for the confirmation, @cloud-fan. Yes, Right. It's not reverting this PR. main-method style with separate BM output will be the standard style for all Spark benchmark code eventually from now.

So, @wangyum, are you available for this changes (starting from this suite)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants