[SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use main method #22845

heary-cao · 2018-10-26T04:19:35Z

What changes were proposed in this pull request?

Use spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

Generate benchmark results:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.csv.CSVBenchmark"
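Both commands work because the refactoring gives the benchmark a plain `main` entry point, so the same object can be launched with `spark-submit --class` or with `sbt runMain`. Below is a minimal, self-contained sketch of that control flow. `MiniBenchmarkBase`, `DemoCsvBenchmark`, `run`, and the in-memory output buffer are hypothetical stand-ins and are not Spark's actual `BenchmarkBase` API. Only the `SPARK_GENERATE_BENCHMARK_FILES` variable and the `runBenchmarkSuite(mainArgs)` name come from this PR.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

// Hypothetical, simplified harness: NOT Spark's BenchmarkBase, only a similar control flow.
abstract class MiniBenchmarkBase {
  // Destination for result sections; None means "just run, write nothing".
  protected var output: Option[OutputStream] = None

  // Each concrete benchmark implements its cases here.
  def runBenchmarkSuite(mainArgs: Array[String]): Unit

  // Writes a section header (only when generating results), then runs the body.
  final def runBenchmark(name: String)(body: => Unit): Unit = {
    val separator = "=" * 96
    val header = s"$separator\n$name\n$separator\n\n"
    output.foreach(_.write(header.getBytes("UTF-8")))
    body
  }

  // Result generation is enabled only when the variable is exactly "1".
  def generateFiles(env: Map[String, String]): Boolean =
    env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")

  // Shared entry logic; `sink` is evaluated only when generation is enabled.
  def run(args: Array[String], env: Map[String, String], sink: => OutputStream): Unit = {
    output = if (generateFiles(env)) Some(sink) else None
    try runBenchmarkSuite(args)
    finally output.foreach(_.close())
  }

  // What `spark-submit --class` / `runMain` would invoke. This sketch buffers
  // results and prints them; a real harness would write a results file instead.
  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    run(args, sys.env, buffer)
    print(buffer.toString("UTF-8"))
  }
}

// Hypothetical benchmark object; the body is a placeholder and times nothing.
object DemoCsvBenchmark extends MiniBenchmarkBase {
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("Benchmark to measure CSV read/write performance") {
      val rows = (1 to 1000).map(i => s"$i,${i * 2}")
      require(rows.size == 1000)
    }
  }
}
```

Passing the environment in as a `Map` instead of reading it inside `run` keeps the gating logic testable without touching real environment variables.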

How was this patch tested?

Manual tests.

heary-cao · 2018-10-26T04:20:05Z

cc @dongjoon-hyun, @wangyum

dongjoon-hyun · 2018-10-26T04:30:45Z

ok to test

SparkQA · 2018-10-26T04:35:46Z

Test build #98071 has finished for PR 22845 at commit 9ddb847.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-26T07:05:02Z

Test build #98074 has finished for PR 22845 at commit a10eb1a.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

heary-cao · 2018-10-26T07:23:59Z

retest this please

SparkQA · 2018-10-27T05:15:39Z

Test build #98115 has finished for PR 22845 at commit 1a7ad0a.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-27T07:05:02Z

Test build #98116 has finished for PR 22845 at commit 905b758.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-27T12:19:59Z

Test build #98124 has finished for PR 22845 at commit 0749e68.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

Thank you for your contribution, @heary-cao.
However, the values seem to have been copied manually. Could you actually run your benchmark?
In general, when refactoring, you should rerun the benchmark yourself and check that the PR does not introduce a regression.

cc @yucai and @wangyum

SparkQA · 2018-10-29T08:05:32Z

Test build #98189 has finished for PR 22845 at commit 40cadc7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2018-10-29T08:22:57Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala

Please update the usage to the spark-submit form (without sbt):

bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>

Also update the usage in the description:

bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

heary-cao · 2018-10-29T11:03:16Z

retest this please

SparkQA · 2018-10-29T13:01:54Z

Test build #98198 has finished for PR 22845 at commit 6d1f1f5.

This patch fails build dependency tests.
This patch merges cleanly.
This patch adds no public classes.

yucai · 2018-10-29T16:53:40Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala

#22872 has updated runBenchmarkSuite's signature.

Suggested change

- override def runBenchmarkSuite(): Unit = {
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {

+1 for @yucai's comment.
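With the new signature, the abstract method receives the command-line arguments. An override that still declares the zero-argument `runBenchmarkSuite()` no longer compiles, because it overrides nothing. The sketch below shows what the extra parameter makes possible. `ArgsAwareBase`, `RowCountBenchmark`, and passing a row count on the command line are hypothetical illustrations, not what CSVBenchmark actually does.

```scala
import scala.util.Try

// Hypothetical base in the post-#22872 shape: main forwards its arguments.
abstract class ArgsAwareBase {
  def runBenchmarkSuite(mainArgs: Array[String]): Unit
  def main(args: Array[String]): Unit = runBenchmarkSuite(args)
}

// Hypothetical benchmark that reads an optional row count from mainArgs.
object RowCountBenchmark extends ArgsAwareBase {
  // Recorded only so the behaviour can be checked from outside.
  var lastRowCount: Int = 0

  // First argument as an Int, or `default` if it is absent or not a number.
  def rowCount(mainArgs: Array[String], default: Int = 1000): Int =
    mainArgs.headOption.flatMap(a => Try(a.toInt).toOption).getOrElse(default)

  // The old zero-argument `override def runBenchmarkSuite(): Unit` would be
  // rejected by the compiler here, since it no longer overrides anything.
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    lastRowCount = rowCount(mainArgs)
    // A real benchmark would generate and time `lastRowCount` CSV rows here.
  }
}
```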

SparkQA · 2018-10-29T17:36:22Z

Test build #98195 has finished for PR 22845 at commit 004ed13.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-30T04:03:47Z

Test build #98241 has finished for PR 22845 at commit 3c0eb0a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-10-30T06:42:24Z

Thank you for updating and rerunning the tests, @heary-cao .

dongjoon-hyun · 2018-10-30T06:43:57Z

sql/core/benchmarks/CSVBenchmarks-results.txt

+Benchmark to measure CSV read/write performance
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1


Wow. Did you run this on Windows 7?

heary-cao · 2018-10-30T06:57:23Z

@dongjoon-hyun, yes, it's my office machine.

SparkQA · 2018-10-30T07:05:02Z

Test build #98254 has finished for PR 22845 at commit d4cb13d.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-10-30T07:34:14Z

sql/core/benchmarks/CSVBenchmarks-results.txt

+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel


Actually, the clock speed (GHz) is missing here, so it's hard to tell which CPU was used. For example, all of these report Family 6 Model 94 Stepping 3:

Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz [Family 6 Model 94 Stepping 3]

This seems to be a limitation of the Spark benchmark code itself (in a Windows environment).

I made a PR to your branch. Could you review and merge heary-cao#2?
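As far as I can tell, Spark's benchmark utilities read the CPU name from `sysctl` on macOS and from the `model name` line of `/proc/cpuinfo` on Linux. On other systems they fall back to the `PROCESSOR_IDENTIFIER` environment variable. On Windows, that variable holds only the family/model/stepping string quoted above, which would explain the missing GHz. The sketch below shows that fallback order under those assumptions, with the macOS branch omitted. `CpuName`, `fromCpuInfo`, and `cpuName` are hypothetical helpers that take their inputs explicitly, so the logic can be checked without a real machine.

```scala
// Hypothetical sketch of an OS-dependent CPU-name lookup (macOS branch omitted).
object CpuName {
  // Returns the value of the first "model name" line in /proc/cpuinfo-style text.
  def fromCpuInfo(cpuInfo: String): Option[String] =
    cpuInfo.split("\r?\n")
      .find(_.startsWith("model name"))
      .map(line => line.substring(line.indexOf(':') + 1).trim)

  // `osName` would normally come from System.getProperty("os.name").
  def cpuName(osName: String, cpuInfo: => String, env: Map[String, String]): String = {
    if (osName.startsWith("Linux")) {
      // The Linux model name includes the clock speed, e.g. "... @ 4.00GHz".
      fromCpuInfo(cpuInfo).getOrElse("Unknown processor")
    } else {
      // Windows fallback: only "Intel64 Family 6 Model 94 ..." is available,
      // so the clock speed cannot be recovered from this source.
      env.getOrElse("PROCESSOR_IDENTIFIER", "Unknown processor")
    }
  }
}
```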

dongjoon-hyun · 2018-10-30T08:43:02Z

sql/core/benchmarks/CSVBenchmarks-results.txt

In this case, the change in the ratio seems to be due to the improvement in count(). cc @HyukjinKwon.

dongjoon-hyun · 2018-10-30T08:46:42Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala

@heary-cao, could you rename the files?

CSVBenchmarks.scala -> CSVBenchmark.scala

CSVBenchmarks-results.txt -> CSVBenchmark-results.txt

Line 35 should also be changed accordingly, from benchmarks/CSVBenchmarks-results.txt to benchmarks/CSVBenchmark-results.txt.
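The file and the object have to be renamed together. As far as I know, the main-method harness derives the results file name from the benchmark object's class name, so an object named `CSVBenchmark` writes `benchmarks/CSVBenchmark-results.txt`. Below is a minimal sketch of that derivation. `ResultFileName` and `resultFilePath` are hypothetical helpers, and the `benchmarks` directory default is taken from the paths quoted above.

```scala
// Hypothetical helper: maps a benchmark object's class name to its results file.
object ResultFileName {
  // Scala objects compile to classes whose names end in "$", e.g. "CSVBenchmark$",
  // so the package prefix and the "$" are stripped before adding the suffix.
  def resultFilePath(className: String, dir: String = "benchmarks"): String = {
    val simpleName = className.split('.').last.replace("$", "")
    s"$dir/$simpleName-results.txt"
  }
}
```

In a real harness the argument would be `this.getClass.getName`. Taking a plain string keeps the sketch independent of how nested Scala objects are named at runtime.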

SparkQA · 2018-10-30T11:29:57Z

Test build #98258 has finished for PR 22845 at commit 7d21b81.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-30T11:42:50Z

Test build #98259 has finished for PR 22845 at commit 22acac2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-30T12:16:57Z

Test build #98261 has finished for PR 22845 at commit 490a60c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM.

dongjoon-hyun · 2018-10-30T16:18:17Z

Thank you, @heary-cao . Merged to master.

heary-cao · 2018-11-01T00:56:19Z

Thanks, @dongjoon-hyun.

## What changes were proposed in this pull request?

use spark-submit:
`bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar`

Generate benchmark result:
`SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.csv.CSVBenchmark"`

## How was this patch tested?

manual tests

Closes apache#22845 from heary-cao/CSVBenchmarks.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

heary-cao force-pushed the CSVBenchmarks branch from 9ddb847 to a10eb1a on October 26, 2018 05:34

heary-cao force-pushed the CSVBenchmarks branch from a10eb1a to 1a7ad0a on October 27, 2018 05:13

heary-cao force-pushed the CSVBenchmarks branch from 1a7ad0a to 905b758 on October 27, 2018 05:17

heary-cao force-pushed the CSVBenchmarks branch from 905b758 to 0749e68 on October 27, 2018 08:51

dongjoon-hyun requested changes Oct 28, 2018

heary-cao force-pushed the CSVBenchmarks branch from 0749e68 to 40cadc7 on October 29, 2018 07:16

wangyum reviewed Oct 29, 2018

heary-cao force-pushed the CSVBenchmarks branch 2 times, most recently from 004ed13 to 6d1f1f5 on October 29, 2018 09:13

yucai reviewed Oct 29, 2018

heary-cao force-pushed the CSVBenchmarks branch from 6d1f1f5 to 3c0eb0a on October 30, 2018 00:53

heary-cao force-pushed the CSVBenchmarks branch from 3c0eb0a to d4cb13d on October 30, 2018 06:20

dongjoon-hyun reviewed Oct 30, 2018

dongjoon-hyun mentioned this pull request Oct 30, 2018:
[SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method #22844 (Closed)

heary-cao force-pushed the CSVBenchmarks branch from 7d21b81 to 22acac2 on October 30, 2018 08:17

dongjoon-hyun reviewed Oct 30, 2018

Commit 490a60c: Refactor CSVBenchmarks to use main method

heary-cao force-pushed the CSVBenchmarks branch from 22acac2 to 490a60c on October 30, 2018 09:02

dongjoon-hyun approved these changes Oct 30, 2018

asfgit closed this in 94de560 Oct 30, 2018
