Skip to content

Conversation

@heary-cao
Copy link
Contributor

@heary-cao heary-cao commented Oct 26, 2018

What changes were proposed in this pull request?

use spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

Generate benchmark result:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.csv.CSVBenchmark"

How was this patch tested?

manual tests

@heary-cao
Copy link
Contributor Author

cc @dongjoon-hyun, @wangyum

@dongjoon-hyun
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Oct 26, 2018

Test build #98071 has finished for PR 22845 at commit 9ddb847.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2018

Test build #98074 has finished for PR 22845 at commit a10eb1a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Oct 27, 2018

Test build #98115 has finished for PR 22845 at commit 1a7ad0a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2018

Test build #98116 has finished for PR 22845 at commit 905b758.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 27, 2018

Test build #98124 has finished for PR 22845 at commit 0749e68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution, @heary-cao .
But, the value seems to be manually copied. Could you run your benchmark actually?
In general, during refactoring, you should rerun the benchmark by yourself and check whether the PR doesn't create some regression.

cc @yucai and @wangyum

@SparkQA
Copy link

SparkQA commented Oct 29, 2018

Test build #98189 has finished for PR 22845 at commit 40cadc7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update without sbt usage to:

bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also update the usage in description:

bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

@heary-cao heary-cao force-pushed the CSVBenchmarks branch 2 times, most recently from 004ed13 to 6d1f1f5 Compare October 29, 2018 09:13
@heary-cao
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Oct 29, 2018

Test build #98198 has finished for PR 22845 at commit 6d1f1f5.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#22872 has updated runBenchmarkSuite's signature.

Suggested change
override def runBenchmarkSuite(): Unit = {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @yucai 's comment.

@SparkQA
Copy link

SparkQA commented Oct 29, 2018

Test build #98195 has finished for PR 22845 at commit 004ed13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98241 has finished for PR 22845 at commit 3c0eb0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you for updating and rerunning the tests, @heary-cao .

Benchmark to measure CSV read/write performance
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. Did you run this on Windows 7?

@heary-cao
Copy link
Contributor Author

@dongjoon-hyun, Well, my office machine.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98254 has finished for PR 22845 at commit d4cb13d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, GHz is missing here. So, it's hard to figure out what CPU is used here.

Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz [Family 6 Model 94 Stepping 3]

Copy link
Member

@dongjoon-hyun dongjoon-hyun Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the limitation in Spark benchmark code itself (in Window environment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a PR to you. Could you review and merge heary-cao#2 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the ratio change seems to be due to the improvement on count(). cc @HyukjinKwon .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heary-cao . Could you rename the files?

  • CSVBenchmarks.scala -> CSVBenchmark.scala
  • CSVBenchmarks-results.txt -> CSVBenchmark-results.txt
  • Line 35 should be changed together from benchmarks/CSVBenchmarks-results.txt to benchmarks/CSVBenchmark-results.txt.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98258 has finished for PR 22845 at commit 7d21b81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98259 has finished for PR 22845 at commit 22acac2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98261 has finished for PR 22845 at commit 490a60c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun
Copy link
Member

Thank you, @heary-cao . Merged to master.

@asfgit asfgit closed this in 94de560 Oct 30, 2018
@heary-cao
Copy link
Contributor Author

thanks,@dongjoon-hyum

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

use spark-submit:
`bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar`

Generate benchmark result:
`SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.csv.CSVBenchmark"`

## How was this patch tested?

manual tests

Closes apache#22845 from heary-cao/CSVBenchmarks.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants