[SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method #22844

heary-cao · 2018-10-26T04:14:37Z

What changes were proposed in this pull request?

Refactor JSONBenchmark to use main method

Use spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

Generate benchmark result:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"
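For readers new to the pattern, here is a minimal, self-contained sketch of what "use a main method" means for a benchmark. It is not Spark's code. `MiniBenchmarkBase`, `MiniJsonBenchmark` and `benchmarkName` are hypothetical names. The `benchmarks/<Name>-results.txt` layout is assumed from the results file names that appear later in this thread. `main` calls `runBenchmarkSuite()` (the no-argument form used in this PR's first revision). When `SPARK_GENERATE_BENCHMARK_FILES=1` is set, output goes to a results file instead of stdout.

```scala
import java.io.{File, FileOutputStream, PrintStream}

// Hypothetical, simplified stand-in for a benchmark base trait (not Spark's actual class).
trait MiniBenchmarkBase {
  // Results go to stdout unless result files are being generated.
  protected var output: PrintStream = System.out

  // Name used for the results file, e.g. "JSONBenchmark" -> "JSONBenchmark-results.txt".
  def benchmarkName: String

  // Each concrete benchmark defines its cases here (the original no-argument form).
  def runBenchmarkSuite(): Unit

  // Entry point, so the benchmark can be launched with spark-submit or sbt runMain.
  final def main(args: Array[String]): Unit = {
    val generate = sys.env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")
    if (generate) {
      val dir = new File("benchmarks") // assumed output directory
      dir.mkdirs()
      output = new PrintStream(new FileOutputStream(new File(dir, s"$benchmarkName-results.txt")))
    }
    try runBenchmarkSuite()
    finally if (generate) output.close()
  }
}

// Hypothetical benchmark object: records and prints the cases it "runs".
object MiniJsonBenchmark extends MiniBenchmarkBase {
  override val benchmarkName = "MiniJsonBenchmark"
  val ran = scala.collection.mutable.ArrayBuffer.empty[String]

  override def runBenchmarkSuite(): Unit =
    Seq("JSON per-line parsing", "JSON parsing of wide lines").foreach { name =>
      ran += name
      output.println(s"Running: $name")
    }
}
```

Calling `MiniJsonBenchmark.main(Array.empty)` prints to stdout. With the environment variable set, the same call writes `benchmarks/MiniJsonBenchmark-results.txt`. Keeping the mode switch inside `main` means the benchmark body does not care where its output ends up.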

How was this patch tested?

Manual tests.

heary-cao · 2018-10-26T04:20:18Z

cc @dongjoon-hyun, @wangyum

dongjoon-hyun · 2018-10-26T04:32:03Z

ok to test

SparkQA · 2018-10-26T04:35:48Z

Test build #98072 has finished for PR 22844 at commit 937111f.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-26T07:05:02Z

Test build #98075 has finished for PR 22844 at commit 62af4fd.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

heary-cao · 2018-10-26T07:24:17Z

retest this please

SparkQA · 2018-10-27T05:30:48Z

Test build #98117 has finished for PR 22844 at commit 5c05263.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-27T07:05:02Z

Test build #98120 has finished for PR 22844 at commit ff86e34.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-27T12:15:27Z

Test build #98123 has finished for PR 22844 at commit ebef3f7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

Thank you for your contribution, @heary-cao .
But the values seem to have been copied manually. Could you actually run the benchmark?
In general, when refactoring, you should rerun the benchmark yourself and check that the PR doesn't introduce a regression.

cc @yucai and @wangyum

wangyum · 2018-10-29T08:27:33Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala

Please update the usage (without sbt) to:

bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>

Also update the usage in the PR description:

bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

wangyum · 2018-10-29T09:01:10Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala

Sorry, @heary-cao, I meant: update the usage here to:

bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>

and update the PR description to:

bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

SparkQA · 2018-10-29T10:32:17Z

Test build #98193 has finished for PR 22844 at commit 15c0893.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-29T11:28:16Z

Test build #98192 has finished for PR 22844 at commit 7d23c18.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-29T13:08:02Z

Test build #98199 has finished for PR 22844 at commit fbdbf83.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

yucai · 2018-10-29T16:52:14Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala

#22872 has updated runBenchmarkSuite's signature.

Suggested change

- override def runBenchmarkSuite(): Unit = {
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {

+1 for @yucai 's comment.
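To make the signature change concrete, here is a hypothetical sketch in which `main` forwards its arguments to `runBenchmarkSuite(mainArgs)`. The thread only states that #22872 changed the signature. Treating `mainArgs` as an optional list of case names is an assumption made purely for illustration, as are the names `ArgsBenchmarkBase` and `ArgsJsonBenchmark`.

```scala
// Hypothetical base trait using the newer signature: main forwards its arguments.
trait ArgsBenchmarkBase {
  def runBenchmarkSuite(mainArgs: Array[String]): Unit

  final def main(args: Array[String]): Unit = runBenchmarkSuite(args)
}

// Hypothetical benchmark that treats mainArgs as an optional list of case names.
object ArgsJsonBenchmark extends ArgsBenchmarkBase {
  val allCases = Seq("per-line", "wide-lines", "count")
  val ran = scala.collection.mutable.ArrayBuffer.empty[String]

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    // No arguments: run every case; otherwise keep only the requested ones, in declaration order.
    val selected =
      if (mainArgs.isEmpty) allCases
      else allCases.filter(c => mainArgs.contains(c))
    selected.foreach { c =>
      ran += c
      println(s"Running case: $c")
    }
  }
}
```

Any existing override written as `runBenchmarkSuite()` stops compiling once the abstract method takes `mainArgs`. That is why the suggested change is required rather than optional.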

SparkQA · 2018-10-30T04:28:27Z

Test build #98242 has finished for PR 22844 at commit 2c73385.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-30T07:05:01Z

Test build #98253 has finished for PR 22844 at commit 4116879.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-10-30T07:47:44Z

sql/core/benchmarks/JSONBenchmarks-results.txt

+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel


For the same reason (#22845 (comment)), it's difficult to figure out the CPU model from this header.

I'll make a PR against your branch.

@heary-cao . Could you review and merge heary-cao#3 ?
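The header `Intel64 Family 6 Model 94 Stepping 3, GenuineIntel` appears to be the value of the Windows `PROCESSOR_IDENTIFIER` environment variable rather than a marketing model name, which is why the CPU is hard to identify. The sketch below is a hypothetical, simplified lookup of that kind. It prefers the `model name` line from Linux `/proc/cpuinfo`-style text and otherwise falls back to `PROCESSOR_IDENTIFIER`. `CpuName`, `modelNameFromCpuInfo` and `describeCpu` are assumptions, not Spark's implementation.

```scala
// Hypothetical helper for describing the CPU in a benchmark header.
object CpuName {
  // Returns the value of the first "model name" line in /proc/cpuinfo-style text, if any.
  def modelNameFromCpuInfo(cpuInfo: String): Option[String] =
    cpuInfo.split("\n").iterator
      .map(_.trim)
      .find(_.startsWith("model name"))
      .map(_.split(":", 2))
      .collect { case Array(_, value) => value.trim }

  // Prefer the Linux model name; otherwise fall back to the Windows-style identifier.
  def describeCpu(cpuInfo: Option[String], env: String => Option[String]): String =
    cpuInfo.flatMap(modelNameFromCpuInfo)
      .orElse(env("PROCESSOR_IDENTIFIER"))
      .getOrElse("Unknown processor")
}
```

The file contents and the environment lookup are passed in as parameters, so the logic can be exercised without reading the real `/proc/cpuinfo` or process environment.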

dongjoon-hyun · 2018-10-30T08:21:01Z

sql/core/benchmarks/JSONBenchmarks-results.txt

+JSON per-line parsing:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+No encoding                                 12107 / 12246          8.3         121.1       1.0X
+UTF-8 is set                                12375 / 12475          8.1         123.8       1.0X
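The table's columns can be re-derived from each case's best time and the row count. This sketch assumes 100,000,000 rows per iteration for this case. That figure is inferred from the table itself (12,107 ms / 121.1 ns per row ≈ 1e8 rows), not read from the benchmark source. `BenchmarkColumns`, `derive` and `format` are hypothetical helper names.

```scala
// Hypothetical helper that re-derives the table's columns from a case's best time.
object BenchmarkColumns {
  final case class Derived(rateMPerSec: Double, perRowNs: Double, relative: Double)

  // rows: rows processed per iteration; bestMs: best time of this case;
  // baselineBestMs: best time of the first case in the same table.
  def derive(rows: Long, bestMs: Double, baselineBestMs: Double): Derived =
    Derived(
      rateMPerSec = rows / (bestMs / 1000.0) / 1e6, // million rows per second
      perRowNs = bestMs * 1e6 / rows,               // nanoseconds spent per row
      relative = baselineBestMs / bestMs            // > 1.0 means faster than the baseline
    )

  // One decimal place, like the table; Locale.ROOT keeps '.' as the decimal separator.
  def format(d: Derived): String =
    "%.1f M/s, %.1f ns/row, %.1fX".formatLocal(java.util.Locale.ROOT, d.rateMPerSec, d.perRowNs, d.relative)
}
```

For example, `BenchmarkColumns.format(BenchmarkColumns.derive(100000000L, 12107, 12107))` yields `"8.3 M/s, 121.1 ns/row, 1.0X"`, matching the "No encoding" row. "UTF-8 is set" gets a relative value of about 0.98, which rounds to the 1.0X shown.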


Hi, @HyukjinKwon . According to the ratio, there seems to be a regression in the No encoding case. What do you think about this change?

Wait... this is almost 50% slower. This should have been around 8000ish.

I also ran this benchmark and got the same ratio, so it's a little weird.

https://github.com/heary-cao/spark/pull/3/files#diff-7676fb48b895486092bea2fb491e6de4R18

IIRC, this benchmark was added so that we can make sure that supporting the encoding option does not affect performance when no encoding is set (right @MaxGekk ?). We should fix this. @cloud-fan

Let me take a quick look within a few days. This is the basic per-line case, which affects many users.

Ah, I see. This is also because of the count optimization. The ratio looks weird, but it's actually a performance improvement for both cases, so it shouldn't be a big deal.

Yup, it's by a8a1ac0
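The thread attributes the shifted ratio to the count optimization (a8a1ac0). As we understand it, the idea is that a plain count needs no columns, so the per-line parse can be skipped entirely, and the cost of handling an explicit encoding largely disappears from the measurement. This plain-Scala sketch only illustrates that idea and is not Spark's implementation. `CountShortcut`, `parseLine`, `readRows` and the naive flat-JSON parsing are hypothetical.

```scala
// Simplified, hypothetical illustration: skip parsing when no columns are required.
object CountShortcut {
  // Number of lines that went through the (expensive) parser.
  var parsedLines = 0

  // Deliberately naive stand-in for a JSON parser, good enough for flat lines like {"a":1,"b":2}.
  def parseLine(line: String): Map[String, String] = {
    parsedLines += 1
    line.trim.stripPrefix("{").stripSuffix("}").split(",").toSeq
      .filter(_.contains(":"))
      .map { kv =>
        val Array(k, v) = kv.split(":", 2)
        k.trim.stripPrefix("\"").stripSuffix("\"") -> v.trim
      }
      .toMap
  }

  // With an empty list of required fields (e.g. for a count), emit empty rows without parsing.
  def readRows(lines: Seq[String], requiredFields: Seq[String]): Seq[Map[String, String]] =
    if (requiredFields.isEmpty) lines.map(_ => Map.empty[String, String])
    else lines.map(line => parseLine(line).filter { case (k, _) => requiredFields.contains(k) })
}
```

Counting the result of `readRows(lines, Nil)` gives the right row count while `parsedLines` stays at 0. This is why both benchmark cases can speed up at once.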

Before:

JSON per-line parsing:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
No encoding                                 35786 / 36446          2.8         357.9       1.0X
UTF-8 is set                                57486 / 58714          1.7         574.9       0.6X

After:

JSON per-line parsing:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
No encoding                                 11142 / 11425          9.0         111.4       1.0X
UTF-8 is set                                11139 / 11293          9.0         111.4       1.0X

So it doesn't look like a regression.
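The Before/After tables show why the changed ratio is not a regression. Both cases got faster, and "UTF-8 is set" improved more, which is why the Relative column moved from 0.6X to 1.0X. This small sketch computes the speedup of the best times. `BeforeAfter`, `speedup` and `report` are hypothetical names. The inputs are the best times from those two tables.

```scala
// Hypothetical helper comparing best times from the Before/After tables.
object BeforeAfter {
  // How many times faster the new run is; > 1.0 means an improvement.
  def speedup(beforeMs: Double, afterMs: Double): Double = beforeMs / afterMs

  // (case name, best time before, best time after), copied from the tables in this thread.
  val cases = Seq(
    ("No encoding", 35786.0, 11142.0),
    ("UTF-8 is set", 57486.0, 11139.0)
  )

  // One line per case, with one decimal place.
  def report(): Seq[String] = cases.map { case (name, before, after) =>
    "%s: %.1fx faster".formatLocal(java.util.Locale.ROOT, name, speedup(before, after))
  }
}
```

`BeforeAfter.report()` returns `Seq("No encoding: 3.2x faster", "UTF-8 is set: 5.2x faster")`. The larger gain for the UTF-8 case closes the gap that the old 0.6X reflected.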

Thank you for the confirmation!

Update result

SparkQA · 2018-10-30T12:34:33Z

Test build #98262 has finished for PR 22844 at commit 473a3a5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-30T12:41:51Z

Test build #98263 has finished for PR 22844 at commit 422df47.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-10-30T14:53:59Z

LGTM

dongjoon-hyun

+1, LGTM.

dongjoon-hyun · 2018-10-30T16:16:07Z

Since this is JSON benchmark, could you merge this, @HyukjinKwon ? :)

HyukjinKwon · 2018-10-31T02:27:43Z

Merged to master.

dongjoon-hyun · 2018-10-31T03:02:43Z

Thank you, @heary-cao , @yucai , @wangyum , and @HyukjinKwon .

MaxGekk · 2018-11-01T12:48:23Z

sql/core/benchmarks/JSONBenchmark-results.txt

+JSON parsing of wide lines:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+No encoding                                 39789 / 40053          0.3        3978.9       1.0X
+UTF-8 is set                                39505 / 39584          0.3        3950.5       1.0X


The numbers for the currently used Jackson parser should be slightly different: PR #22920 triggers the creation of the Jackson parser.

I commented on the PR. Please add new benchmark cases instead of changing the existing numbers.

## What changes were proposed in this pull request?

Refactor JSONBenchmark to use main method

use spark-submit:
`bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar`

Generate benchmark result:
`SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"`

## How was this patch tested?

manual tests

Closes apache#22844 from heary-cao/JSONBenchmarks.

Lead-authored-by: caoxuewen <[email protected]>
Co-authored-by: heary <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>

heary-cao force-pushed the JSONBenchmarks branch from 937111f to 62af4fd on October 26, 2018 05:33

heary-cao force-pushed the JSONBenchmarks branch from 62af4fd to 5c05263 on October 27, 2018 05:23

heary-cao force-pushed the JSONBenchmarks branch from 5c05263 to ff86e34 on October 27, 2018 06:21

heary-cao force-pushed the JSONBenchmarks branch from ff86e34 to ebef3f7 on October 27, 2018 08:47

dongjoon-hyun requested changes Oct 28, 2018

heary-cao force-pushed the JSONBenchmarks branch from ebef3f7 to 7d23c18 on October 29, 2018 08:10

wangyum reviewed Oct 29, 2018

heary-cao force-pushed the JSONBenchmarks branch from 7d23c18 to 15c0893 on October 29, 2018 08:51

wangyum reviewed Oct 29, 2018

heary-cao force-pushed the JSONBenchmarks branch from 15c0893 to fbdbf83 on October 29, 2018 09:25

yucai reviewed Oct 29, 2018

heary-cao force-pushed the JSONBenchmarks branch from fbdbf83 to 2c73385 on October 30, 2018 00:55

Refactor JSONBenchmarks to use main method

4116879

heary-cao force-pushed the JSONBenchmarks branch from 2c73385 to 4116879 on October 30, 2018 06:15

dongjoon-hyun reviewed Oct 30, 2018

Update result

c1cde63

heary-cao added 2 commits October 30, 2018 17:15

Merge pull request #3 from dongjoon-hyun/PR-22844

473a3a5

Update result

fix file name

422df47

wangyum approved these changes Oct 30, 2018

dongjoon-hyun approved these changes Oct 30, 2018

asfgit closed this in f6ff632 Oct 31, 2018

MaxGekk reviewed Nov 1, 2018
