-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method #22844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Test build #98072 has finished for PR 22844 at commit
|
937111f to
62af4fd
Compare
|
Test build #98075 has finished for PR 22844 at commit
|
|
retest this please |
62af4fd to
5c05263
Compare
|
Test build #98117 has finished for PR 22844 at commit
|
5c05263 to
ff86e34
Compare
|
Test build #98120 has finished for PR 22844 at commit
|
ff86e34 to
ebef3f7
Compare
|
Test build #98123 has finished for PR 22844 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution, @heary-cao .
But, the values seems to be manually copied. Could you run your benchmark actually?
In general, during refactoring, you should rerun the benchmark by yourself and check whether the PR doesn't create some regression.
ebef3f7 to
7d23c18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update without sbt usage to:
bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also update the usage in description:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar7d23c18 to
15c0893
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @heary-cao I mean update here to:
bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>
and update PR description to:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
15c0893 to
fbdbf83
Compare
|
Test build #98193 has finished for PR 22844 at commit
|
|
Test build #98192 has finished for PR 22844 at commit
|
|
Test build #98199 has finished for PR 22844 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#22872 has updated runBenchmarkSuite's signature.
| override def runBenchmarkSuite(): Unit = { | |
| override def runBenchmarkSuite(mainArgs: Array[String]): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for @yucai 's comment.
fbdbf83 to
2c73385
Compare
|
Test build #98242 has finished for PR 22844 at commit
|
2c73385 to
4116879
Compare
|
Test build #98253 has finished for PR 22844 at commit
|
| ================================================================================================ | ||
|
|
||
| OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1 | ||
| Intel64 Family 6 Model 94 Stepping 3, GenuineIntel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the same reason (#22845 (comment)), it's difficult to figure out the CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make a PR to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heary-cao . Could you review and merge heary-cao#3 ?
| JSON per-line parsing: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------ | ||
| No encoding 12107 / 12246 8.3 121.1 1.0X | ||
| UTF-8 is set 12375 / 12475 8.1 123.8 1.0X |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @HyukjinKwon . According to the ratio, it seems to be a regression on No encoding case. How do you think this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait .. this is almost 50% slower. This had to be around 8000ish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also run this benchmark and got the same ratio. So it's a little weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this benchmark was added rather we can make sure setting encoding does not affect the performance without encoding (right @MaxGekk ?). We should fix this. @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a quick look within few days. This is per line basic case where many users are affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. This is also because of count optimization. ratio is weird but actually it's performance improvement for both cases. shouldn't be a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it's by a8a1ac0
Before:
JSON per-line parsing: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
No encoding 35786 / 36446 2.8 357.9 1.0X
UTF-8 is set 57486 / 58714 1.7 574.9 0.6X
After:
JSON per-line parsing: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
No encoding 11142 / 11425 9.0 111.4 1.0X
UTF-8 is set 11139 / 11293 9.0 111.4 1.0X
Looks not a regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the confirmation!
Update result
|
Test build #98262 has finished for PR 22844 at commit
|
|
Test build #98263 has finished for PR 22844 at commit
|
|
LGTM |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
|
Since this is JSON benchmark, could you merge this, @HyukjinKwon ? :) |
|
Merged to master. |
|
Thank you, @heary-cao , @yucai , @wangyum , and @HyukjinKwon . |
| JSON parsing of wide lines: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------ | ||
| No encoding 39789 / 40053 0.3 3978.9 1.0X | ||
| UTF-8 is set 39505 / 39584 0.3 3950.5 1.0X |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numbers for currently used Jackson parser should be slightly different. The PR #22920 triggers creation of Jackson parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented on the PR. Please add another benchmark cases instead of changing the existing numbers.
## What changes were proposed in this pull request? Refactor JSONBenchmark to use main method use spark-submit: `bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar` Generate benchmark result: `SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"` ## How was this patch tested? manual tests Closes apache#22844 from heary-cao/JSONBenchmarks. Lead-authored-by: caoxuewen <[email protected]> Co-authored-by: heary <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: hyukjinkwon <[email protected]>
What changes were proposed in this pull request?
Refactor JSONBenchmark to use main method
use spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jarGenerate benchmark result:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmark"How was this patch tested?
manual tests