[SPARK-25848][SQL][TEST] Refactor CSVBenchmarks to use main method #22845
ok to test
Test build #98071 has finished for PR 22845 at commit
9ddb847 to a10eb1a
Test build #98074 has finished for PR 22845 at commit
retest this please
a10eb1a to 1a7ad0a
Test build #98115 has finished for PR 22845 at commit
1a7ad0a to 905b758
Test build #98116 has finished for PR 22845 at commit
905b758 to 0749e68
Test build #98124 has finished for PR 22845 at commit
dongjoon-hyun left a comment
Thank you for your contribution, @heary-cao.
However, the values appear to have been copied manually. Could you actually run the benchmark?
In general, when refactoring, you should rerun the benchmark yourself and check that the PR does not introduce a regression.
0749e68 to 40cadc7
Test build #98189 has finished for PR 22845 at commit
Please update the "without sbt" usage to:
bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>
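The placeholder form above can be filled in mechanically. The following is a minimal shell sketch, not part of the PR: the helper name `benchmark_cmd` and the variables `SCALA_BIN` and `SPARK_VER` are hypothetical, and the jar paths assume the Scala 2.11 / 3.0.0-SNAPSHOT build layout quoted elsewhere in this thread. It only prints the command so it can be inspected before running.

```shell
# Assumed build coordinates; adjust to match the jars your build produced.
SCALA_BIN="2.11"
SPARK_VER="3.0.0-SNAPSHOT"

# benchmark_cmd CLASS: print (but do not run) the spark-submit command line.
benchmark_cmd() {
  local class="$1"
  local core_jar="./core/target/spark-core_${SCALA_BIN}-${SPARK_VER}-tests.jar"
  local catalyst_jar="./sql/catalyst/target/spark-catalyst_${SCALA_BIN}-${SPARK_VER}-tests.jar"
  local sql_jar="./sql/core/target/spark-sql_${SCALA_BIN}-${SPARK_VER}-tests.jar"
  # --jars takes a comma-separated list; the SQL test jar is the application jar.
  printf 'bin/spark-submit --class %s --jars %s,%s %s\n' \
    "$class" "$core_jar" "$catalyst_jar" "$sql_jar"
}

benchmark_cmd org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks
```

Printing rather than executing keeps the sketch safe to try outside a Spark checkout; the printed line can be pasted into a shell at the repository root once the test jars exist.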
Also update the usage in the description:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
004ed13 to 6d1f1f5
retest this please
Test build #98198 has finished for PR 22845 at commit
#22872 has updated runBenchmarkSuite's signature.
- override def runBenchmarkSuite(): Unit = {
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+1 for @yucai's comment.
Test build #98195 has finished for PR 22845 at commit
6d1f1f5 to 3c0eb0a
Test build #98241 has finished for PR 22845 at commit
3c0eb0a to d4cb13d
Thank you for updating and rerunning the tests, @heary-cao.
Benchmark to measure CSV read/write performance
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Wow. Did you run this on Windows 7?
@dongjoon-hyun, well, it's my office machine.
Test build #98254 has finished for PR 22845 at commit
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Actually, the clock speed (GHz) is missing here, so it is hard to tell which CPU was used. For example, all of the following report the same family, model, and stepping:
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz [Family 6 Model 94 Stepping 3]
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz [Family 6 Model 94 Stepping 3]
This seems to be a limitation of the Spark benchmark code itself (in a Windows environment).
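To illustrate why the Windows line carries less detail, here is a hedged shell sketch of per-OS CPU name lookup. It is an assumption, not confirmed in this thread, that the benchmark harness chooses its source in a similar way; what is certain is that `Intel64 Family 6 Model 94 Stepping 3, GenuineIntel` has the format of the Windows `PROCESSOR_IDENTIFIER` environment variable, which carries no brand string or clock speed. The function name `cpu_name` and its arguments are hypothetical.

```shell
# cpu_name OS [CPUINFO_FILE]: print a CPU description for the given OS name.
# OS is what `uname -s` would print; CPUINFO_FILE defaults to /proc/cpuinfo.
cpu_name() {
  os="$1"
  cpuinfo="${2:-/proc/cpuinfo}"
  case "$os" in
    Linux)
      # First "model name" line, with the "model name : " prefix stripped.
      grep -m 1 'model name' "$cpuinfo" | sed 's/^model name[[:space:]]*:[[:space:]]*//'
      ;;
    Darwin)
      # macOS exposes the full brand string through sysctl.
      sysctl -n machdep.cpu.brand_string
      ;;
    *)
      # Windows-style fallback: family/model/stepping only, no clock speed.
      printf '%s\n' "${PROCESSOR_IDENTIFIER:-Unknown processor}"
      ;;
  esac
}

# Simulate the fallback branch with the value seen in the results file.
PROCESSOR_IDENTIFIER='Intel64 Family 6 Model 94 Stepping 3, GenuineIntel' cpu_name Windows_NT
```

Taking the OS name and the cpuinfo path as parameters keeps the sketch easy to exercise on any machine; a real lookup would pass `"$(uname -s)"`.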
I made a PR against your branch. Could you review and merge heary-cao#2?
7d21b81 to 22acac2
In this case, the ratio change seems to be due to the improvement in count(). cc @HyukjinKwon.
@heary-cao, could you rename the files?
- CSVBenchmarks.scala -> CSVBenchmark.scala
- CSVBenchmarks-results.txt -> CSVBenchmark-results.txt
- Line 35 should be changed together, from benchmarks/CSVBenchmarks-results.txt to benchmarks/CSVBenchmark-results.txt.
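The requested rename can be sketched in shell as follows. This is an illustration only: it runs in a scratch directory with placeholder file contents and a flat layout, not the actual Spark source tree, and the variable name `demo` is hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
demo="$(mktemp -d)"
printf '%s\n' '// results: benchmarks/CSVBenchmarks-results.txt' > "$demo/CSVBenchmarks.scala"
: > "$demo/CSVBenchmarks-results.txt"

# Rename both files (in a git checkout, `git mv` would also stage the rename).
mv "$demo/CSVBenchmarks.scala" "$demo/CSVBenchmark.scala"
mv "$demo/CSVBenchmarks-results.txt" "$demo/CSVBenchmark-results.txt"

# Update the reference to the results file inside the source.
# Writing to a temp file avoids relying on non-portable `sed -i` flags.
sed 's#benchmarks/CSVBenchmarks-results\.txt#benchmarks/CSVBenchmark-results.txt#' \
  "$demo/CSVBenchmark.scala" > "$demo/CSVBenchmark.scala.tmp"
mv "$demo/CSVBenchmark.scala.tmp" "$demo/CSVBenchmark.scala"

cat "$demo/CSVBenchmark.scala"
```

In the real change the class name inside the Scala file would also need to match the new file name, which this sketch does not attempt.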
22acac2 to 490a60c
Test build #98258 has finished for PR 22845 at commit
Test build #98259 has finished for PR 22845 at commit
Test build #98261 has finished for PR 22845 at commit
dongjoon-hyun left a comment
+1, LGTM.
Thank you, @heary-cao. Merged to master.
Thanks, @dongjoon-hyun.
## What changes were proposed in this pull request?

use spark-submit:
`bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar`

Generate benchmark result:
`SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.csv.CSVBenchmark"`

## How was this patch tested?

manual tests

Closes apache#22845 from heary-cao/CSVBenchmarks.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>