[SPARK-29975][SQL] introduce --CONFIG_DIM directive #26612
cloud-fan wants to merge 1 commit into apache:master from
| @@ -1,8 +1,3 @@ | |||
| -- List of configuration the test suite is run against: | |||
This tests the optimizer, so we don't need to run it with different join operators.
| @@ -1,8 +1,3 @@ | |||
| -- List of configuration the test suite is run against: | |||
This tests the analyzer/optimizer. A natural join will be rewritten to other normal joins, so there is no need to run it with different join operators.
| -- Set the cross join enabled flag for the LEFT JOIN test since there's no join condition. | ||
| -- Ultimately the join should be optimized away. | ||
| set spark.sql.crossJoin.enabled = true; |
this is true by default now.
| @@ -1,8 +1,3 @@ | |||
| -- List of configuration the test suite is run against: | |||
we are testing UDFs, and the join operator doesn't matter.
Test build #114175 has finished for PR 26612 at commit
| @@ -104,9 +101,6 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite { | |||
| "subquery/in-subquery/simple-in.sql", | |||
| "subquery/in-subquery/in-order-by.sql", | |||
| "subquery/in-subquery/in-set-operations.sql", | |||
Hi, @cloud-fan.
This causes a linter failure. Please remove the trailing comma.
$ dev/lint-scala
Scalastyle checks failed at following occurrences:
[error] /Users/dongjoon/PRS/PR-26612/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala: illegal start of simple expression: Token(RPAREN,),3495,))
[error] Total time: 20 s, completed Nov 20, 2019 11:24:55 AM
Test build #114198 has finished for PR 26612 at commit
retest this please
Test build #114216 has finished for PR 26612 at commit
Test build #114218 has finished for PR 26612 at commit
After this PR:
Before this PR: (use the result of #26214)
Yes. Much faster than before. ThriftServerQueryTestSuite:
Merged to master.
| -- Test aggregate operator with codegen on and off. | ||
| --CONFIG_DIM1 spark.sql.codegen.wholeStage=true | ||
| --CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY | ||
| --CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN |
(Sorry to be late.) In the one-dimension case, is CONFIG_DIM1 the same as SET?
It's different. We still run this test 3 times, as there are 3 config sets in this dimension. It's only the same as SET if there is only one dimension and one config set.
| // - config: (String, String)) | ||
| // We need to do cartesian product for all the config dimensions, to get a list of | ||
| // config sets, and run the query once for each config set. | ||
| val configDimLines = comments.filter(_.startsWith("--CONFIG_DIM")).map(_.substring(12)) |
Better to update how-to use CONFIG_DIM in
Yeah we should document this there ...
### What changes were proposed in this pull request?
add document to address #26612 (comment)
### Why are the changes needed?
help people understand how to use --CONFIG_DIM
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
N/A

Closes #26661 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
Allow the SQL test files to specify different dimensions of config sets during testing. For example,
This example defines 2 config dimensions, and each dimension defines 2 config sets. We will run the queries 4 times:
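The expansion from dimensions to runs can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the object name `ConfigDimSketch`, its helper methods, and the config keys `a`, `b`, `c`, `x`, `y`, `z` are all hypothetical placeholders. It assumes each `--CONFIG_DIM<name>` line carries one comma-separated config set, and that one config set is picked from every dimension per run.

```scala
// Hypothetical sketch: names and config keys are illustrative, not Spark's actual code.
object ConfigDimSketch {
  type ConfigSet = Seq[(String, String)]

  private val Prefix = "--CONFIG_DIM"

  // Group "--CONFIG_DIM<name> k1=v1,k2=v2" lines by dimension name,
  // keeping dimensions in the order they first appear.
  def parseDimensions(comments: Seq[String]): Seq[(String, Seq[ConfigSet])] = {
    val parsed = comments.filter(_.startsWith(Prefix)).map { line =>
      val Array(dim, confs) = line.substring(Prefix.length).trim.split("\\s+", 2)
      val configSet = confs.split(",").toSeq.map { kv =>
        val Array(k, v) = kv.trim.split("=", 2)
        k.trim -> v.trim
      }
      dim -> configSet
    }
    parsed.map(_._1).distinct.map(d => d -> parsed.filter(_._1 == d).map(_._2))
  }

  // Cartesian product: choose one config set per dimension and concatenate them.
  // With no CONFIG_DIM lines this yields a single empty config set (one plain run).
  def configSets(comments: Seq[String]): Seq[ConfigSet] =
    parseDimensions(comments).map(_._2).foldLeft(Seq(Seq.empty[(String, String)])) {
      (acc, dimSets) => for (chosen <- acc; set <- dimSets) yield chosen ++ set
    }

  def main(args: Array[String]): Unit = {
    // Two dimensions with two config sets each (placeholder keys).
    val lines = Seq(
      "--CONFIG_DIM1 a=1",
      "--CONFIG_DIM1 b=2,c=3",
      "--CONFIG_DIM2 x=1",
      "--CONFIG_DIM2 y=1,z=2")
    configSets(lines).foreach { set =>
      println(set.map { case (k, v) => s"$k=$v" }.mkString(", "))
    }
  }
}
```

Under these assumptions, two dimensions with two config sets each give 2 x 2 = 4 runs, while a single dimension with three config sets (as in the aggregate test in the review thread) gives 3 runs.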
Why are the changes needed?
Currently SQLQueryTestSuite takes a long time. This is because we run each test at least 3 times, to check with different codegen modes. This is not necessary for most of the tests, e.g. DESC TABLE. We should only check these codegen modes for certain tests.
With the --CONFIG_DIM directive, we can do things like: test different join operators (broadcast or shuffle join) X different codegen modes.
After reducing the testing time, we should be able to run the thrift server SQL tests with config settings.
Does this PR introduce any user-facing change?
no
How was this patch tested?
test only