[SPARK-29975][SQL] introduce --CONFIG_DIM directive by cloud-fan · Pull Request #26612 · apache/spark

cloud-fan · 2019-11-20T17:28:19Z

What changes were proposed in this pull request?

Allow the SQL test files to specify different dimensions of config sets during testing. For example:

--CONFIG_DIM1 a=1
--CONFIG_DIM1 b=2,c=3

--CONFIG_DIM2 x=1
--CONFIG_DIM2 y=1,z=2

This example defines 2 config dimensions, each with 2 config sets. The queries run once per combination of config sets (the cartesian product of the dimensions), so 4 times:

a=1, x=1
a=1, y=1, z=2
b=2, c=3, x=1
b=2, c=3, y=1, z=2
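The expansion above is a cartesian product: pick one config set from every dimension and concatenate the picks. A minimal Scala sketch, assuming each dimension has already been parsed into a list of config sets; `ConfigDimSketch` and `cartesian` are hypothetical names, not the code in SQLQueryTestSuite.

```scala
// Hypothetical sketch: not the actual SQLQueryTestSuite implementation.
object ConfigDimSketch {
  // A config set is an ordered list of (key, value) pairs.
  type ConfigSet = Seq[(String, String)]

  // Cartesian product across dimensions: for every partial combination built
  // so far, append each config set of the next dimension.
  def cartesian(dims: Seq[Seq[ConfigSet]]): Seq[ConfigSet] =
    dims.foldLeft(Seq(Seq.empty[(String, String)])) { (acc, dim) =>
      for {
        prefix <- acc
        set <- dim
      } yield prefix ++ set
    }

  def main(args: Array[String]): Unit = {
    // The two dimensions from the example above.
    val dim1 = Seq(Seq("a" -> "1"), Seq("b" -> "2", "c" -> "3"))
    val dim2 = Seq(Seq("x" -> "1"), Seq("y" -> "1", "z" -> "2"))
    // Prints one line per run, e.g. "a=1, x=1".
    cartesian(Seq(dim1, dim2)).foreach { set =>
      println(set.map { case (k, v) => s"$k=$v" }.mkString(", "))
    }
  }
}
```

The fold starts from a single empty config set, so a file with no `--CONFIG_DIM` lines still yields exactly one run. This sketch emits the runs in the order listed above; the suite's real iteration order may differ.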

Why are the changes needed?

Currently, SQLQueryTestSuite takes a long time because each test runs at least 3 times to check the different codegen modes. This is unnecessary for most tests (e.g. DESC TABLE); we should only check these codegen modes for certain tests.

With the --CONFIG_DIM directive, we can do things like testing different join operators (broadcast or shuffle join) × different codegen modes.

After reducing the testing time, we should be able to run the thrift server SQL tests with config settings.

Does this PR introduce any user-facing change?

no

How was this patch tested?

test only

cloud-fan · 2019-11-20T17:29:13Z

cc @maropu @wangyum @MaxGekk @dongjoon-hyun

cloud-fan · 2019-11-20T17:29:56Z

sql/core/src/test/resources/sql-tests/inputs/join-empty-relation.sql

@@ -1,8 +1,3 @@
-- List of configuration the test suite is run against:


This tests the optimizer, so there is no need to run it with different join operators.

cloud-fan · 2019-11-20T17:30:53Z

sql/core/src/test/resources/sql-tests/inputs/natural-join.sql

@@ -1,8 +1,3 @@
-- List of configuration the test suite is run against:


This tests the analyzer/optimizer. Natural join is rewritten to other normal joins, so there is no need to run it with different join operators.

cloud-fan · 2019-11-20T17:31:31Z

sql/core/src/test/resources/sql-tests/inputs/outer-join.sql


-- Set the cross join enabled flag for the LEFT JOIN test since there's no join condition.
-- Ultimately the join should be optimized away.
-set spark.sql.crossJoin.enabled = true;


this is true by default now.

cloud-fan · 2019-11-20T17:32:19Z

sql/core/src/test/resources/sql-tests/inputs/udf/udf-join-empty-relation.sql

@@ -1,8 +1,3 @@
-- List of configuration the test suite is run against:


we are testing UDFs, and the join operator doesn't matter.

SparkQA · 2019-11-20T17:32:58Z

Test build #114175 has finished for PR 26612 at commit 3be4734.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-11-20T19:26:27Z

...erver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala

@@ -104,9 +101,6 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite {
    "subquery/in-subquery/simple-in.sql",
    "subquery/in-subquery/in-order-by.sql",
    "subquery/in-subquery/in-set-operations.sql",


Hi, @cloud-fan .
This causes a linter failure. Please remove the trailing comma.

$ dev/lint-scala
Scalastyle checks failed at following occurrences:
[error] /Users/dongjoon/PRS/PR-26612/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala: illegal start of simple expression: Token(RPAREN,),3495,))
[error] Total time: 20 s, completed Nov 20, 2019 11:24:55 AM

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

SparkQA · 2019-11-21T08:05:02Z

Test build #114198 has finished for PR 26612 at commit 9562cae.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-11-21T08:05:31Z

retest this please

SparkQA · 2019-11-21T11:54:29Z

Test build #114216 has finished for PR 26612 at commit 9562cae.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-11-21T12:08:47Z

Test build #114218 has finished for PR 26612 at commit 9562cae.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-11-21T14:12:43Z

After this PR:

SQLQueryTestSuite: 25min
ThriftServerQueryTestSuite: 18min

Before this PR (using the results of #26214):

SQLQueryTestSuite: 47min
ThriftServerQueryTestSuite: 8min (but can't apply settings)

wangyum · 2019-11-21T14:40:39Z

Yes. Much faster than before.
SQLQueryTestSuite:

[info] Run completed in 15 minutes, 43 seconds.
[info] Total number of tests run: 202

ThriftServerQueryTestSuite:

[info] Run completed in 13 minutes, 13 seconds.
[info] Total number of tests run: 137

HyukjinKwon · 2019-11-22T01:56:15Z

Merged to master.

maropu · 2019-11-23T00:32:24Z

sql/core/src/test/resources/sql-tests/inputs/group-by.sql

+-- Test aggregate operator with codegen on and off.
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN


(Sorry for being late.) In the one-dimension case, is CONFIG_DIM1 the same as SET?

It's different. We still run this test 3 times, as there are 3 config sets in this dimension. It's only the same as SET if there is only one dimension with a single config set.
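To make the run count concrete, here is a minimal Scala sketch that parses `--CONFIG_DIM<n>` comment lines into dimensions, applied to the three group-by.sql lines above. It is a hypothetical illustration, not the parser in SQLQueryTestSuite: `ConfigDimParseSketch` and `parseDims` are made-up names, and it assumes the dimension id and the settings are separated by whitespace and that config values contain no commas.

```scala
// Hypothetical sketch: not the actual SQLQueryTestSuite parser.
object ConfigDimParseSketch {
  type ConfigSet = Seq[(String, String)]

  // 12 characters long, so stripping it leaves e.g. "1 a=1,b=2".
  private val Prefix = "--CONFIG_DIM"

  // One entry per dimension id (in first-seen order); each entry holds that
  // dimension's config sets in file order.
  def parseDims(comments: Seq[String]): Seq[Seq[ConfigSet]] = {
    val configDimLines =
      comments.filter(_.startsWith(Prefix)).map(_.substring(Prefix.length))
    val parsed: Seq[(String, ConfigSet)] = configDimLines.map { line =>
      // Split "1 a=1,b=2" into the dimension id "1" and the settings "a=1,b=2".
      val parts = line.trim.split("\\s+", 2)
      require(parts.length == 2, s"malformed CONFIG_DIM line: $line")
      val configSet = parts(1).split(",").toSeq.map { kv =>
        val pair = kv.split("=", 2)
        require(pair.length == 2, s"malformed config: $kv")
        pair(0).trim -> pair(1).trim
      }
      parts(0) -> configSet
    }
    // Group config sets by dimension id, keeping first-seen order.
    parsed.map(_._1).distinct.map(id => parsed.filter(_._1 == id).map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val comments = Seq(
      "-- Test aggregate operator with codegen on and off.",
      "--CONFIG_DIM1 spark.sql.codegen.wholeStage=true",
      "--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY",
      "--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN"
    )
    val dims = parseDims(comments)
    println(s"dimensions: ${dims.size}, config sets: ${dims.head.size}")
  }
}
```

Running it prints `dimensions: 1, config sets: 3`: a single dimension with three config sets, hence three runs, whereas a plain SET would apply one configuration and run once.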

maropu · 2019-11-23T00:35:19Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

+      //     - config:     (String, String))
+      // We need to do cartesian product for all the config dimensions, to get a list of
+      // config sets, and run the query once for each config set.
+      val configDimLines = comments.filter(_.startsWith("--CONFIG_DIM")).map(_.substring(12))


It would be better to document how to use CONFIG_DIM in

spark/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (line 69 in 6146dc4)

Yeah we should document this there ...

What changes were proposed in this pull request?

add document to address #26612 (comment)

Why are the changes needed?

help people understand how to use --CONFIG_DIM

Does this PR introduce any user-facing change?

no

How was this patch tested?

N/A

Closes #26661 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

cloud-fan added the SQL label Nov 20, 2019

dongjoon-hyun reviewed Nov 20, 2019

wangyum reviewed Nov 21, 2019

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (review thread resolved)

cloud-fan force-pushed the test branch from 3be4734 to 9562cae ("introduce --CONFIG_DIM directive") November 21, 2019 04:13

HyukjinKwon approved these changes Nov 22, 2019

HyukjinKwon closed this in e2f056f Nov 22, 2019

maropu reviewed Nov 23, 2019

cloud-fan mentioned this pull request Nov 25, 2019

[SPARK-29975][SQL][followup] document --CONFIG_DIM #26661

Closed

6 participants
