-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-35455][SQL] Unify empty relation optimization between normal and AQE optimizer #32602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
4514d27
5097247
8df68d9
165077b
2f5fa20
7b80db0
05e074c
48dd92a
8f4dc80
1220087
e26df96
f7a14cf
2c0dfb0
0e151d9
c086f72
9b78ac0
767dd92
47e0c3a
d9ca6da
2fead86
0754936
a6213ea
624e45e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -236,7 +236,9 @@ class AdaptiveQueryExecSuite | |
| test("Empty stage coalesced to 1-partition RDD") { | ||
| withSQLConf( | ||
| SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", | ||
| SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true") { | ||
| SQLConf.COALESCE_PARTITIONS_ENABLED.key -> "true", | ||
| SQLConf.ADAPTIVE_OPTIMIZER_EXCLUDED_RULES.key -> | ||
| EliminateUnnecessaryJoin.getClass.getName.stripSuffix("$")) { | ||
| val df1 = spark.range(10).withColumn("a", 'id) | ||
| val df2 = spark.range(10).withColumn("b", 'id) | ||
| withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") { | ||
|
|
@@ -1307,6 +1309,74 @@ class AdaptiveQueryExecSuite | |
| } | ||
| } | ||
|
|
||
| test("SPARK-35455: Enhance EliminateUnnecessaryJoin - single join") { | ||
|
||
| withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", | ||
| SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") { | ||
| Seq( | ||
| // left semi join and empty left side | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't optimize this before this PR?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, we cann't. Before we only check right side with And the test should use different column to do filter and join in case of |
||
| ("SELECT * FROM (SELECT * FROM testData WHERE key = 0)t1 LEFT SEMI JOIN testData2 t2 ON " + | ||
| "t1.key = t2.a", true), | ||
| // left anti join and empty left side | ||
| ("SELECT * FROM (SELECT * FROM testData WHERE key = 0)t1 LEFT ANTI JOIN testData2 t2 ON " + | ||
| "t1.key = t2.a", true), | ||
| // left outer join and empty left side | ||
| ("SELECT * FROM (SELECT * FROM testData WHERE key = 0)t1 LEFT JOIN testData2 t2 ON " + | ||
| "t1.key = t2.a", true), | ||
| // left outer join and non-empty left side | ||
| ("SELECT * FROM testData t1 LEFT JOIN testData2 t2 ON " + | ||
| "t1.key = t2.a", false), | ||
| // right outer join and empty right side | ||
| ("SELECT * FROM testData t1 RIGHT JOIN (SELECT * FROM testData2 WHERE b = 0)t2 ON " + | ||
| "t1.key = t2.a", true), | ||
| // right outer join and non-empty right side | ||
| ("SELECT * FROM testData t1 RIGHT JOIN testData2 t2 ON " + | ||
| "t1.key = t2.a", false), | ||
| // full outer join and both side empty | ||
| ("SELECT * FROM (SELECT * FROM testData WHERE key = 0)t1 FULL JOIN " + | ||
| "(SELECT * FROM testData2 WHERE b = 0)t2 ON t1.key = t2.a", true), | ||
| // full outer join and left side empty right side non-empty | ||
| ("SELECT * FROM (SELECT * FROM testData WHERE key = 0)t1 FULL JOIN " + | ||
| "testData2 t2 ON t1.key = t2.a", false) | ||
| ).foreach { case (query, isEliminated) => | ||
| val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(query) | ||
| assert(findTopLevelBaseJoin(plan).size == 1) | ||
| assert(findTopLevelBaseJoin(adaptivePlan).isEmpty == isEliminated) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| test("SPARK-35455: Enhance EliminateUnnecessaryJoin - multi join") { | ||
|
||
| withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", | ||
| SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") { | ||
| Seq( | ||
| (""" | ||
| |SELECT * FROM testData t1 | ||
| | JOIN (SELECT * FROM testData2 WHERE b = 0) t2 ON t1.key = t2.a | ||
| | LEFT JOIN testData2 t3 ON t1.key = t3.a | ||
| |""".stripMargin, 0), | ||
| (""" | ||
| |SELECT * FROM (SELECT * FROM testData WHERE key = 0) t1 | ||
| | LEFT ANTI JOIN testData2 t2 | ||
| | FULL JOIN (SELECT * FROM testData2 WHERE b = 0) t3 ON t1.key = t3.a | ||
| |""".stripMargin, 0), | ||
| (""" | ||
| |SELECT * FROM testData t1 | ||
| | LEFT SEMI JOIN (SELECT * FROM testData2 WHERE b = 0) | ||
| | RIGHT JOIN testData2 | ||
| |""".stripMargin, 1), | ||
| (""" | ||
| |SELECT * FROM testData t1 | ||
| | FULL JOIN (SELECT * FROM testData2 WHERE b = 0) t1 | ||
| | FULL JOIN (SELECT * FROM testData WHERE key = 0) t2 | ||
| |""".stripMargin, 2) | ||
| ).foreach { case (query, joinNum) => | ||
| val (plan, adaptivePlan) = runAdaptiveAndVerifyResult(query) | ||
| assert(findTopLevelBaseJoin(plan).size == 2) | ||
| assert(findTopLevelBaseJoin(adaptivePlan).size == joinNum) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| test("SPARK-32753: Only copy tags to node with no tags") { | ||
| withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") { | ||
| withTempView("v1") { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to handle
LocalRelation, then it's not AQE specific and we can do it in the normal optimizer?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but currently we have no chance to do normal optimizer at AQE side. Maybe we can let some rules which in
Optimizeralso available atAQEOptimizerin future ?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since AQE is on by default, it's not a big issue but more about code cleanness. How about this:
EliminateUnnecessaryJoinshould only deal withLocalRelation, and appears in both the normal optimizer and AQE optimizerLocalRelationThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm, and just find a exists rule
PropagateEmptyRelation.How about this updating ?
ConvertToLocalRelationto turn empty query stage into empty LocalRelationPropagateEmptyRelationappears in AQE optimizerEliminateUnnecessaryJointo only handle naaj/semi/anti joins