[SPARK-31705][SQL][FOLLOWUP] Avoid the unnecessary CNF computation for full-outer joins #28810
maropu wants to merge 4 commits into apache:master
@maropu Thanks for the improvement. Shall we update
Yea, sure.
case UsingJoin(_, _) => sys.error("Untransformed Using join node")
case jt =>
  sys.error(s"Unexpected join type: $jt")
since we are here, can we throw an exception instead? sys.error will exit the JVM IIRC
How about IllegalStateException? Throwing an exception here looks okay to me, but I personally think we need consistent handling for unexpected code paths. I've checked how the other optimizer rules handle this kind of unexpected behaviour, and I found that some of them use sys.error:
$ grep -nr "sys.error" .
./Optimizer.scala:1345: sys.error(s"Unexpected join type: $jt")
./Optimizer.scala:1381: sys.error(s"Unexpected join type: $jt")
./PushCNFPredicateThroughJoin.scala:64: sys.error(s"Unexpected join type: $jt")
./subquery.scala:458: sys.error(s"Unexpected operator in scalar subquery: $lp")
./subquery.scala:496: sys.error(s"Correlated subquery has unexpected operator $op below filter")
./subquery.scala:498: case op @ _ => sys.error(s"Unexpected operator $op in correlated subquery")
./subquery.scala:502: sys.error("This line should be unreachable")
./subquery.scala:564: case op => sys.error(s"Unexpected operator $op in corelated subquery")
Is it better to change them, too?
How about AnalysisException?
Isn't AnalysisException mainly used for the analysis phase? https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala#L24
yea, I know some rules in the optimizer throw AnalysisException though...
(base) maropu@~/Repositories/spark/spark-master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer: (SPARK-31705 $)$ grep -nr "throw" .
./joins.scala:187: // `requires attributes from more than one child`, we throw firstly here for better
./joins.scala:189: throw new AnalysisException("Using PythonUDF in join condition of join type" +
./joins.scala:206: throw new AnalysisException("Using PythonUDF in join condition of join type" +
./NormalizeFloatingNumbers.scala:100: throw new IllegalStateException("grouping/join/window partition keys cannot be map type.")
./NormalizeFloatingNumbers.scala:134: case _ => throw new IllegalStateException(s"fail to normalize $expr")
./objects.scala:250: throw new IllegalStateException("LambdaVariable should never has 0 as its ID.")
./objects.scala:255: throw new IllegalStateException(
./objects.scala:264: throw new IllegalStateException(
./Optimizer.scala:1345: throw new IllegalStateException(s"Unexpected join type: $jt")
./Optimizer.scala:1381: throw new IllegalStateException(s"Unexpected join type: $jt")
./Optimizer.scala:1436: throw new AnalysisException(
./PushCNFPredicateThroughJoin.scala:64: throw new IllegalStateException(s"Unexpected join type: $jt")
./ReplaceExceptWithFilter.scala:106: throw new IllegalStateException("Leaf node is expected")
./ReplaceNullWithFalseInPredicate.scala:105: throw new IllegalArgumentException(message)
./subquery.scala:79: throw new AnalysisException("Found conflicting attributes " +
Test build #123908 has finished for PR 28810 at commit
retest this please
Test build #123917 has finished for PR 28810 at commit
Test build #123923 has finished for PR 28810 at commit
Test build #123931 has finished for PR 28810 at commit
case UsingJoin(_, _) => sys.error("Untransformed Using join node")
case jt =>
  throw new IllegalStateException(s"Unexpected join type: $jt")
IllegalStateException means an unexpected code path; I think this is the standard in Java. +1
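On the earlier question of whether sys.error exits the JVM: in the Scala standard library, sys.error throws a RuntimeException, while sys.exit is the call that terminates the process. The change discussed here is therefore mostly about using a more specific exception type for unreachable branches. Below is a minimal sketch of that pattern. It is a toy model, not Spark code: ExceptionChoiceSketch and the join-type classes inside it are hypothetical stand-ins.

```scala
// Toy model only: these join types are hypothetical stand-ins, not Spark's classes.
object ExceptionChoiceSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object FullOuter extends JoinType
  final case class UsingJoin(tpe: JoinType, columns: Seq[String]) extends JoinType

  // Catch-all branch in the shape settled on in the review:
  // an unexpected join type raises IllegalStateException.
  def describe(jt: JoinType): String = jt match {
    case Inner => "inner"
    case FullOuter => "full outer"
    case other => throw new IllegalStateException(s"Unexpected join type: $other")
  }

  // Captures whatever sys.error throws so the caller can inspect it.
  // The JVM keeps running after this call returns.
  def sysErrorFailure(message: String): Throwable =
    scala.util.Try(sys.error(message)).failed.get

  def main(args: Array[String]): Unit = {
    println(sysErrorFailure("boom").getClass.getName)
    println(describe(Inner))
    println(scala.util.Try(describe(UsingJoin(Inner, Seq("id")))))
  }
}
```

Both calls surface as ordinary exceptions; the difference is that IllegalStateException tells the reader the branch was believed to be unreachable.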
retest this please
ping @cloud-fan @gengliangwang
Test build #124081 has finished for PR 28810 at commit
retest this please
case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node")
case UsingJoin(_, _) => sys.error("Untransformed Using join node")
case jt =>

case FullOuter => j
case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node")
case UsingJoin(_, _) => sys.error("Untransformed Using join node")
case jt =>
Test build #124104 has finished for PR 28810 at commit
wangyum left a comment
Could we make conjunctiveNormalForm protected in this PR?
Thanks, all! Pending Jenkins.
Test build #124122 has finished for PR 28810 at commit
Merged to master.
Test build #124120 has finished for PR 28810 at commit
What changes were proposed in this pull request?
To avoid the unnecessary CNF computation for full-outer joins, this PR filters out full-outer joins at the entry point of the rule, before any CNF conversion is attempted.
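The idea can be sketched as follows. This is a simplified toy model under stated assumptions, not the actual rule: CnfGuardSketch, Join, canPushThrough, toCnf, and pushablePredicates are hypothetical names, and toCnf only splits on " AND " as a placeholder for a real (and potentially expensive) CNF conversion.

```scala
// Toy model only: every name here is a hypothetical stand-in, not Spark's API.
object CnfGuardSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object FullOuter extends JoinType

  final case class Join(joinType: JoinType, condition: Option[String])

  // Counts how often the (pretend) expensive conversion actually runs.
  var cnfCalls: Int = 0

  // Placeholder for CNF conversion: splits a conjunction into its conjuncts.
  private def toCnf(condition: String): Seq[String] = {
    cnfCalls += 1
    condition.split(" AND ").toSeq.map(_.trim)
  }

  // Join types the rule handles; full-outer joins are rejected up front.
  private def canPushThrough(jt: JoinType): Boolean = jt match {
    case Inner | LeftOuter => true
    case FullOuter => false
  }

  // The guard sits in the pattern itself, so toCnf is never evaluated
  // for a join type the rule would leave unchanged anyway.
  def pushablePredicates(join: Join): Option[Seq[String]] = join match {
    case Join(jt, Some(cond)) if canPushThrough(jt) => Some(toCnf(cond))
    case _ => None
  }
}
```

With the check placed in the match guard, a full-outer join falls through to the default case without paying for the conversion, which is the overhead this PR aims to remove.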
Why are the changes needed?
To mitigate optimizer overhead.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.