[SPARK-39764][SQL] Make PhysicalOperation the same as ScanOperation #37176
Hmm, if it returns Some for all other cases, doesn't it mean it matches all input plans?
Yes, and I explicitly call it out in the classdoc. This is the same in PhysicalOperation, which always returns Some, and the caller side will specify the expected type.
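To illustrate the point about an extractor that always returns Some while the caller narrows the match, here is a minimal sketch. It uses a toy plan model, not Spark's classes: `PeelFilters`, `Relation`, `LocalData` and `describe` are hypothetical names made up for this example.

```scala
object ExtractorSketch {
  // Toy plan model for illustration only; these are NOT Spark's classes.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class LocalData(rows: Int) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan

  // Hypothetical extractor: strips the Filter nodes at the top of a plan
  // and returns their conditions plus whatever node sits underneath.
  object PeelFilters {
    private def collect(plan: Plan): (Seq[String], Plan) = plan match {
      case Filter(cond, child) =>
        val (conds, bottom) = collect(child)
        (cond +: conds, bottom)
      case other => (Nil, other)
    }

    // Always Some: on its own, this extractor matches every input plan.
    def unapply(plan: Plan): Option[(Seq[String], Plan)] = Some(collect(plan))
  }

  // The caller narrows the match by stating which bottom node it expects.
  def describe(plan: Plan): String = plan match {
    case PeelFilters(conds, Relation(name)) =>
      s"scan $name with ${conds.size} filter(s)"
    case _ => "not a scan"
  }
}
```

With this shape, the extractor itself never rejects a plan; the nested `Relation(name)` pattern at the call site is what decides whether the case applies.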
viirya left a comment
Replacing ScanOperation with PhysicalOperation looks okay to me. But I have a question about NodeWithOnlyDeterministicProjectAndFilter.
Follow the previous spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala Lines 119 to 126 in 899f6c9
Otherwise looks good to me.
The old behavior was to collect all the filters above a leaf node and check that all of them are simple expressions and at least one of them is selective.
I've rewritten the code to keep the old behavior but in a more efficient way: simply traverse the plan tree to check filter predicates, instead of merging expressions and collecting filters that may duplicate complicated expressions.
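The traversal described above could look roughly like the following sketch. It is not the code from this PR: the plan model is a toy one, and the notions of a "simple" and a "selective" predicate are reduced to two boolean flags on a hypothetical `Pred` class purely for illustration.

```scala
object FilterCheckSketch {
  // Toy model for illustration only; these are NOT Spark's classes.
  // `simple` and `selective` stand in for the real checks (assumption).
  case class Pred(simple: Boolean, selective: Boolean)

  sealed trait Plan
  case class Leaf(name: String) extends Plan
  case class Filter(pred: Pred, child: Plan) extends Plan
  case class Project(child: Plan) extends Plan

  // Walks down the Project/Filter chain once, without collecting or
  // merging any expressions. Returns true only if every filter predicate
  // is simple and at least one of them is selective.
  def hasSelectiveSimpleFilters(plan: Plan): Boolean = {
    @annotation.tailrec
    def loop(p: Plan, seenSelective: Boolean): Boolean = p match {
      case Filter(pred, child) =>
        if (!pred.simple) false
        else loop(child, seenSelective || pred.selective)
      case Project(child) => loop(child, seenSelective)
      case Leaf(_) => seenSelective
    }
    loop(plan, seenSelective = false)
  }
}
```

Because the walk only inspects each predicate in place, it never builds a merged expression, which is where the duplication cost of the old approach came from.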
Thanks for the review, merging to master!
What changes were proposed in this pull request?
This PR updates PhysicalOperation to make it the same as ScanOperation, then removes ScanOperation and replaces all its usages with PhysicalOperation. It also adds a new pattern match, NodeWithOnlyDeterministicProjectAndFilter, and uses it in places where we only need to extract a relation and do not need to collect projects and filters.
Why are the changes needed?
PhysicalOperation has known issues: it aggressively collapses projects and filters, which may lead to bad query plans. To fix this issue, we introduced ScanOperation and use it in a few critical places like scan strategies. To be conservative, we didn't replace PhysicalOperation with ScanOperation everywhere. However, PhysicalOperation also has performance issues of its own when collapsing projects and merging expressions if the query plan is very complicated. We should always follow the rule CollapseProject and stop merging expressions when doing so duplicates expensive expressions.
Does this PR introduce any user-facing change?
No
How was this patch tested?
existing tests
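As a small illustration of the motivation given under "Why are the changes needed?", the sketch below shows how merging two projects can duplicate an expensive expression. The expression model and the helper names (`inline`, `countUdfs`) are hypothetical and are not Spark APIs.

```scala
object CollapseSketch {
  // Toy expression model for illustration only; NOT Spark's classes.
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  // Stands in for an expression that is expensive to evaluate.
  case class Udf(name: String, arg: Expr) extends Expr

  // Replaces column references with the expressions that a lower
  // project defines for them, which is what merging two projects does.
  def inline(expr: Expr, aliases: Map[String, Expr]): Expr = expr match {
    case Col(n) => aliases.getOrElse(n, expr)
    case Add(l, r) => Add(inline(l, aliases), inline(r, aliases))
    case Udf(n, a) => Udf(n, inline(a, aliases))
  }

  // Counts how many expensive calls an expression contains.
  def countUdfs(expr: Expr): Int = expr match {
    case Col(_) => 0
    case Add(l, r) => countUdfs(l) + countUdfs(r)
    case Udf(_, a) => 1 + countUdfs(a)
  }
}
```

If the lower project defines `x` as `Udf("slow", Col("a"))` and the upper project computes `Add(Col("x"), Col("x"))`, inlining yields two copies of the expensive call where the uncollapsed plan evaluates it once. That is the kind of merge a conservative collapsing rule would decline.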