Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Jul 13, 2022

What changes were proposed in this pull request?

This PR updates PhysicalOperation to make it the same as ScanOperation, then remove ScanOperation and replace all its usages with PhysicalOperation. It also adds a new pattern match NodeWithOnlyDeterministicProjectAndFilter and uses it in places where we only need to extract a relation, but no need to collect projects and filters.

Why are the changes needed?

PhysicalOperation has known issues: it aggressively collapses projects and filters, which may lead to bad query plans. To fix this issue, we introduced ScanOperation and use it in a few critical places like scan strategies. To be conservative, we didn't replace PhysicalOperation with ScanOperation everywhere. However, PhysicalOperation has performance issues in itself when collapsing projects and merging expressions, if the query plan is very complicated. We should always follow the rule CollpaseProjects and stop merging expressions when it duplicates expensive expressions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing tests

@github-actions github-actions bot added the SQL label Jul 13, 2022
@cloud-fan
Copy link
Contributor Author

cc @viirya @wangyum @ulysses-you

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if it returns Some for all other cases, doesn't it mean it matches all input plans?

Copy link
Contributor Author

@cloud-fan cloud-fan Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I explicitly call it out in the classdoc. This is the same in PhysicalOperation, which always returns Some, and the caller side will specify the expected type.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing ScanOperation with PhysicalOperation looks okay to me. But I have a question about NodeWithOnlyDeterministicProjectAndFilter.

@ulysses-you
Copy link
Contributor

Follow the previous PhysicalOperation semantics, shall we also update this to NodeWithOnlyDeterministicProjectAndFilter ? we may miss inject runtime filter if the project of filter side holds some complex expressions.

private def isSelectiveFilterOverScan(plan: LogicalPlan): Boolean = {
val ret = plan match {
case PhysicalOperation(_, filters, child) if child.isInstanceOf[LeafNode] =>
filters.forall(isSimpleExpression) &&
filters.exists(isLikelySelective)
case _ => false
}
!plan.isStreaming && ret

otherwise looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old behavior was, collecting all the filters above a leaf node and checking if all filters are simple expressions and at least one of them is selective.

I've rewritten the code to keep the old behavior but in a more efficient way: simply traverse the plan tree to check filter predicates, instead of merging expressions and collecting filters that may duplicate complicated expressions.

@cloud-fan
Copy link
Contributor Author

thanks for review, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants