Apply coercions when creating FilterNode #15744
Should we do something more generic here? Like checking if predicate is of type Literal but not a BooleanLiteral?
We can make it more complicated, yes.
However, we cannot fully analyze the expression type here, because we don't have symbol types, so we fall short of being really generic. I drew the line here; we can draw it somewhere else.
I would not add the check for NullLiteral. We don't do this in other PlanNodes which require boolean expressions.
I think we can all agree that FilterNode.predicate should be of boolean type.
Yet, the code base required updates in quite a few places to bring reality in line with the design.
The only way I know of to effectively find those places was to add the check here.
Now, we can hope that the codebase is correct and remove the check, but how will we ensure the problems do not come back with some new code?
I can see that the PlanNode constructor seems to be the best place for the check. If we do the check immediately after the initial planning, we might miss wrong filters created in the Optimizer. If we wait until execution, we might miss / break optimizations.
This check is not satisfying though, as it only detects NullLiteral. If you want to keep it, then maybe add a TODO explaining what's missing. That TODO will be easy to address in the future, when we have the new IR with types included.
Also, consider adding a similar check in other PlanNodes, at least those covered by this PR.
> This check is not satisfying though, as it only detects NullLiteral.
Correct, the check is necessary, but passing it doesn't guarantee well-formedness.
> consider adding a similar check in other PlanNodes
I would prefer to defer that, if I were to choose.
> If you want to keep it, then maybe add a TODO explaining what's missing.
Good idea, will add a comment!
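The constructor check discussed in this thread could look roughly like the sketch below. It is a minimal illustration only: `FilterPredicateCheckSketch` and its nested `Expression`, `NullLiteral`, `BooleanLiteral`, `Cast` and `FilterNode` classes are simplified, hypothetical stand-ins, not Trino's actual classes, and the exception message is made up.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins; these are not Trino's actual classes.
final class FilterPredicateCheckSketch {
    interface Expression {}

    static final class NullLiteral implements Expression {}

    static final class BooleanLiteral implements Expression {
        final boolean value;

        BooleanLiteral(boolean value) {
            this.value = value;
        }
    }

    static final class Cast implements Expression {
        final Expression operand;
        final String targetType;

        Cast(Expression operand, String targetType) {
            this.operand = Objects.requireNonNull(operand, "operand is null");
            this.targetType = Objects.requireNonNull(targetType, "targetType is null");
        }
    }

    static final class FilterNode {
        final Expression predicate;

        FilterNode(Expression predicate) {
            this.predicate = Objects.requireNonNull(predicate, "predicate is null");
            // A bare NULL literal is of type unknown, so it cannot be a boolean predicate.
            // TODO: this is a necessary check only. Other non-boolean predicates pass it,
            // because symbol types are not available in the constructor.
            if (predicate instanceof NullLiteral) {
                throw new IllegalArgumentException("Predicate must be of boolean type, got a bare NULL literal");
            }
        }
    }
}
```

With this sketch, `new FilterNode(new NullLiteral())` throws, while `new FilterNode(new Cast(new NullLiteral(), "boolean"))` is accepted, which is the shape the predicate takes once the coercion has been applied.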
Regarding the connection of this PR and #15558, can you also remove the handling of this case from RemoveTrivialFilters and then whoever gets the conflict will solve it, or do you have other plans?
> can you also remove the handling of this case from RemoveTrivialFilters
Before my commit, there are cases where RemoveTrivialFilters can remove a null filter: when it is written as NULL in SQL and coercions are not applied. So changing RemoveTrivialFilters within this PR would remove an existing optimization. However, applying the coercions also disables that optimization, because the rule no longer matches the coerced predicate and the NULL handling becomes dead code.
Huh, I think your change needs to go first.
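To illustrate why the NULL handling stops firing once coercions are applied, here is a minimal sketch of a rule that matches literals only. `TrivialFilterSketch`, `Outcome` and `simplify` are hypothetical names, and the outcomes are an assumption about how such a rule typically behaves, not a copy of `RemoveTrivialFilters`.

```java
// Hypothetical, simplified stand-ins; these are not Trino's actual classes.
final class TrivialFilterSketch {
    interface Expression {}

    static final class NullLiteral implements Expression {}

    static final class BooleanLiteral implements Expression {
        final boolean value;

        BooleanLiteral(boolean value) {
            this.value = value;
        }
    }

    static final class Cast implements Expression {
        final Expression operand;
        final String targetType;

        Cast(Expression operand, String targetType) {
            this.operand = operand;
            this.targetType = targetType;
        }
    }

    enum Outcome { KEEP_FILTER, REMOVE_FILTER, REPLACE_WITH_EMPTY }

    // Matches only bare literals, so a predicate wrapped in a Cast is not recognized.
    static Outcome simplify(Expression predicate) {
        if (predicate instanceof BooleanLiteral) {
            // TRUE keeps every row, FALSE keeps none.
            return ((BooleanLiteral) predicate).value ? Outcome.REMOVE_FILTER : Outcome.REPLACE_WITH_EMPTY;
        }
        if (predicate instanceof NullLiteral) {
            // A filter on NULL keeps no rows.
            return Outcome.REPLACE_WITH_EMPTY;
        }
        return Outcome.KEEP_FILTER;
    }
}
```

In this sketch, `simplify(new NullLiteral())` yields `REPLACE_WITH_EMPTY`, but `simplify(new Cast(new NullLiteral(), "boolean"))` yields `KEEP_FILTER`, so the optimization is lost unless some other step simplifies the cast.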
If we don't add the check in FilterNode constructor, like I suggested, we could retain this test case, and let it fail because of checkArgument in RemoveTrivialFilters. However, I think that the Optimizer rule should not validate the node, but instead just ignore NullLiteral.
I think that for now we can skip the validation altogether. That would be consistent with how we handle other sites where boolean type is required. If we want to validate the expression types, we should introduce a new mechanism that would have access to full type information (including symbols), and apply it to each site, that is: filter, joins, aggregation filter, merge etc.
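A type-aware validation of the kind suggested here might be sketched as follows. Everything in it is hypothetical: `BooleanSiteValidatorSketch`, `typeOf`, `checkBoolean`, the string type names and the tiny expression model are illustrative assumptions, not an existing Trino mechanism.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins; these are not Trino's actual classes.
final class BooleanSiteValidatorSketch {
    interface Expression {}

    static final class NullLiteral implements Expression {}

    static final class BooleanLiteral implements Expression {}

    static final class SymbolReference implements Expression {
        final String name;

        SymbolReference(String name) {
            this.name = name;
        }
    }

    static final class Cast implements Expression {
        final Expression operand;
        final String targetType;

        Cast(Expression operand, String targetType) {
            this.operand = operand;
            this.targetType = targetType;
        }
    }

    // Derives a type name for the expression, using symbol types where needed.
    static String typeOf(Expression expression, Map<String, String> symbolTypes) {
        if (expression instanceof BooleanLiteral) {
            return "boolean";
        }
        if (expression instanceof NullLiteral) {
            return "unknown";
        }
        if (expression instanceof Cast) {
            return ((Cast) expression).targetType;
        }
        if (expression instanceof SymbolReference) {
            String name = ((SymbolReference) expression).name;
            String type = symbolTypes.get(name);
            if (type == null) {
                throw new IllegalArgumentException("No type for symbol: " + name);
            }
            return type;
        }
        throw new IllegalArgumentException("Unsupported expression: " + expression.getClass().getSimpleName());
    }

    // Applied at each site that requires a boolean: filter, join, aggregation filter, merge.
    static void checkBoolean(String site, Expression predicate, Map<String, String> symbolTypes) {
        String type = typeOf(predicate, symbolTypes);
        if (!"boolean".equals(type)) {
            throw new IllegalStateException(site + " predicate must be boolean, found " + type);
        }
    }
}
```

Unlike a constructor check, this validator can reject a predicate that is a reference to a non-boolean symbol, because it receives the symbol types as input.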
> However, I think that the Optimizer rule should not validate the node, but instead just ignore NullLiteral.
Yes, a rule doesn't need to validate a node, but it also is free to fail when a node doesn't adhere to a design. For example, a ProjectNode cannot have multiple sources, so a rule can simply fail if one does.
> I think that for now we can skip the validation altogether. That would be consistent with how we handle other sites where boolean type is required
Answered in the other comment: 020b366#r1085170555
CI #15809
This is sometimes useful, so let's always do it.
Thank you for your review! AC, PTAL
This is for code consistency reasons. Most of the places where `ExpressionMatcher` is constructed already go through the `PlanMatchPattern.expression` factory method.
`UnwrapCastInComparison` works on any nested comparison expressions.
Thank you for your review! AC, PTAL
CI #15793
`StatementAnalyzer.Visitor#analyzeRowFilter` captures the necessary coercions for filters that are not of boolean type (e.g. a `null` literal being of `unknown` type). Before the change, however, the coercions were not applied.
When the predicate is not of boolean type (e.g. a `null` literal being of `unknown` type), the `Analysis` holds a coercion that should be applied to it. Before the change, however, the coercion was not applied.
Co-authored-by: Assaf Bern <[email protected]>
Is there a potential perf impact? Does it need benchmarks?
@sopel39 good question, thanks for thinking about this! There is some possibility that certain filters now get pruned where they didn't before.
I was thinking that maybe execution could now take longer?
You mean that the generated code is different? This I don't know, unfortunately.
@findepi would you like to run benchmarks?
When predicate is not of boolean type (e.g. `null` literal being of `unknown` type), the `Analysis` holds a coercion that should be applied to it. Before the change, the coercion was not getting applied though.
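The idea of applying a recorded coercion when building the filter predicate can be sketched as below. This is a simplified illustration under stated assumptions: `CoercionSketch`, its nested `Analysis` class and the `coerceIfNecessary` helper are hypothetical stand-ins with string type names, not the planner's actual code.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-ins; these are not Trino's actual classes.
final class CoercionSketch {
    interface Expression {}

    static final class NullLiteral implements Expression {}

    static final class Cast implements Expression {
        final Expression operand;
        final String targetType;

        Cast(Expression operand, String targetType) {
            this.operand = Objects.requireNonNull(operand, "operand is null");
            this.targetType = Objects.requireNonNull(targetType, "targetType is null");
        }
    }

    // Stand-in for the analysis result: coercions recorded per expression instance.
    static final class Analysis {
        private final Map<Expression, String> coercions = new IdentityHashMap<>();

        void addCoercion(Expression expression, String targetType) {
            coercions.put(expression, targetType);
        }

        String getCoercion(Expression expression) {
            return coercions.get(expression);
        }
    }

    // Wraps the expression in a cast when the analyzer recorded a coercion for it.
    static Expression coerceIfNecessary(Analysis analysis, Expression original) {
        String targetType = analysis.getCoercion(original);
        if (targetType == null) {
            return original;
        }
        return new Cast(original, targetType);
    }
}
```

If the analyzer recorded a coercion of a `NullLiteral` to `boolean`, `coerceIfNecessary` returns a `Cast` around that literal; an expression without a recorded coercion is returned unchanged. The filter node would then be created from the returned expression rather than from the original one.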