Add a new rule for removing "false" filters#16206
Add a new rule for removing "false" filters#16206assaf2 wants to merge 2 commits intotrinodb:masterfrom
Conversation
9ad296e to
8bbceb3
Compare
There was a problem hiding this comment.
The reason why you're adding a check here is that such Filter wasn't removed by some other rule.
What if a null-filter eligible for removal is not directly above a Table Scan?
My understanding is that it won't get removed, but it should.
I think we need a check here, in PushPredicateIntoTableScan, but we definitely need a check elsewhere as well.
There was a problem hiding this comment.
I don't think there's a way to come up with a rule that can cover 100% of the cases because:
- AFAIU we don't want to evaluate the entire expression within the rule (source: Remove constant null filters #15558 (comment)).
- Comparing (except
IS DISTINCT FROM)nullto anything should result innull, notfalse(source: https://trino.io/docs/current/functions/comparison.html#is-distinct-from-and-is-not-distinct-from). I can modifyio.trino.sql.planner.ExpressionInterpreter.Visitor#processComparisonExpressionto returnfalseinstead ofnullwhenleft\rightisnull, but AFAIU this is not logically correct (BTW I've tried this and got lots of tests failures).
I'm not sure I agree with the comment that existed even before my change:
if? And what if the logic of PushPredicateIntoTableScan will be changed in the future? If we can't come up with a rule that can cover 100% of the filters that are eventually evaluated to false/ null, then IMHO, we should rely on this if. WDYT?
There was a problem hiding this comment.
AFAIU we don't want to evaluate the entire expression within the rule (source: #15558 (comment)).
We're not evaluating here either, right?
What prevents us from having a Rule that would e.g. apply to every FilterNode and did the logic which we're doing here?
There was a problem hiding this comment.
We're not evaluating here either, right?
Right, we're executing some logic that's specific for the pushdown and discovering false filters on the way.
What prevents us from having a Rule that would e.g. apply to every FilterNode and did the logic which we're doing here?
It feels arbitrary to me to execute some logic that we happen to have that's doing part of the job, but that must be a minority opinion since both you and the comment writer think otherwise.
I think we need a check here, in PushPredicateIntoTableScan, but we definitely need a check elsewhere as well.
So I'll leave this if as is and add another rule
There was a problem hiding this comment.
As in #16206 (comment), i think we need both
- a separate rule, which would cover for Filter-not-necessarily-right-above-a-TableScan that is redundant and can be removed
- the change here as proposed is also fine to have. Rules should avoid depending on each other as much as possible. However, we should strive so that reverting the change here in PushPredicateIntoTableScan.java would not impact what query plans we produce.
598fa26 to
233f49e
Compare
|
Rebased in order to resolve conflicts |
233f49e to
140f9b9
Compare
b9446e5 to
869b5c5
Compare
There was a problem hiding this comment.
We now have two triggering conditions for the exception.
If the exception is raised, it would be good to know which one triggered. Add constraint to the message
There was a problem hiding this comment.
with that commit being first, do all the test pass?
There was a problem hiding this comment.
The check here was for added for documentation purposes 83e144a
Fortunately, the
applyFilteris never invoked with NONE domain, and
should not be, since the engine knows what to do in such case without
asking the connector.
The || FALSE.equals(constraint.getExpression() doesn't seem to be just for documentation purposes though, since there is no equivalent logic in PushPredicateIntoTableScan.
If current PushPredicateIntoTableScan code guarantees that FALSE.equals(constraint.getExpression() will never be true, this can be as is, but requires a code comment. If we go that way, I would split the if into two independent ones (the old condition and the new condition do not deserve same type of explanation)
There was a problem hiding this comment.
with that commit being first, do all the test pass?
The issue was with NULL pushdown, not with FALSE, all tests should pass regardless its position among the other commits. Sorry if it caused confusion, do you want me to open a separate PR for this change?
I saw that we validated summary is non-NONE and thought this validation is not whole without checking that the other part of the constraint (the expression) is also non-FALSE.
If current PushPredicateIntoTableScan code guarantees that FALSE.equals(constraint.getExpression() will never be true
There's no explicit check for that, but DomainTranslator's implementation translates FALSE expressions into TupleDomain.none() (via visitBooleanLiteral) so the remaining expression shouldn't be FALSE and therefore the translated connector expression shouldn't be FALSE as well. Do you think I better add an explicit check within PushPredicateIntoTableScan?
The check here was for added for documentation purposes 83e144a
Fortunately, the applyFilter is never invoked with NONE domain, and
I was thinking that if we don't want it to be invoked with NONE domain (which is reasonable - why would we want to pushdown NONE filter into the connector?), then we probably don't want to pushdown FALSE expressions as well. I don't see a separation here between documentation and validation. If we say for documentation purposes that summary shouldn't be NONE then why not say that expression shouldn't be FALSE? I guess we don't want inconsistent behavior between the two. As mentioned earlier, I can add an explicit check to PushPredicateIntoTableScan if you think it's necessary.
We now have two triggering conditions for the exception.
If the exception is raised, it would be good to know which one triggered.
Agree, I separated it into 2 different if clauses with different exception messages.
core/trino-main/src/main/java/io/trino/sql/planner/DomainTranslator.java
Outdated
Show resolved
Hide resolved
...o-main/src/test/java/io/trino/sql/planner/iterative/rule/TestPushPredicateIntoTableScan.java
Outdated
Show resolved
Hide resolved
...n/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlConnectorTest.java
Outdated
Show resolved
Hide resolved
...no-main/src/main/java/io/trino/sql/planner/iterative/rule/RemoveEmptyMergeWriterRuleSet.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
RemoveTrivialFilters is used twice, the new rule is used once. what's the selection criteria to when to use the new rule and when not to?
There was a problem hiding this comment.
I wanted to avoid PushPredicateIntoTableScan from getting into this if clause
if (newDomain.isNone()) {
// This is just for extra safety, the rule RemoveFalseFiltersAfterDomainTranslator is responsible for eliminating such filters
I saw that we run simplifyOptimizer at line 556 and then PushPredicateIntoTableScan for the first time at line 600 so I thought it was not needed at line 438 as well. But thinking about it again, if we have a chance of getting a trivial filter at line 438, maybe it's worth removing FALSE filters at this point as well. Added.
...trino-main/src/main/java/io/trino/sql/planner/iterative/rule/PushPredicateIntoTableScan.java
Outdated
Show resolved
Hide resolved
869b5c5 to
ae1ae9a
Compare
ce1a237 to
5f74783
Compare
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
50920fc to
1286d74
Compare
...ain/src/test/java/io/trino/sql/planner/iterative/rule/TestRemoveEmptyMergeWriterRuleSet.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/DomainTranslator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
did you mean to invoke RemoveTrivialFilters as well?
There was a problem hiding this comment.
As mentioned at #16206 (comment) I've added this line because I saw PushPredicateIntoTableScan at PlanOptimizers:600 may produce None predicates. IMO this realization is insufficient to determine that RemoveTrivialFilters invocation is also required. Do you think otherwise?
There was a problem hiding this comment.
I don't understand. Perhaps you need to explain this to me once more.
I am just curious what's the usefulness of RemoveFalseFiltersAfterDomainTranslator if not coming together with RemoveTrivialFilters. It won't remove the FilterNode.
There was a problem hiding this comment.
You are right, now that RemoveFalseFiltersAfterDomainTranslator is an ExpressionRewriteRuleSet and not a standalone rule, RemoveTrivialFilters is also required. Added.
There was a problem hiding this comment.
I think the new rule should extend
ExpressionRewriteRuleSetand thus apply to any boolean expression that can be proven to be false. This would apply to filternode, but also to projections (IF, CASE or just projecting the result of a boolean expression).
after switching to ExpressionRewriteRuleSet "RemoveFalseFiltersAfterDomainTranslator" is no longer a good rule name.
"SimplifyFalseConditions"
There was a problem hiding this comment.
Another issue - we are now translating any
CAST(null AS <type>)toFALSEalthough when not within a filter, the expression sometimes should stay null. For example, runio.trino.plugin.postgresql.TestPostgreSqlTableStatistics#testNullsFraction
correct.
is postgresql test the only one catching that?
please add something to AbstractTestEngineOnlyQueries as well
|
This PR is growing. It's awesome to see more good changes coming. Can you try moving Can you try moving |
1286d74 to
0c2ac27
Compare
0c2ac27 to
5ae6600
Compare
Done. See #17013 and #17014. |
a05aa88 to
703d528
Compare
Make Project optional, for the case MergeProjectWithValues is executed beforehand.
The rule uses DomainTranslator to infer that the expression is "false".
703d528 to
041d25c
Compare
|
Rebased onto master branch due to conflicts |
| .addAll(columnPruningRules) | ||
| .add(new InlineProjections(plannerContext, typeAnalyzer)) | ||
| .addAll(new PushFilterThroughCountAggregation(plannerContext).rules()) // must run after PredicatePushDown and after TransformFilteringSemiJoinToInnerJoin | ||
| .addAll(new SimplifyFalseConditions(plannerContext).rules()) |
There was a problem hiding this comment.
There's already a rule for this: RemoveTrivialFilters. The new logic should be implemented within that rule.
There was a problem hiding this comment.
i think it became a separate rule following #15558 (comment)
There was a problem hiding this comment.
I think it was mainly because of @martint's comment from 4 years ago:
But I don't object to merging these two, they seem to go together anyway + that will turn
RemoveTrivialFilters into an ExpressionRewriteRuleSet so maybe it will also benefit the join case (I didn't check).I'm not sure the term "trivial" will still be accurate though.
There was a problem hiding this comment.
It's ok to have a rule that simplifies expressions (we already have one) and one that removes filters where the expression is known to filter everything or nothing (we already have one, too). But we should not add yet another rule that mixes both concerns.
There was a problem hiding this comment.
I took another look, RemoveTrivialFilters can't become an ExpressionRewriteRuleSet because ExpressionRewriteRuleSet doesn't replace a node of one type with a node of another type.
The new rule's logic can't be moved into SimplifyExpressions because not all node types are supported (see #16206 (comment)).
I can make RemoveTrivialFilters return 2 rules, but I failed to prove the join case (couldn't find an actual query that has a final plan that contains a JoinNode with a filter that can be simplified to FALSE \ TRUE).
@findepi \ @martint WDYT should be the next step?
There was a problem hiding this comment.
the join case (couldn't find an actual query that has a final plan that contains a JoinNode with a filter that can be simplified to FALSE \ TRUE).
I guess that's because PPD pushes such filter under the join and then it gets simplified/eliminated there
But we should not add yet another rule that mixes both concerns.
I read it as implying RemoveTrivialFilters shouldn't call DomainTranslator for help.
We tried to introduce a generic rule ExpressionRewriteRuleSet-based to simplify boolean conditions that can be replaced with constant false. I know what problems this run into (distinguishing context: eg between projected).
Did it result in more test coverage? like cases mentioned in #16206 (comment) ?
There was a problem hiding this comment.
I read it as implying RemoveTrivialFilters shouldn't call DomainTranslator for help.
Please see #16206 (comment). Note that adding this logic into RemoveTrivialFilters will address this concern you raised #16206 (comment).
Did it result in more test coverage? like cases mentioned in #16206 (comment) ?
I didn't add new tests for those cases.
There was a problem hiding this comment.
I didn't add new tests for those cases.
Can you please do?
These are interesting test cases to have, regardless of the implementation choices we end up doing in this PR
| types); | ||
|
|
||
| if (decomposedPredicate.getTupleDomain().isNone()) { | ||
| return FALSE_LITERAL; |
There was a problem hiding this comment.
what if the expression was already a false_literal? do we need to return expression in such case (same, not just equal)?
There was a problem hiding this comment.
I don't see a reason for that, do you have something in mind?
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
👋 @assaf2 @findepi @martint - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
Description
The rule uses
DomainTranslatorto infer that the expression is "false".Additional context and related issues
Blocked by #17013 and #17014
Release notes
( X ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: