Skip to content

Add a new rule for removing "false" filters#16206

Closed
assaf2 wants to merge 2 commits intotrinodb:masterfrom
assaf2:avoid-null-predicate-pushdown
Closed

Add a new rule for removing "false" filters#16206
assaf2 wants to merge 2 commits intotrinodb:masterfrom
assaf2:avoid-null-predicate-pushdown

Conversation

@assaf2
Copy link
Copy Markdown
Member

@assaf2 assaf2 commented Feb 21, 2023

Description

The rule uses DomainTranslator to infer that the expression is "false".

Additional context and related issues

Blocked by #17013 and #17014

Release notes

( X ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Feb 21, 2023
@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from 9ad296e to 8bbceb3 Compare February 21, 2023 17:45
@assaf2 assaf2 requested review from findepi and martint February 22, 2023 07:57
@assaf2 assaf2 marked this pull request as ready for review February 22, 2023 07:57
@assaf2 assaf2 self-assigned this Feb 26, 2023
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why you're adding a check here is that such Filter wasn't removed by some other rule.
What if a null-filter eligible for removal is not directly above a Table Scan?
My understanding is that it won't get removed, but it should.

I think we need a check here, in PushPredicateIntoTableScan, but we definitely need a check elsewhere as well.

Copy link
Copy Markdown
Member Author

@assaf2 assaf2 Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a way to come up with a rule that can cover 100% of the cases because:

  1. AFAIU we don't want to evaluate the entire expression within the rule (source: Remove constant null filters #15558 (comment)).
  2. Comparing (except IS DISTINCT FROM) null to anything should result in null, not false (source: https://trino.io/docs/current/functions/comparison.html#is-distinct-from-and-is-not-distinct-from). I can modify io.trino.sql.planner.ExpressionInterpreter.Visitor#processComparisonExpression to return false instead of null when left \ right is null, but AFAIU this is not logically correct (BTW I've tried this and got lots of tests failures).

I'm not sure I agree with the comment that existed even before my change:

// TODO: DomainTranslator.fromPredicate can infer that the expression is "false" in some cases (TupleDomain.none()).
// This should move to another rule that simplifies the filter using that logic and then rely on RemoveTrivialFilters
// to turn the subtree into a Values node
Doing the same logic twice just to spare this if? And what if the logic of PushPredicateIntoTableScan will be changed in the future? If we can't come up with a rule that can cover 100% of the filters that are eventually evaluated to false/ null, then IMHO, we should rely on this if. WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU we don't want to evaluate the entire expression within the rule (source: #15558 (comment)).

We're not evaluating here either, right?

What prevents us from having a Rule that would e.g. apply to every FilterNode and did the logic which we're doing here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not evaluating here either, right?

Right, we're executing some logic that's specific for the pushdown and discovering false filters on the way.

What prevents us from having a Rule that would e.g. apply to every FilterNode and did the logic which we're doing here?

It feels arbitrary to me to execute some logic that we happen to have that's doing part of the job, but that must be a minority opinion since both you and the comment writer think otherwise.

I think we need a check here, in PushPredicateIntoTableScan, but we definitely need a check elsewhere as well.

So I'll leave this if as is and add another rule

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in #16206 (comment), i think we need both

  • a separate rule, which would cover for Filter-not-necessarily-right-above-a-TableScan that is redundant and can be removed
  • the change here as proposed is also fine to have. Rules should avoid depending on each other as much as possible. However, we should strive so that reverting the change here in PushPredicateIntoTableScan.java would not impact what query plans we produce.

@assaf2
Copy link
Copy Markdown
Member Author

assaf2 commented Mar 23, 2023

Rebased in order to resolve conflicts

@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from 233f49e to 140f9b9 Compare March 23, 2023 19:32
@assaf2 assaf2 marked this pull request as draft March 23, 2023 19:34
@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch 5 times, most recently from b9446e5 to 869b5c5 Compare March 26, 2023 17:36
@assaf2 assaf2 marked this pull request as ready for review March 26, 2023 17:36
@assaf2
Copy link
Copy Markdown
Member Author

assaf2 commented Mar 26, 2023

@findepi @martint PTAL

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have two triggering conditions for the exception.
If the exception is raised, it would be good to know which one triggered. Add constraint to the message

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with that commit being first, do all the test pass?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check here was for added for documentation purposes 83e144a

Fortunately, the applyFilter is never invoked with NONE domain, and
should not be, since the engine knows what to do in such case without
asking the connector.

The || FALSE.equals(constraint.getExpression() doesn't seem to be just for documentation purposes though, since there is no equivalent logic in PushPredicateIntoTableScan.

If current PushPredicateIntoTableScan code guarantees that FALSE.equals(constraint.getExpression() will never be true, this can be as is, but requires a code comment. If we go that way, I would split the if into two independent ones (the old condition and the new condition do not deserve same type of explanation)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with that commit being first, do all the test pass?

The issue was with NULL pushdown, not with FALSE, all tests should pass regardless its position among the other commits. Sorry if it caused confusion, do you want me to open a separate PR for this change?
I saw that we validated summary is non-NONE and thought this validation is not whole without checking that the other part of the constraint (the expression) is also non-FALSE.

If current PushPredicateIntoTableScan code guarantees that FALSE.equals(constraint.getExpression() will never be true

There's no explicit check for that, but DomainTranslator's implementation translates FALSE expressions into TupleDomain.none() (via visitBooleanLiteral) so the remaining expression shouldn't be FALSE and therefore the translated connector expression shouldn't be FALSE as well. Do you think I better add an explicit check within PushPredicateIntoTableScan?

The check here was for added for documentation purposes 83e144a

Fortunately, the applyFilter is never invoked with NONE domain, and

I was thinking that if we don't want it to be invoked with NONE domain (which is reasonable - why would we want to pushdown NONE filter into the connector?), then we probably don't want to pushdown FALSE expressions as well. I don't see a separation here between documentation and validation. If we say for documentation purposes that summary shouldn't be NONE then why not say that expression shouldn't be FALSE? I guess we don't want inconsistent behavior between the two. As mentioned earlier, I can add an explicit check to PushPredicateIntoTableScan if you think it's necessary.

We now have two triggering conditions for the exception.
If the exception is raised, it would be good to know which one triggered.

Agree, I separated it into 2 different if clauses with different exception messages.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveTrivialFilters is used twice, the new rule is used once. what's the selection criteria to when to use the new rule and when not to?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid PushPredicateIntoTableScan from getting into this if clause

if (newDomain.isNone()) {
            // This is just for extra safety, the rule RemoveFalseFiltersAfterDomainTranslator is responsible for eliminating such filters

I saw that we run simplifyOptimizer at line 556 and then PushPredicateIntoTableScan for the first time at line 600 so I thought it was not needed at line 438 as well. But thinking about it again, if we have a chance of getting a trivial filter at line 438, maybe it's worth removing FALSE filters at this point as well. Added.

@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from 869b5c5 to ae1ae9a Compare March 28, 2023 15:35
@github-actions github-actions bot added the iceberg Iceberg connector label Mar 28, 2023
@assaf2 assaf2 marked this pull request as draft March 29, 2023 06:23
@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch 3 times, most recently from ce1a237 to 5f74783 Compare March 31, 2023 06:47
@findepi findepi marked this pull request as ready for review March 31, 2023 09:18
@assaf2 assaf2 requested a review from findepi April 2, 2023 13:41
@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch 2 times, most recently from 50920fc to 1286d74 Compare April 2, 2023 14:08
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to invoke RemoveTrivialFilters as well?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned at #16206 (comment) I've added this line because I saw PushPredicateIntoTableScan at PlanOptimizers:600 may produce None predicates. IMO this realization is insufficient to determine that RemoveTrivialFilters invocation is also required. Do you think otherwise?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Perhaps you need to explain this to me once more.
I am just curious what's the usefulness of RemoveFalseFiltersAfterDomainTranslator if not coming together with RemoveTrivialFilters. It won't remove the FilterNode.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, now that RemoveFalseFiltersAfterDomainTranslator is an ExpressionRewriteRuleSet and not a standalone rule, RemoveTrivialFilters is also required. Added.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new rule should extend ExpressionRewriteRuleSet and thus apply to any boolean expression that can be proven to be false. This would apply to filternode, but also to projections (IF, CASE or just projecting the result of a boolean expression).

after switching to ExpressionRewriteRuleSet "RemoveFalseFiltersAfterDomainTranslator" is no longer a good rule name.

"SimplifyFalseConditions"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue - we are now translating any CAST(null AS <type>) to FALSE although when not within a filter, the expression sometimes should stay null. For example, run io.trino.plugin.postgresql.TestPostgreSqlTableStatistics#testNullsFraction

correct.

is postgresql test the only one catching that?
please add something to AbstractTestEngineOnlyQueries as well

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 12, 2023

This PR is growing. It's awesome to see more good changes coming.
At the same time, it's becoming harder to re-review.

Can you try moving Handle cast NULL in DomainTranslator to an own PR (may be with the first 3 commits)?

Can you try moving Add error handling to a coercion to a separate PR? (perhaps calling it "Add error handling for saturated floor cast")

@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from 1286d74 to 0c2ac27 Compare April 13, 2023 09:22
@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from 0c2ac27 to 5ae6600 Compare April 13, 2023 09:47
@assaf2
Copy link
Copy Markdown
Member Author

assaf2 commented Apr 13, 2023

This PR is growing. It's awesome to see more good changes coming. At the same time, it's becoming harder to re-review.

Can you try moving Handle cast NULL in DomainTranslator to an own PR (may be with the first 3 commits)?

Can you try moving Add error handling to a coercion to a separate PR? (perhaps calling it "Add error handling for saturated floor cast")

Done. See #17013 and #17014.
Please note that this PR is now blocked by #17014.

@assaf2 assaf2 changed the title Avoid null predicate pushdown Add a new rule for removing "false" filters Apr 13, 2023
@assaf2 assaf2 requested a review from findepi April 13, 2023 09:52
@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 13, 2023

Done. See #17013 and #17014.

thanks, reviewing these.

@assaf2
Copy link
Copy Markdown
Member Author

assaf2 commented Apr 16, 2023

Done. See #17013 and #17014.

thanks, reviewing these.

Thank you, I decided to move few more commits into 2 more PRs- #17047 and #17048

@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from a05aa88 to 703d528 Compare April 16, 2023 15:32
assaf2 added 2 commits April 17, 2023 16:31
Make Project optional, for the case MergeProjectWithValues
is executed beforehand.
The rule uses DomainTranslator to infer that the expression is "false".
@assaf2 assaf2 force-pushed the avoid-null-predicate-pushdown branch from 703d528 to 041d25c Compare April 17, 2023 13:32
@assaf2
Copy link
Copy Markdown
Member Author

assaf2 commented Apr 17, 2023

Rebased onto master branch due to conflicts

.addAll(columnPruningRules)
.add(new InlineProjections(plannerContext, typeAnalyzer))
.addAll(new PushFilterThroughCountAggregation(plannerContext).rules()) // must run after PredicatePushDown and after TransformFilteringSemiJoinToInnerJoin
.addAll(new SimplifyFalseConditions(plannerContext).rules())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a rule for this: RemoveTrivialFilters. The new logic should be implemented within that rule.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it became a separate rule following #15558 (comment)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was mainly because of @martint's comment from 4 years ago:

// TODO: DomainTranslator.fromPredicate can infer that the expression is "false" in some cases (TupleDomain.none()).
// This should move to another rule that simplifies the filter using that logic and then rely on RemoveTrivialFilters
// to turn the subtree into a Values node

But I don't object to merging these two, they seem to go together anyway + that will turn RemoveTrivialFilters into an ExpressionRewriteRuleSet so maybe it will also benefit the join case (I didn't check).
I'm not sure the term "trivial" will still be accurate though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to have a rule that simplifies expressions (we already have one) and one that removes filters where the expression is known to filter everything or nothing (we already have one, too). But we should not add yet another rule that mixes both concerns.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look, RemoveTrivialFilters can't become an ExpressionRewriteRuleSet because ExpressionRewriteRuleSet doesn't replace a node of one type with a node of another type.
The new rule's logic can't be moved into SimplifyExpressions because not all node types are supported (see #16206 (comment)).
I can make RemoveTrivialFilters return 2 rules, but I failed to prove the join case (couldn't find an actual query that has a final plan that contains a JoinNode with a filter that can be simplified to FALSE \ TRUE).
@findepi \ @martint WDYT should be the next step?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the join case (couldn't find an actual query that has a final plan that contains a JoinNode with a filter that can be simplified to FALSE \ TRUE).

I guess that's because PPD pushes such filter under the join and then it gets simplified/eliminated there

But we should not add yet another rule that mixes both concerns.

I read it as implying RemoveTrivialFilters shouldn't call DomainTranslator for help.

We tried to introduce a generic rule ExpressionRewriteRuleSet-based to simplify boolean conditions that can be replaced with constant false. I know what problems this run into (distinguishing context: eg between projected).
Did it result in more test coverage? like cases mentioned in #16206 (comment) ?

Copy link
Copy Markdown
Member Author

@assaf2 assaf2 Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read it as implying RemoveTrivialFilters shouldn't call DomainTranslator for help.

Please see #16206 (comment). Note that adding this logic into RemoveTrivialFilters will address this concern you raised #16206 (comment).

Did it result in more test coverage? like cases mentioned in #16206 (comment) ?

I didn't add new tests for those cases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add new tests for those cases.

Can you please do?
These are interesting test cases to have, regardless of the implementation choices we end up doing in this PR

types);

if (decomposedPredicate.getTupleDomain().isNone()) {
return FALSE_LITERAL;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the expression was already a false_literal? do we need to return expression in such case (same, not just equal)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason for that, do you have something in mind?

@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 17, 2024
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jan 17, 2024

👋 @assaf2 @findepi @martint - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@github-actions github-actions bot removed the stale label Jan 18, 2024
@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 12, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 5, 2024

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

4 participants