Skip to content

Rewrite casts in between predicate#14648

Open
findinpath wants to merge 4 commits intotrinodb:masterfrom
findinpath:rewrite-casts-in-between-predicate-v2
Open

Rewrite casts in between predicate#14648
findinpath wants to merge 4 commits intotrinodb:masterfrom
findinpath:rewrite-casts-in-between-predicate-v2

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

@findinpath findinpath commented Oct 14, 2022

Description

Alternative version for #14452 built on top of the work performed on #12797

This change allows the engine to infer that, for instance, given t::timestamp(6)

cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'

can be rewritten as

 t BETWEEN TIMESTAMP '2022-01-01 00:00:00.000000' AND  TIMESTAMP '2022-01-02 23:59:59.999999'

The change applies for the temporal types:

  • date
  • timestamp
  • timestamp with time zone (up to precision 6)

as well as the other types which benefit of implementations for getPreviousValue(Object) and getNextValue(Object).

Range predicate BetweenPredicate can be transformed into a TupleDomain and thus help with predicate pushdown.
Range-based TupleDomain representation is critical for connectors which have min/max-based metadata (like Iceberg manifests lists which play a key role in partition pruning or Iceberg data files), as ranges allow for intersection tests, something that is hard
to do in a generic manner for ConnectorExpression.

This is a spin-off from #14390

Additional details

The details of the logic of the method io.trino.sql.planner.iterative.rule.UnwrapCastInComparison.Visitor#rewriteBetweenPredicate can be verified for correctness with the test io.trino.sql.planner.TestUnwrapCastInComparison#testBetween

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Main
* Improve partition and data pruning when comparing casts with ranges in `BETWEEN` predicate

@findinpath
Copy link
Copy Markdown
Contributor Author

Current limitation

I stumbled while trying to add the methods:

  • getPreviousValue(Object)
  • getNextValue(Object)

to DateType.
I am assuming that checking the validity of a date needs joda-time expertise which is not available in trino-spi module.

As an alternative we could use custom operators for nextval, previousval included in DateOperators (and the other operators for other types as well) to cover this need.

@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from c833e20 to ee0351c Compare October 27, 2022 14:53
@findinpath findinpath marked this pull request as ready for review October 27, 2022 16:41
@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from ee0351c to 4855e69 Compare August 7, 2023 20:01
@findinpath findinpath changed the title Draft: Rewrite casts in between predicate v2 Rewrite casts in between predicate Aug 7, 2023
@github-actions github-actions bot added the iceberg Iceberg connector label Aug 7, 2023
@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from 4855e69 to 8438ecd Compare August 8, 2023 15:43
@findinpath findinpath self-assigned this Aug 8, 2023
@martint
Copy link
Copy Markdown
Member

martint commented Aug 11, 2023

I assume this is a typo, but in the motivating example the rewrite should be:

 t >= TIMESTAMP '2022-01-01 00:00:00.000000' AND t < TIMESTAMP '2022-01-03 00:00:00.000000'

(Note the precision of the first timestamp)

@findinpath
Copy link
Copy Markdown
Contributor Author

Rebasing on top of master to reuse #18588

Going to Draft mode until the comments from the PR are addressed.

@findinpath findinpath marked this pull request as draft August 14, 2023 20:12
@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from 10d5999 to 63ffd74 Compare August 14, 2023 20:12
@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from 63ffd74 to 5ed39aa Compare September 5, 2023 19:30
@findinpath findinpath marked this pull request as ready for review September 5, 2023 19:33
@findinpath
Copy link
Copy Markdown
Contributor Author

findinpath commented Sep 6, 2023

Failure on ci/test-other-modules

https://github.com/trinodb/trino/actions/runs/6088945758/job/16520680223?pr=14648

Error:  io.trino.plugin.google.sheets.TestGoogleSheets.init -- Time elapsed: 1.023 s <<< FAILURE!
com.google.api.client.googleapis.json.GoogleJsonResponseException: 
503 Service Unavailable
POST https://sheets.googleapis.com/v4/spreadsheets?fields=spreadsheetId
{
  "code" : 503,
  "errors" : [ {
    "domain" : "global",
    "message" : "The service is currently unavailable.",
    "reason" : "backendError"
  } ],
  "message" : "The service is currently unavailable.",
  "status" : "UNAVAILABLE"
}

@findinpath findinpath requested a review from findepi September 6, 2023 06:29
@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 6, 2023

I assume this is a typo, but in the motivating example the rewrite should be:

 t >= TIMESTAMP '2022-01-01 00:00:00.000000' AND t < TIMESTAMP '2022-01-03 00:00:00.000000'

I think it might be desirable to have <= on the right, as this allows keeping BETWEEN as the predicate.

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jan 11, 2024

Is this still in progress @findinpath ?

@findinpath
Copy link
Copy Markdown
Contributor Author

@mosabua I did try repeatedly to find a maintainer to accompany this PR without success.

@martint can you suggest somebody to accompany eventually this PR ?

@martint
Copy link
Copy Markdown
Member

martint commented Jan 12, 2024

@findinpath, sorry, this fell between the cracks. I'm happy to help get it in. Can you rebase?

@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch 2 times, most recently from c4ba962 to 2da4d98 Compare January 12, 2024 20:42
@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from 2da4d98 to f0a1ad5 Compare January 15, 2024 19:13
@findinpath findinpath requested a review from martint January 15, 2024 19:13
@findinpath
Copy link
Copy Markdown
Contributor Author

@martint I did rebase on top of the latest changes.
Please have a look.
Feel free to comment if the issue description is not making clear the purpose of this PR.

Heads'up: As a follow-up, i'll make a similar improvement for date_trunc (after reviving #14451)

@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from f0a1ad5 to 1a5764e Compare January 16, 2024 14:25
@findinpath
Copy link
Copy Markdown
Contributor Author

Unrelated CI failure #20391

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 4, 2024
@github-actions
Copy link
Copy Markdown

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Sep 26, 2024
@findinpath findinpath reopened this Oct 4, 2024
@findinpath findinpath added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Oct 4, 2024
findinpath and others added 2 commits December 27, 2024 16:53
This change allows the engine to infer that, for instance,
given t::timestamp(6)

    cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'

can be rewritten as

    t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'

Range predicate `BetweenPredicate` can be transformed into a `TupleDomain`
and thus help with predicate pushdown.
Range-based `TupleDomain` representation is critical for connectors
which have min/max-based metadata (like Iceberg manifests lists which
play a key role in partition pruning or Iceberg data files), as ranges allow
for intersection tests, something that is hard
to do in a generic manner for `ConnectorExpression`.
@findinpath findinpath force-pushed the rewrite-casts-in-between-predicate-v2 branch from 9db4049 to 61c4581 Compare December 27, 2024 23:22
@findinpath findinpath requested review from kasiafi and raunaqmorarka and removed request for findepi December 29, 2024 07:39
Copy link
Copy Markdown
Member

@kasiafi kasiafi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just started reviewing

Comment on lines +186 to +192
Expression trueIfNotNullExpression = trueIfNotNull(cast.expression());
if (trueIfNotNullExpression.equals(lowBoundUnwrapped)) {
return highBoundUnwrapped;
}
if (trueIfNotNullExpression.equals(highBoundUnwrapped)) {
return lowBoundUnwrapped;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this is potentially incorrect. Consider

  • lowBoundUnwrapped: trueIfNotNull(cast)
  • highBoundUnwrapped: evaluates to true for null value of cast.

I understand this is not the case today because the lowBoundUnwrapped and highBoundUnwrapped can only take a few different forms, as they are results of the unwrapCast method.

However, the unwrapCast method might evolve over time, and this logic might get out of sync.

I suggest that before we proceed to combining lowBoundUnwrapped and highBoundUnwrapped, we first assert that both expressions satisfy some structural requirements.

Expression falseIfNotNullExpression = falseIfNotNull(cast.expression());
if (falseIfNotNullExpression.equals(lowBoundUnwrapped) || falseIfNotNullExpression.equals(highBoundUnwrapped)) {
if (falseIfNotNullExpression.equals(lowBoundUnwrapped) && falseIfNotNullExpression.equals(highBoundUnwrapped)) {
return falseIfNotNullExpression;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is each of these cases tested?

Type highBoundType = highBoundUnwrappedComparison.right().type();

if (cast.expression().equals(lowBoundUnwrappedComparison.left()) &&
Objects.equals(sourceType, lowBoundType) &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing the types is redundant. In Comparison, both operands are of the same type.

if (cast.expression().equals(lowBoundUnwrappedComparison.left()) &&
Objects.equals(sourceType, lowBoundType) &&
cast.expression().equals(highBoundUnwrappedComparison.left()) &&
Objects.equals(sourceType, highBoundType)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

// Try to reconstruct the BETWEEN predicate with the cast unwrapped
if (!(lowBoundUnwrappedComparison.right() instanceof Constant(Type _, Object lowBoundValue))
|| !(highBoundUnwrappedComparison.right() instanceof Constant(Type _, Object highBoundValue))) {
return expression;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not and(lowBoundUnwrappedComparison, highBoundUnwrappedComparison)?

return expression;
}

int compareLowBoundValueAndHighBoundValue = compare(sourceType, lowBoundValue, highBoundValue);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compare method would throw exception if lowBoundValue or highBoundValue was null.

Again, we are making implicit assumptions about what the unwrapCast method returns.

if (compareNextLowAndHighBound > 0) {
return falseIfNotNull(cast.expression());
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we refactor the preceding code for low and high bounds of the created between expression so that we

  1. compute the low and high bounds (either as the constants from the comparisons or the next/previous values)
  2. compare them and return falseIfNotNull(cast.expression()) if low > high
  3. create and return the between expression

?

}
}

return and(lowBoundUnwrappedComparison, highBoundUnwrappedComparison);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is each case covered in tests?

return and(lowBoundUnwrappedComparison, highBoundUnwrappedComparison);
}

return expression;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unwrapCast method might return constant FALSE. I don't think this case is covered. The result would be FALSE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed iceberg Iceberg connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.

Development

Successfully merging this pull request may close these issues.

5 participants