Rewrite casts in between predicate #14648
Current limitation I stumbled upon while trying to add the methods:
to As an alternative we could use custom operators for
I assume this is a typo, but in the motivating example the rewrite should be: (Note the precision of the first timestamp)
Rebasing on top of Going to Draft mode until the comments from the PR are addressed.
Failure on ci/test-other-modules

I think it might be desirable to have
Is this still in progress, @findinpath?

@findinpath, sorry, this fell between the cracks. I'm happy to help get it in. Can you rebase?
Unrelated CI failure #20391

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

Co-authored-by: kabunchi <tal@varada.io>
This change allows the engine to infer that, for instance,
given t::timestamp(6)
cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'
can be rewritten as
t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'
Range predicate `BetweenPredicate` can be transformed into a `TupleDomain`
and thus help with predicate pushdown.
Range-based `TupleDomain` representation is critical for connectors
which have min/max-based metadata (like Iceberg manifest lists, which
play a key role in partition pruning, or Iceberg data files), as ranges allow
for intersection tests, something that is hard
to do in a generic manner for `ConnectorExpression`.
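
The arithmetic behind the example in the commit message can be sketched with `java.time`. This is only an illustration of how the date bounds map to timestamp(6) bounds; the class and method names are hypothetical and are not part of Trino, which works on its own internal representations.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not Trino code: it only illustrates the bound arithmetic.
final class DateRangeToTimestampRange
{
    private DateRangeToTimestampRange() {}

    // Smallest timestamp(6) value whose cast to date equals the given date.
    static LocalDateTime lowerBound(LocalDate low)
    {
        return low.atStartOfDay();
    }

    // Largest timestamp(6) value whose cast to date equals the given date:
    // one microsecond before the start of the following day.
    static LocalDateTime upperBound(LocalDate high)
    {
        return high.plusDays(1).atStartOfDay().minus(1, ChronoUnit.MICROS);
    }

    public static void main(String[] args)
    {
        // cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'
        System.out.println(lowerBound(LocalDate.of(2022, 1, 1)));
        System.out.println(upperBound(LocalDate.of(2022, 1, 2)));
    }
}
```

The upper bound depends on the precision of the source type: for a different timestamp precision, the subtracted unit would change accordingly.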
    Expression trueIfNotNullExpression = trueIfNotNull(cast.expression());
    if (trueIfNotNullExpression.equals(lowBoundUnwrapped)) {
        return highBoundUnwrapped;
    }
    if (trueIfNotNullExpression.equals(highBoundUnwrapped)) {
        return lowBoundUnwrapped;
    }
Note that this is potentially incorrect. Consider
- lowBoundUnwrapped: trueIfNotNull(cast)
- highBoundUnwrapped: evaluates to true for null value of cast.
I understand this is not the case today because the lowBoundUnwrapped and highBoundUnwrapped can only take a few different forms, as they are results of the unwrapCast method.
However, the unwrapCast method might evolve over time, and this logic might get out of sync.
I suggest that before we proceed to combining lowBoundUnwrapped and highBoundUnwrapped, we first assert that both expressions satisfy some structural requirements.
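
The concern can be made concrete with a small model of SQL three-valued logic. This is a sketch under assumptions: `null` stands for SQL NULL, `trueIfNotNull(x)` is assumed to evaluate to TRUE for a non-null `x` and to NULL otherwise, and the `exact` and `shortcut` methods are hypothetical names that model the full conjunction and the early return in the hunk above.

```java
// Hypothetical model, not Trino code; a null Boolean stands for SQL NULL (unknown).
final class NullSemanticsSketch
{
    private NullSemanticsSketch() {}

    // SQL three-valued AND.
    static Boolean and(Boolean left, Boolean right)
    {
        if (Boolean.FALSE.equals(left) || Boolean.FALSE.equals(right)) {
            return false;
        }
        if (left == null || right == null) {
            return null;
        }
        return true;
    }

    // Assumed semantics of trueIfNotNull(x): TRUE when x is non-null, NULL otherwise.
    static Boolean trueIfNotNull(Object value)
    {
        return value == null ? null : Boolean.TRUE;
    }

    // What the conjunction of both bounds should evaluate to.
    static Boolean exact(Object castValue, Boolean highBoundResult)
    {
        return and(trueIfNotNull(castValue), highBoundResult);
    }

    // What the shortcut yields: the low bound is dropped and the high bound returned as-is.
    static Boolean shortcut(Object castValue, Boolean highBoundResult)
    {
        return highBoundResult;
    }

    public static void main(String[] args)
    {
        // cast is NULL and the high bound happens to evaluate to TRUE for NULL input.
        System.out.println(exact(null, Boolean.TRUE));
        System.out.println(shortcut(null, Boolean.TRUE));
    }
}
```

In this model the two results differ only when the remaining bound evaluates to TRUE for a NULL input, which is exactly the structural property the comment proposes to assert.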
    Expression falseIfNotNullExpression = falseIfNotNull(cast.expression());
    if (falseIfNotNullExpression.equals(lowBoundUnwrapped) || falseIfNotNullExpression.equals(highBoundUnwrapped)) {
        if (falseIfNotNullExpression.equals(lowBoundUnwrapped) && falseIfNotNullExpression.equals(highBoundUnwrapped)) {
            return falseIfNotNullExpression;

    Type highBoundType = highBoundUnwrappedComparison.right().type();

    if (cast.expression().equals(lowBoundUnwrappedComparison.left()) &&
            Objects.equals(sourceType, lowBoundType) &&
Comparing the types is redundant. In Comparison, both operands are of the same type.
    if (cast.expression().equals(lowBoundUnwrappedComparison.left()) &&
            Objects.equals(sourceType, lowBoundType) &&
            cast.expression().equals(highBoundUnwrappedComparison.left()) &&
            Objects.equals(sourceType, highBoundType)) {

        // Try to reconstruct the BETWEEN predicate with the cast unwrapped
        if (!(lowBoundUnwrappedComparison.right() instanceof Constant(Type _, Object lowBoundValue))
                || !(highBoundUnwrappedComparison.right() instanceof Constant(Type _, Object highBoundValue))) {
            return expression;
Why not and(lowBoundUnwrappedComparison, highBoundUnwrappedComparison)?
            return expression;
        }

        int compareLowBoundValueAndHighBoundValue = compare(sourceType, lowBoundValue, highBoundValue);
The compare method would throw an exception if lowBoundValue or highBoundValue was null.
Again, we are making implicit assumptions about what the unwrapCast method returns.
            if (compareNextLowAndHighBound > 0) {
                return falseIfNotNull(cast.expression());
            }
        }
Could we refactor the preceding code for low and high bounds of the created between expression so that we
- compute the low and high bounds (either as the constants from the comparisons or the next/previous values)
- compare them and return falseIfNotNull(cast.expression()) if low > high
- create and return the between expression

?
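
The three steps proposed above could look roughly like the following sketch. It uses plain `long` bounds instead of Trino types, and the class, the method, and the use of an empty `Optional` to stand in for `falseIfNotNull(cast.expression())` are all hypothetical simplifications rather than the structure of the actual rule.

```java
import java.util.Optional;

// Hypothetical sketch of the suggested control flow, not Trino code.
final class BetweenBoundsSketch
{
    private BetweenBoundsSketch() {}

    // Returns the inclusive [low, high] pair for the rebuilt BETWEEN,
    // or empty when no value can satisfy both bounds.
    static Optional<long[]> closedRange(long low, boolean lowInclusive, long high, boolean highInclusive)
    {
        // Step 1: compute inclusive bounds, moving to the next/previous value for strict comparisons.
        if (!lowInclusive) {
            if (low == Long.MAX_VALUE) {
                return Optional.empty(); // nothing is greater than the maximum value
            }
            low = low + 1;
        }
        if (!highInclusive) {
            if (high == Long.MIN_VALUE) {
                return Optional.empty(); // nothing is smaller than the minimum value
            }
            high = high - 1;
        }

        // Step 2: an inverted range matches nothing.
        if (low > high) {
            return Optional.empty();
        }

        // Step 3: create and return the range.
        return Optional.of(new long[] {low, high});
    }

    public static void main(String[] args)
    {
        // x > 5 AND x <= 5 has no solutions.
        System.out.println(closedRange(5, false, 5, true).isPresent());
        // x >= 1 AND x < 3 becomes x BETWEEN 1 AND 2.
        System.out.println(closedRange(1, true, 3, false).isPresent());
    }
}
```

Computing both bounds first keeps the emptiness check in a single place, regardless of which side needed a next or previous value.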
            }
        }

        return and(lowBoundUnwrappedComparison, highBoundUnwrappedComparison);

        return and(lowBoundUnwrappedComparison, highBoundUnwrappedComparison);
    }

    return expression;
The unwrapCast method might return constant FALSE. I don't think this case is covered. The result would be FALSE.
Description
Alternative version for #14452 built on top of the work performed on #12797
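This change allows the engine to infer that, for instance, given t::timestamp(6)
    cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'
can be rewritten as
    t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'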
The change applies for the temporal types:
6) as well as the other types which benefit from implementations of getPreviousValue(Object) and getNextValue(Object).

Range predicate BetweenPredicate can be transformed into a TupleDomain and thus help with predicate pushdown.

Range-based TupleDomain representation is critical for connectors which have min/max-based metadata (like Iceberg manifest lists, which play a key role in partition pruning, or Iceberg data files), as ranges allow for intersection tests, something that is hard to do in a generic manner for ConnectorExpression.

This is a spin-off from #14390
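
The intersection test that makes ranges useful for min/max-based metadata can be sketched as follows. The class and method are hypothetical and use plain `long` values; they are not how Trino or Iceberg represent domains or file statistics, only an illustration of why a range is easy to check against a file's minimum and maximum.

```java
// Hypothetical sketch, not Trino or Iceberg code.
final class MinMaxPruningSketch
{
    private MinMaxPruningSketch() {}

    // True when the inclusive ranges [queryLow, queryHigh] and [fileMin, fileMax]
    // share at least one value, meaning the file cannot be skipped.
    static boolean overlaps(long queryLow, long queryHigh, long fileMin, long fileMax)
    {
        return queryLow <= fileMax && fileMin <= queryHigh;
    }

    public static void main(String[] args)
    {
        // A file covering [100, 200] is irrelevant for a predicate range of [300, 400].
        System.out.println(overlaps(300, 400, 100, 200));
        // A file covering [350, 500] may contain matching rows.
        System.out.println(overlaps(300, 400, 350, 500));
    }
}
```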
Additional details
The details of the logic of the method io.trino.sql.planner.iterative.rule.UnwrapCastInComparison.Visitor#rewriteBetweenPredicate can be verified for correctness with the test io.trino.sql.planner.TestUnwrapCastInComparison#testBetween

Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: