Simplify predicates involving date_trunc#14011
49b384c to ae59bd0
CI #12385
...o-main/src/main/java/io/trino/sql/planner/iterative/rule/CanonicalizeExpressionRewriter.java
...ino-main/src/main/java/io/trino/sql/planner/iterative/rule/ReplaceDateTruncInComparison.java
...ino-main/src/main/java/io/trino/sql/planner/iterative/rule/ReplaceDateTruncInComparison.java
...ino-main/src/main/java/io/trino/sql/planner/iterative/rule/ReplaceDateTruncInComparison.java
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
...o-main/src/main/java/io/trino/sql/planner/iterative/rule/CanonicalizeExpressionRewriter.java
2449231 to 983998e
Is that still preferable in terms of CPU cycles when we do not end up pushing down the expression?
Good question. This I don't know.
Do you think I should not do this?
Well - I think there may be a tradeoff. But I do not know the answer. My gut says it is better to be able to do pushdown, even at the cost of a slight (if any) performance degradation. To know whether there really is any negative impact, we would need some micro-benchmarking. Maybe we already have some, I don't know.
> Well - I think there may be a tradeoff. But I do not know the answer. My gut says it is better to be able to do pushdown, even at the cost of a slight (if any) performance degradation.
I am convinced about that as well.
I can try to handle this case in the comparison context only.
> I can try to handle this case in the comparison context only.
Done. This is now removed from here, taken care of by UnwrapDateTruncInComparison.
When using day, month, year partitioning transforms, it's natural to use `year(c)`, `date(c)` or `date_trunc(..., c)` when querying an individual partition. Add test coverage documenting current state of Iceberg pushdown for such predicates.
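To make the equivalence behind such predicates concrete, here is a minimal sketch of how `date_trunc(unit, d) = constant` over a `date` column corresponds to a half-open range on `d`. The class `DateTruncUnwrapSketch` and its methods are hypothetical names used for illustration only; this is not the code of `UnwrapDateTruncInComparison`, and it covers only the `=` case for day, month and year units.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Optional;

// Illustrative only: hypothetical helper, not Trino's UnwrapDateTruncInComparison.
final class DateTruncUnwrapSketch {
    /** Half-open date range [start, end). */
    static final class Range {
        final LocalDate start;
        final LocalDate end;

        Range(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(LocalDate d) {
            return !d.isBefore(start) && d.isBefore(end);
        }
    }

    /** Mimics date_trunc(unit, d) for a date argument. */
    static LocalDate truncate(ChronoUnit unit, LocalDate d) {
        switch (unit) {
            case DAYS:
                return d; // truncating a date to day changes nothing
            case MONTHS:
                return d.withDayOfMonth(1);
            case YEARS:
                return d.withDayOfYear(1);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    /** Range of d satisfying date_trunc(unit, d) = constant; empty if no value can match. */
    static Optional<Range> unwrapEquals(ChronoUnit unit, LocalDate constant) {
        if (!truncate(unit, constant).equals(constant)) {
            return Optional.empty(); // constant is not on a unit boundary
        }
        return Optional.of(new Range(constant, constant.plus(1, unit)));
    }

    public static void main(String[] args) {
        Range april = unwrapEquals(ChronoUnit.MONTHS, LocalDate.of(2022, 4, 1)).get();
        System.out.println(april.start + " <= d < " + april.end);
    }
}
```

A constant that is not aligned to the unit (for example the 15th of a month with unit `month`) can never equal a truncated value, which is why the sketch returns an empty result in that case.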
983998e to f274215
(just rebased, to resolve a conflict)
f274215 to d2ade35
It's better to add this in a dedicated rule than keep piling on this class and turning it into a kitchen sink of function translations.
`date_trunc('day', a_date)` is a no-op and can be replaced with `a_date`.
Range predicates (ComparisonExpression, BetweenPredicate) can be transformed into a `TupleDomain` and thus help with predicate pushdown. Range-based `TupleDomain` representation is critical for connectors which have min/max-based metadata (like Iceberg data files and manifest lists), as ranges allow for intersection tests, something that is hard to do in a generic manner for `ConnectorExpression`.
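As a rough illustration of why ranges matter for min/max metadata, the sketch below tests a closed value range against per-file statistics. `MinMaxPruningSketch` and `ColumnStats` are hypothetical stand-ins invented for this example; they are not the Trino SPI (`TupleDomain`, `Domain`) or the Iceberg API.

```java
import java.time.LocalDate;

// Illustrative only: hypothetical types, not the Trino SPI or the Iceberg API.
final class MinMaxPruningSketch {
    /** Min/max statistics for one column of one data file. */
    static final class ColumnStats {
        final LocalDate min;
        final LocalDate max;

        ColumnStats(LocalDate min, LocalDate max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True when the closed range [low, high] may overlap the file's [min, max]. */
    static boolean mayContain(ColumnStats stats, LocalDate low, LocalDate high) {
        return !high.isBefore(stats.min) && !low.isAfter(stats.max);
    }

    public static void main(String[] args) {
        ColumnStats januaryFile = new ColumnStats(LocalDate.of(2022, 1, 1), LocalDate.of(2022, 1, 31));
        // A predicate covering February cannot match any row in this file, so the file can be skipped.
        System.out.println(mayContain(januaryFile, LocalDate.of(2022, 2, 1), LocalDate.of(2022, 2, 28)));
    }
}
```

The intersection test needs only two comparisons once the predicate is a range. An opaque function call over the column gives no such bounds unless something first translates it into a range.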
E.g. support for TIMESTAMP or INTERVAL literals was missing.
The engine's `UnwrapDateTruncInComparison` helps with predicates involving `date_trunc` over `date` and `timestamp` types, but it cannot help with `timestamp with time zone`, because `date_trunc` operates on a value's local time while values are compared by point in time. In Iceberg, however, we know that all `timestamp with time zone` values are in the UTC zone. This allows us to derive a value range (`Domain`) in cases where the engine would not be able to.
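The time zone issue can be sketched with `java.time`. The example below shows that the same instant truncates to different points in time depending on the zone it carries, and that a single instant range per day exists once every value is known to be in UTC. `UtcDateTruncSketch` and its methods are hypothetical names for illustration and do not reflect the Iceberg connector's actual code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Illustrative only: hypothetical helper, not the Iceberg connector's implementation.
final class UtcDateTruncSketch {
    /** Mimics date_trunc('day', v): truncation happens on the value's local time. */
    static ZonedDateTime truncToDay(ZonedDateTime v) {
        return v.truncatedTo(ChronoUnit.DAYS);
    }

    /** Instants [start, end) whose UTC day equals the given day. Valid only if all values are in UTC. */
    static Instant[] utcDayRange(LocalDate day) {
        Instant start = day.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant end = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        return new Instant[] {start, end};
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2022-04-05T23:30:00Z");
        ZonedDateTime inUtc = instant.atZone(ZoneOffset.UTC);
        ZonedDateTime inPlus9 = instant.atZone(ZoneOffset.ofHours(9));

        // Same point in time, but the truncated results are different points in time.
        System.out.println(truncToDay(inUtc).toInstant());
        System.out.println(truncToDay(inPlus9).toInstant());

        Instant[] range = utcDayRange(LocalDate.of(2022, 4, 5));
        System.out.println(range[0] + " <= ts < " + range[1]);
    }
}
```

Because the truncated result depends on each value's zone, no single instant range describes `date_trunc('day', ts) = constant` in general. With the zone fixed to UTC, the range from `utcDayRange` is enough.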
d2ade35 to 92706fa
        || argumentType instanceof TimestampWithTimeZoneType
        || argumentType instanceof VarcharType) {
    // prefer `CAST(x as DATE)` to `date(x)`
    // prefer `CAST(x as DATE)` to `date(x)`, see e.g. UnwrapCastInComparison
No description provided.