Skip to content

Simplify predicates involving date_trunc#14011

Merged
findepi merged 9 commits intotrinodb:masterfrom
findepi:findepi/date-trunc-pushdown
Sep 14, 2022
Merged

Simplify predicates involving date_trunc#14011
findepi merged 9 commits intotrinodb:masterfrom
findepi:findepi/date-trunc-pushdown

Conversation

@findepi
Copy link
Copy Markdown
Member

@findepi findepi commented Sep 6, 2022

No description provided.

@cla-bot cla-bot bot added the cla-signed label Sep 6, 2022
@findepi findepi force-pushed the findepi/date-trunc-pushdown branch 2 times, most recently from 49b384c to ae59bd0 Compare September 7, 2022 12:23
@findepi findepi marked this pull request as ready for review September 7, 2022 12:23
@findepi findepi added enhancement New feature or request performance labels Sep 7, 2022
@findepi findepi self-assigned this Sep 7, 2022
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 7, 2022

CI #12385

@findepi findepi requested a review from alexjo2144 September 9, 2022 14:33
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that still preferable when it comes to CPU cycles when we do not end up pushing down the expression?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This i don't know.
Do you think I should not do this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - I think there may be a tradoff. But I do not know the answer. My guts say it is better to be able to do pushdown that have slight (if any) performance degradataion. To know if there really is any negative impact of that we would need some micro benchmarking - maybe we have alreayd IDK.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - I think there may be a tradoff. But I do not know the answer. My guts say it is better to be able to do pushdown that have slight (if any) performance degradataion.

I am convinced about that as well.

I can try to handle this case is the comparison context only.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to handle this case is the comparison context only.

Done. This is now removed from here, taken care of by UnwrapDateTruncInComparison.

Copy link
Copy Markdown
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

When using day, month, year partitioning transforms, it's natural to use
`year(c)`, `date(c)` or `date_trunc(..., c)` when querying an individual
partition.  Add test coverage documenting current state of Iceberg
pushdown for such predicates.
@findepi findepi force-pushed the findepi/date-trunc-pushdown branch from 983998e to f274215 Compare September 12, 2022 08:14
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 12, 2022

(just rebased, to resolve a conflict)

@findepi findepi force-pushed the findepi/date-trunc-pushdown branch from f274215 to d2ade35 Compare September 12, 2022 08:25
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add this in a dedicated rule than keep piling on this class and turning it into a kitchen sink of function translations.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

`date_trunc('day', a_date)` is a no-op and can be replaced with
`a_date`.
Range predicates (ComparisonExpression, BetweenPredicate) can be
transformed into a `TupleDomain` and thus help with predicate pushdown.
Range-based `TupleDomain` representation is critical for connectors
which have min/max-based metadata (like Iceberg data files and manifest
lists), as ranges allow for intersection tests, something that is hard
to do in a generic manner for `ConnectorExpression`.
E.g. support for TIMESTAMP or INTERVAL literals was missing.
Engine's `UnwrapDateTruncInComparison` helps with predicates involving
`date_trunc` over `date` and `timestamp` types, but it cannot help with
`timestamp with time zone`, due to how `date_trunc` operates on value's
local time and values are compared by point-in-time.

In Iceberg, however, we know that all `timestamp with time zone` values
are in UTC zone. This allows us to derive value range (`Domain`) for
cases where engine wouldn't be able to do this.
@findepi findepi force-pushed the findepi/date-trunc-pushdown branch from d2ade35 to 92706fa Compare September 13, 2022 19:17
@findepi findepi merged commit 154223c into trinodb:master Sep 14, 2022
@findepi findepi deleted the findepi/date-trunc-pushdown branch September 14, 2022 11:57
@github-actions github-actions bot added this to the 396 milestone Sep 14, 2022
@findepi findepi mentioned this pull request Sep 14, 2022
|| argumentType instanceof TimestampWithTimeZoneType
|| argumentType instanceof VarcharType) {
// prefer `CAST(x as DATE)` to `date(x)`
// prefer `CAST(x as DATE)` to `date(x)`, see e.g. UnwrapCastInComparison
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

5 participants