Improve comparison predicate pushdown when varchar column cast to date #13567

Merged
findepi merged 2 commits into trinodb:master from findepi:findepi/predicate-cast-varchar-to-date
Aug 29, 2022

@findepi
Member

@findepi findepi commented Aug 9, 2022

Such a cast cannot reasonably be pruned away because the source varchar
value can be any of multiple forms (surrounding whitespace, optional
year sign, optional leading zeros for date components). However, a
usefully narrow `Domain` can be extracted and passed over to connectors
to help connectors prune the data.

Fixes #12925
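
A minimal Java sketch of why the cast cannot simply be dropped and replaced with a string comparison. `lenientParse` is a hypothetical approximation of the varchar forms listed above (surrounding whitespace, optional year sign, optional leading zeros); it is an assumption for illustration, not Trino's actual cast implementation.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VarcharDateFormsSketch {
    // Assumed grammar: optional whitespace, optional sign, year digits,
    // then month and day with or without a leading zero.
    private static final Pattern DATE =
            Pattern.compile("\\s*([+-]?)(\\d+)-(\\d{1,2})-(\\d{1,2})\\s*");

    // Hypothetical stand-in for CAST(varchar AS date).
    static LocalDate lenientParse(String value) {
        Matcher m = DATE.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a date: " + value);
        }
        int year = Integer.parseInt(m.group(2));
        if ("-".equals(m.group(1))) {
            year = -year;
        }
        return LocalDate.of(year, Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    public static void main(String[] args) {
        LocalDate constant = LocalDate.of(2005, 9, 10);
        // All four strings denote the same date, 2005-09-20.
        for (String form : List.of("2005-09-20", " 2005-09-20 ", "+2005-09-20", "2005-9-20")) {
            boolean dateMatches = !lenientParse(form).isBefore(constant);
            boolean stringMatches = form.compareTo("2005-09-10") >= 0;
            System.out.println("'" + form + "' date>=: " + dateMatches + ", string>=: " + stringMatches);
        }
    }
}
```

In this sketch, the forms starting with a space or a sign sort before `'2005-09-10'` as strings even though they denote a later date, so a predicate on the varchar column has to be wider than a plain string comparison.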

@findepi findepi added the enhancement and performance labels Aug 9, 2022
@cla-bot cla-bot bot added the cla-signed label Aug 9, 2022
@findepi findepi force-pushed the findepi/predicate-cast-varchar-to-date branch 2 times, most recently from 800ebc9 to 0b0ad8d Compare August 10, 2022 13:35
@findepi findepi marked this pull request as draft August 10, 2022 14:49
@findepi
Member Author

findepi commented Aug 10, 2022

Marked draft because

@findepi findepi force-pushed the findepi/predicate-cast-varchar-to-date branch from 0b0ad8d to af06f25 Compare August 11, 2022 10:12
The case where varchar is cast to date and used in a predicate is not
uncommon and it would be worth optimizing for. This involves, however,
some edge cases which are easy to forget about when working on such an
optimization (either within the engine, or on the connector level).
@findepi findepi force-pushed the findepi/predicate-cast-varchar-to-date branch 2 times, most recently from dfe10c3 to 67ef8fb Compare August 24, 2022 11:16
@findepi findepi marked this pull request as ready for review August 24, 2022 11:20
@findepi
Member Author

findepi commented Aug 24, 2022

Ready for review now.

@findepi findepi force-pushed the findepi/predicate-cast-varchar-to-date branch 2 times, most recently from a968bbe to c13966f Compare August 24, 2022 11:39
@findepi
Member Author

findepi commented Aug 24, 2022

Added a test in TestIcebergMetadataFileOperations that shows gains for Iceberg.
Gains for JDBC connectors are "obvious".

Contributor

I find this addition a hidden gem.
Could we get this documented?

We should probably add hints here in the test that the casting improvement does not apply to everything.

E.g., `SELECT * FROM test_varchar_as_date_predicate WHERE year(CAST(a AS date)) >= 2005` no longer benefits from the improvement.

Member Author

> I find this addition a hidden gem.
> Could we get this documented?

"Trino does as many optimizations as possible".
I think we should not try to document all the optimizations. What for?

> E.g., `SELECT * FROM test_varchar_as_date_predicate WHERE year(CAST(a AS date)) >= 2005` no longer benefits from the improvement.

Yes, not yet.

@findinpath
Contributor

> Gains for JDBC connectors are "obvious".

Out of curiosity: how would we be able to test the gains?

@findepi
Member Author

findepi commented Aug 25, 2022

> > Gains for JDBC connectors are "obvious".
>
> Out of curiosity: how would we be able to test the gains?

We could perhaps check the number of input positions (rows) retrieved from the remote database.

Member

Sanity question: does SQL char type ordering use the same order as Java char ordering? I.e., if you add one to a char in Java, do you also get the next char in SQL char sort order?

Member Author

In general, no (I think it breaks for non-BMP characters).
Here, we're operating within ASCII (digits and hyphen). '9' + 1 produces ':', still ASCII.
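
A small Java sketch of both halves of this answer. It assumes SQL varchar ordering follows code point order (equivalently, unsigned UTF-8 byte order); that is an assumption of the sketch rather than something stated in this thread. `prefixUpperBound` and `compareUtf8` are hypothetical helpers, not Trino code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharOrderingSketch {
    // Hypothetical helper: increments the last char of the prefix.
    // For an ASCII last char below the maximum, the result sorts after
    // every string that starts with the prefix.
    static String prefixUpperBound(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Compares strings as unsigned UTF-8 bytes, which agrees with code point order.
    static int compareUtf8(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Within ASCII, adding one to a char gives the next char in both orderings.
        System.out.println(prefixUpperBound("2005-09-19"));

        // Outside the BMP, Java's UTF-16 code unit order and code point order disagree.
        String bmp = "\uFF5E";                                          // U+FF5E, a BMP character
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600, a surrogate pair in UTF-16
        System.out.println("Java compareTo: " + Integer.signum(bmp.compareTo(supplementary)));
        System.out.println("UTF-8 bytes:    " + Integer.signum(compareUtf8(bmp, supplementary)));
    }
}
```

The two comparisons print opposite signs for the non-BMP pair, while for digits and hyphens the increment stays within ASCII and both orderings agree.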

@findepi findepi force-pushed the findepi/predicate-cast-varchar-to-date branch 2 times, most recently from e096285 to 1ad0add Compare August 26, 2022 11:44
@findepi findepi force-pushed the findepi/predicate-cast-varchar-to-date branch 2 times, most recently from a97ecba to 422d0fb Compare August 26, 2022 11:59
Development

Successfully merging this pull request may close these issues.

Derive predicate over base column for CAST(a_varchar AS date) > a_date_constant condition

5 participants