Skip to content

Add a test for varchar cast to date predicate#13605

Closed
findepi wants to merge 2 commits intotrinodb:masterfrom
findepi:findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44
Closed

Add a test for varchar cast to date predicate#13605
findepi wants to merge 2 commits intotrinodb:masterfrom
findepi:findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44

Conversation

@findepi
Copy link
Member

@findepi findepi commented Aug 10, 2022

The case where varchar is cast to date and used in a predicate is not
uncommon and it would be worth optimizing for (#12925).
This involves, however, some edge cases which are easy to forget
about when working on such an optimization (either within the engine,
or on the connector level).

Extracted from #13567

@findepi findepi requested a review from hashhar August 10, 2022 14:27
@cla-bot cla-bot bot added the cla-signed label Aug 10, 2022
@findepi findepi force-pushed the findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44 branch from ab3c69a to b23f697 Compare August 10, 2022 14:31
Comment on lines 539 to 542
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to guarantee this, then I don't think any improvement like #12925 is possible.

@martint @sopel39 @losipiuk @alexjo2144

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to guarantee this, then I don't think any improvement like #12925 is possible.

Why? Because we might miss the error in data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39 yes, but idk whether it matters. This is a question.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint do you have an opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that's hard/impossible to guarantee.

In Trino, a query that should succeed according to standard SQL semantics will succeed and produce correct results. A query that might fail due to bad data or bad assumptions about data may or may not fail depending on optimizations performed by the engine and connectors.

A trivial example is the following:

SELECT * FROM (
     SELECT ds, 1/x FROM t
) u
WHERE ds = '2022-08-18'

Let's say there's a row with values ('2022-01-01', 0). According to standard SQL semantics, such query should fail, as the 1/x term should be evaluated before the data is filtered by the ds = '2022-08-18`` clause. In Trino, such a query would generally work due to the predicate being pushed down below the 1/x` (and, possibly, into the connector). This is out of necessity: maintaining strict semantics for failures would make Trino unusable slow -- it would need to scan every row every single time and perform operations in the order established by the original query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint that's a good example and pretty compelling.
In your example, one column is used to filter out problematic data from the other column, so query (likely) doesn't fail. This is "out of necessity" (agreed) and also should be easy to explain to users.

My question is about case with one column. Does it change anything for you, @martint ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint and I discussed this case. The conclusion is that the even in this particular case the failure isn't guaranteed and optimizations like #12925 are OK to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept the test case, but added a comment:

// This failure isn't guaranteed. TODO make test more flexible when need arises.

@findepi findepi force-pushed the findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44 branch from ebc89ea to fdd6ad8 Compare August 10, 2022 20:08
@findepi
Copy link
Member Author

findepi commented Aug 11, 2022

CI #13322 #13166

@findepi findepi force-pushed the findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44 branch from 6af57a8 to 46a5bb4 Compare August 23, 2022 15:02
@findepi findepi requested review from losipiuk and martint August 23, 2022 15:03
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since hasCorrectResultsRegardlessOfPushdown() returns a QueryAssert, shouldn't this be:

return hasCorrectResultsRegardlessOfPushdown()

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryAssert needs to know whether hasCorrectResultsRegardlessOfPushdown returns this for chaining (convenience) or a different QueryAssert state. Without knowing it, it's impossible to judge whether returning this is "more correct" than returning hasCorrectResultsRegardlessOfPushdown()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it know? It's exposed as a public method to be used by tests, so the contract should be clear.

Regardless, I find it a bit confusing. When I read the code it's not immediately clear whether there's a bug lurking (is the result of hasCorrectResultsRegardlessOfPushdown() eager or lazy, so is it possible it's a no-op?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more confusing to conditionally return hasCorrectResultsRegardlessOfPushdown() else return this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? If it doesn't go into that block, there's nothing else to return, and no potential additional state induced by that method, so this is the only natural thing to return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is the only natural thing to return.

yes!
and the code needs to know (and does know) that hasCorrectResultsRegardlessOfPushdown returns this, so leverages this fact.

The case where varchar is cast to date and used in a predicate is not
uncommon and it would be worth optimizing for. This involves, however,
some edge cases which are easy to forget about when working on such an
optimization (either within the engine, or on the connector level).
@findepi findepi force-pushed the findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44 branch from d4427a2 to 3d11f3e Compare August 24, 2022 09:58
@findepi
Copy link
Member Author

findepi commented Aug 24, 2022

(just rebased)

@findepi
Copy link
Member Author

findepi commented Aug 25, 2022

Merging back into #13567

@findepi findepi closed this Aug 25, 2022
@findepi findepi deleted the findepi/add-a-test-for-varchar-cast-to-date-predicate-90bd44 branch August 25, 2022 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants