Add a test for varchar cast to date predicate #13605
findepi wants to merge 2 commits into trinodb:master
ab3c69a to b23f697
If we want to guarantee this, then I don't think any improvement like #12925 is possible.
> If we want to guarantee this, then I don't think any improvement like #12925 is possible.
Why? Because we might miss the error in data?
@sopel39 yes, but idk whether it matters. This is a question.
Indeed, that's hard/impossible to guarantee.
In Trino, a query that should succeed according to standard SQL semantics will succeed and produce correct results. A query that might fail due to bad data or bad assumptions about data may or may not fail depending on optimizations performed by the engine and connectors.
A trivial example is the following:
SELECT * FROM (
    SELECT ds, 1/x FROM t
) u
WHERE ds = '2022-08-18'
Let's say there's a row with values ('2022-01-01', 0). According to standard SQL semantics, such a query should fail, as the `1/x` term should be evaluated before the data is filtered by the `ds = '2022-08-18'` clause. In Trino, such a query would generally work because the predicate is pushed down below the `1/x` (and, possibly, into the connector). This is out of necessity: maintaining strict semantics for failures would make Trino unusably slow, since it would need to scan every row every single time and perform operations in the order established by the original query.
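To make the ordering argument concrete, here is a minimal Java sketch (not Trino code) that evaluates the query above over an in-memory list in two ways. The class, record and method names (`PushdownSemantics`, `Row`, `strictOrder`, `pushedDown`) are hypothetical, and Java integer division by zero stands in for the SQL division error.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownSemantics {
    // One row of the hypothetical table t(ds varchar, x integer).
    record Row(String ds, int x) {}

    // Strict SQL order: the inner SELECT computes 1/x for every row,
    // and only then does the outer WHERE filter on ds.
    static List<Integer> strictOrder(List<Row> t, String wanted) {
        List<Integer> inverses = new ArrayList<>();
        for (Row row : t) {
            inverses.add(1 / row.x()); // ArithmeticException when x = 0
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < t.size(); i++) {
            if (t.get(i).ds().equals(wanted)) {
                result.add(inverses.get(i));
            }
        }
        return result;
    }

    // Pushed-down order: filter on ds first, then compute 1/x
    // only for the rows that survive the filter.
    static List<Integer> pushedDown(List<Row> t, String wanted) {
        List<Integer> result = new ArrayList<>();
        for (Row row : t) {
            if (row.ds().equals(wanted)) {
                result.add(1 / row.x());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> t = List.of(new Row("2022-01-01", 0), new Row("2022-08-18", 1));
        System.out.println(pushedDown(t, "2022-08-18")); // prints [1]
        try {
            strictOrder(t, "2022-08-18");
        } catch (ArithmeticException e) {
            System.out.println("strict order fails: " + e.getMessage()); // "/ by zero"
        }
    }
}
```

Both variants agree whenever no row triggers an error; they differ only on the problematic row, which the pushed-down variant never evaluates.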
@martint that's a good example and pretty compelling.
In your example, one column is used to filter out problematic data from the other column, so the query (likely) doesn't fail. This is "out of necessity" (agreed) and should also be easy to explain to users.
My question is about the case with a single column. Does it change anything for you, @martint?
Kept the test case, but added a comment:
// This failure isn't guaranteed. TODO make test more flexible when need arises.
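One possible shape for a "more flexible" test is a helper that accepts either outcome: a successful result equal to the expected value, or a failure with the expected message. The sketch below is hypothetical: `FlexibleAssertions` and `assertResultOrFailure` are illustrative names, not part of Trino's `QueryAssertions` API, and the error text is a placeholder rather than Trino's exact message.

```java
import java.util.Objects;
import java.util.function.Supplier;

public class FlexibleAssertions {
    // Hypothetical helper: passes if the query returns `expected`,
    // or if it fails with a message containing `expectedMessage`.
    static <T> void assertResultOrFailure(Supplier<T> query, T expected, String expectedMessage) {
        T actual;
        try {
            actual = query.get();
        } catch (RuntimeException e) {
            String message = String.valueOf(e.getMessage());
            if (!message.contains(expectedMessage)) {
                throw new AssertionError("Unexpected failure: " + message, e);
            }
            return; // an acceptable failure
        }
        if (!Objects.equals(actual, expected)) {
            throw new AssertionError("Expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // The engine may or may not evaluate the failing cast; both outcomes pass.
        assertResultOrFailure(() -> 42, 42, "cannot be cast to date");
        assertResultOrFailure(() -> {
            throw new IllegalArgumentException("Value cannot be cast to date: abc");
        }, 42, "cannot be cast to date");
        System.out.println("both outcomes accepted");
    }
}
```

The trade-off is that such a test no longer pins down which outcome happens, so it documents the allowed behaviors rather than one specific plan.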
ebc89ea to fdd6ad8
6af57a8 to 46a5bb4
Since hasCorrectResultsRegardlessOfPushdown() returns a QueryAssert, shouldn't this be:
return hasCorrectResultsRegardlessOfPushdown()
?
QueryAssert needs to know whether hasCorrectResultsRegardlessOfPushdown returns this for chaining (convenience) or a different QueryAssert state. Without knowing it, it's impossible to judge whether returning this is "more correct" than returning hasCorrectResultsRegardlessOfPushdown().
Doesn't it know? It's exposed as a public method to be used by tests, so the contract should be clear.
Regardless, I find it a bit confusing. When I read the code, it's not immediately clear whether there's a bug lurking (is the result of hasCorrectResultsRegardlessOfPushdown() eager or lazy, i.e., is it possible that the call is a no-op?)
It would be more confusing to conditionally return hasCorrectResultsRegardlessOfPushdown() else return this.
How so? If it doesn't go into that block, there's nothing else to return, and no potential additional state induced by that method, so this is the only natural thing to return.
> so `this` is the only natural thing to return.
yes!
and the code needs to know (and does know) that hasCorrectResultsRegardlessOfPushdown returns this, so it leverages this fact.
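The disagreement can be reduced to a small sketch. `FluentAssertSketch` and its methods are hypothetical stand-ins, not the actual `QueryAssert` code; `verify()` plays the role of hasCorrectResultsRegardlessOfPushdown(), assumed here to run its check eagerly and return `this` purely for chaining. Under that assumption, both styles return the same object and run the check the same number of times.

```java
public class FluentAssertSketch {
    private int checksRun;

    // Hypothetical chainable check: performs its work eagerly
    // and returns this only for call chaining.
    FluentAssertSketch verify() {
        checksRun++;
        return this;
    }

    // Style A: relies on the contract that verify() returns this.
    FluentAssertSketch verifyIfEnabledA(boolean enabled) {
        if (enabled) {
            verify();
        }
        return this;
    }

    // Style B: propagates whatever verify() returns.
    FluentAssertSketch verifyIfEnabledB(boolean enabled) {
        if (enabled) {
            return verify();
        }
        return this;
    }

    int checksRun() {
        return checksRun;
    }

    public static void main(String[] args) {
        FluentAssertSketch a = new FluentAssertSketch();
        System.out.println(a.verifyIfEnabledA(true) == a.verifyIfEnabledB(true)); // true
        System.out.println(a.checksRun()); // 2
    }
}
```

The two styles diverge only if the chained method were lazy or returned a different instance; style A encodes knowledge of the contract, while style B keeps working if that contract ever changes.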
The case where varchar is cast to date and used in a predicate is not uncommon and it would be worth optimizing for. This involves, however, some edge cases which are easy to forget about when working on such an optimization (either within the engine, or on the connector level).
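The sketch below illustrates two such edge cases for a naive rewrite of `CAST(ds AS date) = DATE '2022-08-18'` into the varchar comparison `ds = '2022-08-18'`. It is not how #12925 works; it assumes a deliberately lenient, hypothetical `castToDate` (trimming whitespace, accepting unpadded month and day) in place of Trino's real cast rules, which may differ. Under that assumption, the rewrite both misses rows that denote the same date and silently skips the error for an unparsable value, which is the single-column case raised in the discussion.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class VarcharToDatePredicate {
    // Hypothetical, lenient stand-in for CAST(varchar AS date).
    // NOT Trino's actual parsing rules; used only to illustrate the pitfall.
    static LocalDate castToDate(String value) {
        String[] parts = value.trim().split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Value cannot be cast to date: " + value);
        }
        try {
            return LocalDate.of(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        } catch (NumberFormatException | DateTimeException e) {
            throw new IllegalArgumentException("Value cannot be cast to date: " + value, e);
        }
    }

    // Original predicate: CAST(ds AS date) = target, evaluated on every row.
    static List<String> castPredicate(List<String> rows, LocalDate target) {
        List<String> result = new ArrayList<>();
        for (String ds : rows) {
            if (castToDate(ds).equals(target)) {
                result.add(ds);
            }
        }
        return result;
    }

    // Naive rewrite: compare the varchar against the ISO literal, no cast at all.
    static List<String> naiveVarcharPredicate(List<String> rows, LocalDate target) {
        String literal = target.toString(); // "2022-08-18" for 2022-08-18
        List<String> result = new ArrayList<>();
        for (String ds : rows) {
            if (ds.equals(literal)) {
                result.add(ds);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate target = LocalDate.of(2022, 8, 18);
        List<String> sameDay = List.of("2022-08-18", "2022-8-18", " 2022-08-18");
        System.out.println(castPredicate(sameDay, target));         // all three rows match
        System.out.println(naiveVarcharPredicate(sameDay, target)); // only "2022-08-18"

        List<String> withGarbage = List.of("2022-08-18", "not a date");
        System.out.println(naiveVarcharPredicate(withGarbage, target)); // no error raised
        try {
            castPredicate(withGarbage, target);
        } catch (IllegalArgumentException e) {
            System.out.println("cast fails: " + e.getMessage());
        }
    }
}
```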
d4427a2 to 3d11f3e
(just rebased)
Merging back into #13567
The case where varchar is cast to date and used in a predicate is not
uncommon and it would be worth optimizing for (#12925).
This involves, however, some edge cases which are easy to forget
about when working on such an optimization (either within the engine,
or on the connector level).
Extracted from #13567