Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -444,7 +444,7 @@ public QueryAssert isFullyPushedDown()

if (!skipResultsCorrectnessCheckForPushdown) {
// Compare the results with pushdown disabled, so that explicit matches() call is not needed
verifyResultsWithPushdownDisabled();
hasCorrectResultsRegardlessOfPushdown();
}
return this;
}
Expand Down Expand Up @@ -511,17 +511,19 @@ private QueryAssert hasPlan(PlanMatchPattern expectedPlan, Consumer<Plan> additi

if (!skipResultsCorrectnessCheckForPushdown) {
// Compare the results with pushdown disabled, so that explicit matches() call is not needed
verifyResultsWithPushdownDisabled();
hasCorrectResultsRegardlessOfPushdown();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since hasCorrectResultsRegardlessOfPushdown() returns a QueryAssert, shouldn't this be:

return hasCorrectResultsRegardlessOfPushdown()

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryAssert needs to know whether hasCorrectResultsRegardlessOfPushdown returns this for chaining (convenience) or a different QueryAssert state. Without knowing it, it's impossible to judge whether returning this is "more correct" than returning hasCorrectResultsRegardlessOfPushdown()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it know? It's exposed as a public method to be used by tests, so the contract should be clear.

Regardless, I find it a bit confusing. When I read the code it's not immediately clear whether there's a bug lurking (is the result of hasCorrectResultsRegardlessOfPushdown() eager or lazy, so is it possible it's a no-op?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more confusing to conditionally return hasCorrectResultsRegardlessOfPushdown() else return this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? If it doesn't go into that block, there's nothing else to return, and no potential additional state induced by that method, so this is the only natural thing to return.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is the only natural thing to return.

yes!
and the code needs to know (and does know) that hasCorrectResultsRegardlessOfPushdown returns this, so leverages this fact.

}
return this;
}

private void verifyResultsWithPushdownDisabled()
@CanIgnoreReturnValue
public QueryAssert hasCorrectResultsRegardlessOfPushdown()
{
Session withoutPushdown = Session.builder(session)
.setSystemProperty("allow_pushdown_into_connectors", "false")
.build();
matches(runner.execute(withoutPushdown, query));
return this;
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -620,6 +620,15 @@ public void testDateYearOfEraPredicate()
.hasStackTraceContaining("Cannot apply operator: varchar = date");
}

@Override
public void testVarcharCastToDateInPredicate()
{
assertThatThrownBy(super::testVarcharCastToDateInPredicate)
.hasStackTraceContaining("Table partitioning must be specified using setRangePartitionColumns or addHashPartitions");

throw new SkipException("TODO: implement the test for Kudu");
}

@Test
@Override
public void testCharVarcharComparison()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -503,6 +503,49 @@ public void testSortItemsReflectedInExplain()
expectedPattern);
}

// CAST(a_varchar AS date) = DATE '...' is an interesting condition which may want to be optimized by the engine or a connector
@Test
public void testVarcharCastToDateInPredicate()
{
skipTestUnless(hasBehavior(SUPPORTS_CREATE_TABLE_WITH_DATA));

try (TestTable table = new TestTable(
getQueryRunner()::execute,
"varchar_as_date_pred",
"(a varchar)",
List.of(
"'999-09-09'",
"'1005-09-09'",
"'2005-06-06'", "'2005-06-6'", "'2005-6-06'", "'2005-6-6'", "' 2005-06-06'", "'2005-06-06 '", "' +2005-06-06'", "'02005-06-06'",
"'2005-09-06'", "'2005-09-6'", "'2005-9-06'", "'2005-9-6'", "' 2005-09-06'", "'2005-09-06 '", "' +2005-09-06'", "'02005-09-06'",
"'2005-09-09'", "'2005-09-9'", "'2005-9-09'", "'2005-9-9'", "' 2005-09-09'", "'2005-09-09 '", "' +2005-09-09'", "'02005-09-09'",
"'2005-09-10'", "'2005-9-10'", "' 2005-09-10'", "'2005-09-10 '", "' +2005-09-10'", "'02005-09-10'",
"'9999-09-09'",
"'99999-09-09'"))) {
for (String date : List.of("2005-09-06", "2005-09-09", "2005-09-10")) {
for (String operator : List.of("=", "<=", "<", ">", ">=", "!=", "IS DISTINCT FROM", "IS NOT DISTINCT FROM")) {
assertThat(query("SELECT a FROM %s WHERE CAST(a AS date) %s DATE '%s'".formatted(table.getName(), operator, date)))
.hasCorrectResultsRegardlessOfPushdown();
}
}
}

try (TestTable table = new TestTable(
getQueryRunner()::execute,
"varchar_as_date_pred",
"(a varchar)",
List.of("'2005-06-bad-date'", "'2005-09-10'"))) {
assertThatThrownBy(() -> query("SELECT a FROM %s WHERE CAST(a AS date) < DATE '2005-09-10'".formatted(table.getName())))
.hasMessage("Value cannot be cast to date: 2005-06-bad-date");
// This failure isn't guaranteed. TODO make test more flexible when need arises.
assertThatThrownBy(() -> query("SELECT a FROM %s WHERE CAST(a AS date) = DATE '2005-09-10'".formatted(table.getName())))
.hasMessage("Value cannot be cast to date: 2005-06-bad-date");
Comment on lines 541 to 542
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to guarantee this, then I don't think any improvement like #12925 is possible.

@martint @sopel39 @losipiuk @alexjo2144

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to guarantee this, then I don't think any improvement like #12925 is possible.

Why? Because we might miss the error in data?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39 yes, but idk whether it matters. This is a question.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint do you have an opinion?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that's hard/impossible to guarantee.

In Trino, a query that should succeed according to standard SQL semantics will succeed and produce correct results. A query that might fail due to bad data or bad assumptions about data may or may not fail depending on optimizations performed by the engine and connectors.

A trivial example is the following:

SELECT * FROM (
     SELECT ds, 1/x FROM t
) u
WHERE ds = '2022-08-18'

Let's say there's a row with values ('2022-01-01', 0). According to standard SQL semantics, such query should fail, as the 1/x term should be evaluated before the data is filtered by the ds = '2022-08-18`` clause. In Trino, such a query would generally work due to the predicate being pushed down below the 1/x` (and, possibly, into the connector). This is out of necessity: maintaining strict semantics for failures would make Trino unusable slow -- it would need to scan every row every single time and perform operations in the order established by the original query.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint that's a good example and pretty compelling.
In your example, one column is used to filter out problematic data from the other column, so query (likely) doesn't fail. This is "out of necessity" (agreed) and also should be easy to explain to users.

My question is about case with one column. Does it change anything for you, @martint ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint and I discussed this case. The conclusion is that the even in this particular case the failure isn't guaranteed and optimizations like #12925 are OK to do.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept the test case, but added a comment:

// This failure isn't guaranteed. TODO make test more flexible when need arises.

// This failure isn't guaranteed. TODO make test more flexible when need arises.
assertThatThrownBy(() -> query("SELECT a FROM %s WHERE CAST(a AS date) > DATE '2022-08-10'".formatted(table.getName())))
.hasMessage("Value cannot be cast to date: 2005-06-bad-date");
}
}

@Test
public void testConcurrentScans()
{
Expand Down