Unwrap casts in BETWEEN predicate #14452
d389a6f to 4eab85d
I'm wondering where the right place would be to unify `a <= 2 AND a >= 2` into `a = 2`.
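One way to see the unification: if each conjunct is turned into a range and the ranges are intersected, a range whose inclusive low and high bounds coincide holds a single value, which is an equality. The sketch below uses a hypothetical `ClosedRange` class over `long` values. It is not Trino's `Range`/`Domain` API and only illustrates the logic, not where it should live in the planner.

```java
import java.util.Optional;

// Hypothetical, simplified closed interval [low, high] over a long-valued column.
// Not Trino's Range/Domain classes; for illustration only.
final class ClosedRange
{
    final long low;   // from a conjunct like a >= low
    final long high;  // from a conjunct like a <= high

    ClosedRange(long low, long high)
    {
        this.low = low;
        this.high = high;
    }

    // Intersection of two conjuncts' ranges; empty if the result has low > high.
    ClosedRange intersect(ClosedRange other)
    {
        return new ClosedRange(Math.max(low, other.low), Math.min(high, other.high));
    }

    // When both inclusive bounds meet, the conjunction reduces to a = value.
    Optional<Long> singleValue()
    {
        return low == high ? Optional.of(low) : Optional.empty();
    }

    public static void main(String[] args)
    {
        ClosedRange atMostTwo = new ClosedRange(Long.MIN_VALUE, 2);  // a <= 2
        ClosedRange atLeastTwo = new ClosedRange(2, Long.MAX_VALUE); // a >= 2
        System.out.println(atMostTwo.intersect(atLeastTwo).singleValue()); // Optional[2]
    }
}
```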
4eab85d to efbd6a0
8d84a05 to 2979c17
You want to call `treeRewriter.defaultRewrite` to ensure that the child nodes of the BETWEEN predicate are processed.
You also want to call `unwrapCast`, otherwise the rewrite does nothing meaningful. A single pass of `UnwrapCastInComparison.Visitor` should be enough to unwrap the casts; it should not require another pass to get the job done right.
core/trino-main/src/test/java/io/trino/sql/planner/TestAddDynamicFilterSource.java (outdated, resolved)
core/trino-main/src/test/java/io/trino/sql/planner/TestDynamicFilter.java (outdated, resolved)
testing/trino-benchto-benchmarks/src/test/resources/sql/presto/tpcds/partitioned/q13.plan.txt (outdated, resolved)
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/UnwrapCastInComparison.java (outdated, resolved)
2979c17 to faf7ccf
This looks like it duplicates quite a lot of logic.
Did you try something like the following?

```java
@Override
public Expression rewriteBetweenPredicate(BetweenPredicate node, Void context, ExpressionTreeRewriter<Void> treeRewriter)
{
    // Rewrite the children first, so casts nested in value/min/max are handled in this same pass
    BetweenPredicate expression = (BetweenPredicate) treeRewriter.defaultRewrite((Expression) node, null);
    // Express BETWEEN as its two bounding comparisons, built from the rewritten children
    ComparisonExpression lowBound = new ComparisonExpression(GREATER_THAN_OR_EQUAL, expression.getValue(), expression.getMin());
    ComparisonExpression highBound = new ComparisonExpression(LESS_THAN_OR_EQUAL, expression.getValue(), expression.getMax());
    Expression lowBoundUnwrapped = unwrapCast(lowBound);
    Expression highBoundUnwrapped = unwrapCast(highBound);
    // Neither bound could be unwrapped: keep the (child-rewritten) BETWEEN as it is
    if (lowBound.equals(lowBoundUnwrapped) && highBound.equals(highBoundUnwrapped)) {
        return expression;
    }
    // ...
}
```

Why can't it be made to work?
Created #14648 with inspiration from #12797
I got stuck while trying to add the methods `getPreviousValue(Object)` and `getNextValue(Object)` to `DateType`.
I assume that checking the validity of a date needs Joda-Time functionality, which is not available in the trino-spi module.
As an alternative, we could use custom operators in `DateOperators` to cover this need.
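If DATE values are represented as a count of days since 1970-01-01 (an assumption here about the encoding), the neighboring values can be computed with plain arithmetic, which suggests these two methods may not need calendar validation at all. The sketch uses a hypothetical `EpochDaySuccessor` class and assumes the valid values span the `int` range; it is not the `DateType` implementation.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical sketch, not Trino's DateType: assumes a DATE is encoded as days since
// 1970-01-01 and that valid values span the int range (both are assumptions).
final class EpochDaySuccessor
{
    static final long MIN_DAY = Integer.MIN_VALUE;
    static final long MAX_DAY = Integer.MAX_VALUE;

    static Optional<Long> previousValue(long epochDay)
    {
        // No calendar logic needed: the previous date is simply one day earlier.
        return epochDay == MIN_DAY ? Optional.empty() : Optional.of(epochDay - 1);
    }

    static Optional<Long> nextValue(long epochDay)
    {
        return epochDay == MAX_DAY ? Optional.empty() : Optional.of(epochDay + 1);
    }

    public static void main(String[] args)
    {
        long day = LocalDate.of(2022, 3, 1).toEpochDay();
        // Month and leap-year boundaries are absorbed by the day-count encoding.
        System.out.println(LocalDate.ofEpochDay(previousValue(day).orElseThrow())); // 2022-02-28
    }
}
```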
This is now covered by #12797.
dcac104 to af873c8
This change allows the engine to infer that, for instance, given `t::timestamp(6)`, the predicate
`cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'`
can be rewritten as
`t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'`
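The bounds in this example follow from the fact that every `timestamp(6)` value on a given date lies between the start of that date and one microsecond before the next day. A minimal sketch of that arithmetic with `java.time`, using a hypothetical `DateBetweenToTimestampBounds` helper (not the code in this PR), limited to precisions 0 to 9 because `LocalDateTime` stops at nanoseconds:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper illustrating the rewrite for a timestamp(p) column t:
//   cast(t AS date) BETWEEN lowDate AND highDate
//   <=> t BETWEEN <start of lowDate> AND <last timestamp(p) value on highDate>
final class DateBetweenToTimestampBounds
{
    // Smallest step of timestamp(precision) in nanoseconds, e.g. 1000 (1 microsecond) for precision 6.
    static long stepNanos(int precision)
    {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("this sketch supports precision 0 to 9 only");
        }
        long step = 1;
        for (int i = precision; i < 9; i++) {
            step *= 10;
        }
        return step;
    }

    static LocalDateTime lowerBound(LocalDate lowDate)
    {
        return lowDate.atStartOfDay();
    }

    static LocalDateTime upperBound(LocalDate highDate, int precision)
    {
        // Start of the following day, minus one step of the column's precision.
        return highDate.plusDays(1).atStartOfDay().minusNanos(stepNanos(precision));
    }

    public static void main(String[] args)
    {
        System.out.println(lowerBound(LocalDate.of(2022, 1, 1)));    // 2022-01-01T00:00
        System.out.println(upperBound(LocalDate.of(2022, 1, 2), 6)); // 2022-01-02T23:59:59.999999
    }
}
```

Subtracting one precision step from the next day's midnight, rather than hard-coding `23:59:59.999999`, keeps the upper bound correct for any precision the type allows.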
A range predicate such as `BetweenPredicate` can be transformed into a `TupleDomain` and thus help with predicate pushdown.
A range-based `TupleDomain` representation is critical for connectors that have min/max-based metadata (like Iceberg manifest lists, which play a key role in partition pruning, or Iceberg data files), as ranges allow for intersection tests, something that is hard to do in a generic manner for `ConnectorExpression`.
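The intersection test that ranges enable is simple to state: if a file's (or manifest's) min/max statistics for a column do not overlap the predicate's range, no row in it can match and it can be skipped. A sketch with a hypothetical `MinMaxPruning` class over `long` values; real connectors also handle nulls, open bounds and other types, which this ignores.

```java
// Hypothetical sketch of min/max-based pruning, not Iceberg's or Trino's API:
// a file whose column statistics are [min, max] can be skipped when that range
// does not intersect the predicate range [low, high].
final class MinMaxPruning
{
    record Stats(long min, long max) {}

    // Closed-interval overlap test: the ranges intersect unless one lies
    // entirely below or entirely above the other.
    static boolean mayContainMatches(Stats stats, long low, long high)
    {
        return stats.max() >= low && stats.min() <= high;
    }

    public static void main(String[] args)
    {
        long low = 100;   // e.g. x BETWEEN 100 AND 200
        long high = 200;
        System.out.println(mayContainMatches(new Stats(0, 99), low, high));    // false: file can be skipped
        System.out.println(mayContainMatches(new Stats(150, 400), low, high)); // true: file must be read
    }
}
```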
af873c8 to 1c3fdaf
join (INNER, REPLICATED):
join (INNER, REPLICATED):
join (INNER, REPLICATED):
join (INNER, PARTITIONED):
I assume that this is a side effect of the expression rewriting in TPC-DS q13.sql.
Context: `ss_net_profit` has the type `DECIMAL(7,2)`.
INPUT NODE in `UnwrapCastInComparison`:
(CAST("ss_net_profit" AS decimal(12, 2)) BETWEEN CAST(DECIMAL '100.00' AS decimal(12, 2)) AND CAST(DECIMAL '200.00' AS decimal(12, 2)))
OUTPUT NODE BEFORE CHANGES:
(CAST("ss_net_profit" AS decimal(12, 2)) BETWEEN CAST(DECIMAL '100.00' AS decimal(12, 2)) AND CAST(DECIMAL '200.00' AS decimal(12, 2)))
OUTPUT NODE AFTER CHANGES:
("ss_net_profit" BETWEEN CAST(DECIMAL '100.00' AS decimal(7, 2)) AND CAST(DECIMAL '200.00' AS decimal(7, 2)))
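Here the cast goes from decimal(7, 2) to decimal(12, 2): same scale, larger precision, so it is a widening cast. In the simplest case, the comparison can then be evaluated in the narrower type as long as the literal bounds (100.00 and 200.00) are exactly representable in decimal(7, 2). A sketch of that representability check with `BigDecimal`, using a hypothetical `fitsDecimal` helper; it illustrates the reasoning and is not how `UnwrapCastInComparison` implements it.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical helper, not Trino's implementation: checks whether a literal bound is
// exactly representable in decimal(precision, scale), so that a comparison on
// CAST(x AS decimal(12, 2)) could be evaluated directly on a decimal(7, 2) column x.
final class DecimalBoundCheck
{
    static boolean fitsDecimal(BigDecimal value, int precision, int scale)
    {
        BigDecimal rescaled;
        try {
            // Throws if moving to the target scale would require rounding (e.g. 1.005 at scale 2).
            rescaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        }
        catch (ArithmeticException e) {
            return false;
        }
        // The unscaled value must have at most `precision` digits.
        return rescaled.unscaledValue().abs().compareTo(BigInteger.TEN.pow(precision)) < 0;
    }

    public static void main(String[] args)
    {
        System.out.println(fitsDecimal(new BigDecimal("100.00"), 7, 2));    // true
        System.out.println(fitsDecimal(new BigDecimal("200.00"), 7, 2));    // true
        System.out.println(fitsDecimal(new BigDecimal("123456.78"), 7, 2)); // false: needs 8 digits
    }
}
```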
Looks reasonable.
To be on the safe side, we will want to confirm this change with benchmarks (cc @przemekak), but I don't think we're ready yet.
CI hit #13957
trino-main is red
Superseded by #14648
Description
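This change allows the engine to infer that, for instance, given `t::timestamp(6)`, the predicate
`cast(t as date) BETWEEN DATE '2022-01-01' AND DATE '2022-01-02'`
can be rewritten as
`t BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2022-01-02 23:59:59.999999'`
The change applies to the temporal types: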
A range predicate such as `BetweenPredicate` can be transformed into a `TupleDomain` and thus help with predicate pushdown.
A range-based `TupleDomain` representation is critical for connectors that have min/max-based metadata (like Iceberg manifest lists, which play a key role in partition pruning, or Iceberg data files), as ranges allow for intersection tests, something that is hard to do in a generic manner for `ConnectorExpression`.
This is a spin-off from #14390.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: