Add preferred timeout for small dynamic filters #22527
Dith3r wants to merge 3 commits into trinodb:master
Resolved review threads (outdated) on:
core/trino-main/src/main/java/io/trino/SystemSessionProperties.java (two threads)
core/trino-main/src/main/java/io/trino/execution/DynamicFilterConfig.java
...src/main/java/io/trino/sql/planner/optimizations/DeterminePreferredDynamicFilterTimeout.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcDynamicFilteringConfig.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestBackgroundHiveSplitLoader.java
core/trino-main/src/main/java/io/trino/sql/planner/planprinter/PlanPrinter.java
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcDynamicFilteringSplitManager.java
Why isn’t this a Rule, or a set of Rules? Visitor-based optimizers are legacy, and we’re moving away from them. We should avoid introducing any new ones.
We're trying to change an attribute of the filter node on the probe-side scan of a join, based on the estimated size of that join's build side. I'm not sure how this could be designed as one or more Rules.
Do we have a similar Rule elsewhere that we could use as a reference?
A Rule-based approach would require matching against Join/SemiJoin/DynamicFilterSourceNode and processing all probe-side FilterNodes, extracting conjuncts to match against the join node's dynamic filters. That would cause far more lookups and rewrites than extracting all dynamic filter sources and then processing the filter nodes once.
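To make the "extract all dynamic filter sources, then process filter nodes once" argument concrete, here is a minimal sketch of that two-pass shape. It is not the code from this PR: the `Node` class, the `producedFilters` / `consumedFilters` fields, and the `smallBuildRowLimit` parameter are all hypothetical stand-ins, and the real optimizer works on Trino plan nodes and statistics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified plan model; none of these types are Trino classes.
final class TwoPassTimeoutSketch
{
    static final class Node
    {
        final String kind; // "join", "filter" or "scan" (informational only)
        final List<Node> children = new ArrayList<>();
        // For a join: dynamic filter id -> estimated build-side row count (assumed to be known).
        final Map<String, Double> producedFilters = new HashMap<>();
        // For a probe-side filter: ids of the dynamic filters it consumes.
        final List<String> consumedFilters = new ArrayList<>();
        // Filled in by pass 2: dynamic filter id -> is it worth waiting for?
        final Map<String, Boolean> worthWaiting = new HashMap<>();

        Node(String kind, Node... children)
        {
            this.kind = kind;
            this.children.addAll(List.of(children));
        }
    }

    // Pass 1: walk the plan once and collect every dynamic filter source.
    static void collectSources(Node node, Map<String, Double> sources)
    {
        sources.putAll(node.producedFilters);
        for (Node child : node.children) {
            collectSources(child, sources);
        }
    }

    // Pass 2: walk the plan once more and annotate each consuming filter node.
    static void annotateFilters(Node node, Map<String, Double> sources, double smallBuildRowLimit)
    {
        for (String id : node.consumedFilters) {
            Double buildRows = sources.get(id);
            node.worthWaiting.put(id, buildRows != null && buildRows <= smallBuildRowLimit);
        }
        for (Node child : node.children) {
            annotateFilters(child, sources, smallBuildRowLimit);
        }
    }

    // Runs both passes and returns the merged annotations for inspection.
    static Map<String, Boolean> run(Node root, double smallBuildRowLimit)
    {
        Map<String, Double> sources = new HashMap<>();
        collectSources(root, sources);
        annotateFilters(root, sources, smallBuildRowLimit);
        Map<String, Boolean> result = new HashMap<>();
        collectResults(root, result);
        return result;
    }

    private static void collectResults(Node node, Map<String, Boolean> result)
    {
        result.putAll(node.worthWaiting);
        for (Node child : node.children) {
            collectResults(child, result);
        }
    }
}
```

Each node is visited a fixed number of times regardless of how many joins feed a given filter, which is the cost argument made in the comment above.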
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@Dith3r can you rebase?
@sopel39 ptal
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@Dith3r please rebase :)
If you call getPreferredTimeout before acquiring the future from isBlocked, a DF that has a preferred timeout may resolve in between. The future you receive then excludes that already-resolved DF, and you end up waiting for DFs that have no preferred timeout set.
If the DF is already resolved, then no waiting will happen anyway, right?
Add a CBO rule to estimate if dynamic filter is worth waiting for. Focus on small tables that are most often used as dimension tables.
DynamicFilters.Descriptor::getId,
DynamicFilters.Descriptor::getPreferredTimeout,
(timeout1, timeout2) -> {
    if (timeout1.isPresent() && timeout2.isPresent()) {
You could use io.trino.util.Optionals#combine here.
That would require a similar function based on OptionalLong.
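For illustration, an `OptionalLong` counterpart to the suggested helper could look roughly like the sketch below. The class name `OptionalLongs` is hypothetical and not part of Trino, and the behaviour when only one side is present (return that side) is an assumption rather than something confirmed by this thread.

```java
import java.util.OptionalLong;
import java.util.function.LongBinaryOperator;

// Hypothetical helper, not part of Trino: an OptionalLong analogue of a generic combine().
final class OptionalLongs
{
    private OptionalLongs() {}

    // Applies the combiner when both values are present; otherwise returns
    // whichever value is present, or empty when neither is (assumed semantics).
    static OptionalLong combine(OptionalLong left, OptionalLong right, LongBinaryOperator combiner)
    {
        if (left.isPresent() && right.isPresent()) {
            return OptionalLong.of(combiner.applyAsLong(left.getAsLong(), right.getAsLong()));
        }
        return left.isPresent() ? left : right;
    }
}
```

With such a helper, the merge lambda in the hunk above could shrink to something like `(timeout1, timeout2) -> OptionalLongs.combine(timeout1, timeout2, Math::min)`. Choosing `Math::min` here is only an example; the PR's actual merge policy is not visible in the quoted lines.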
    return !isAtMostScalar(joinNode.getRight()) && !isAtMostScalar(joinNode.getLeft());
}

// Skip for joins with multi keys since output row count stats estimation can wrong due to
// low correlation between multiple join keys.
This should probably read: "can be wrong due to potentially high correlation between join columns, which might lead to underestimation of the join output row count".
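A small worked example may help show why correlated join keys lead to underestimation. The sketch below uses the textbook equi-join estimate (rows on the left times rows on the right, divided by the number of distinct key combinations, assuming uniform distribution and identical key domains on both sides). It is not Trino's statistics code, and all numbers are made up.

```java
// Textbook equi-join cardinality estimate; an illustration only, not Trino's stats calculator.
final class MultiKeyJoinEstimateSketch
{
    private MultiKeyJoinEstimateSketch() {}

    // |L| * |R| / (number of distinct key combinations), assuming uniform distribution.
    static double estimateJoinRows(double leftRows, double rightRows, double distinctKeyCombinations)
    {
        return leftRows * rightRows / distinctKeyCombinations;
    }

    public static void main(String[] args)
    {
        double leftRows = 1_000_000;
        double rightRows = 1_000;
        double ndvA = 100; // distinct values of join key a (made-up number)
        double ndvB = 10;  // distinct values of join key b (made-up number)

        // Independence assumption: every (a, b) pair is treated as possible.
        double assumingIndependence = estimateJoinRows(leftRows, rightRows, ndvA * ndvB);
        // Fully correlated keys: b is determined by a, so only ndvA combinations exist.
        double fullyCorrelated = estimateJoinRows(leftRows, rightRows, ndvA);

        System.out.println("estimate assuming independent keys: " + assumingIndependence);
        System.out.println("row count with fully correlated keys: " + fullyCorrelated);
    }
}
```

In this setup the independence assumption yields an estimate ten times smaller than the fully correlated case, which is the kind of underestimation the suggested comment wording describes.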
private static boolean isExpandingPlanNode(PlanNode node)
{
    return node instanceof JoinNode
            // consider union node and exchange node with multiple sources as expanding since it merge the rows
This should be more like: "union and exchange nodes that consume from multiple sources are considered expanding since they produce more data than their children individually".
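As a rough illustration of the check under discussion, the sketch below completes the visible fragment using the reworded comment. Only `node instanceof JoinNode` appears in the quoted hunk; the union and exchange conditions, and all node classes here, are hypothetical stand-ins rather than the PR's actual code.

```java
import java.util.List;

// Hypothetical stand-ins for plan nodes; the real Trino classes carry far more state.
final class ExpandingNodeSketch
{
    private ExpandingNodeSketch() {}

    static class PlanNode
    {
        final List<PlanNode> sources;

        PlanNode(PlanNode... sources)
        {
            this.sources = List.of(sources);
        }
    }

    static final class TableScanNode extends PlanNode
    {
        TableScanNode() { super(); }
    }

    static final class JoinNode extends PlanNode
    {
        JoinNode(PlanNode left, PlanNode right) { super(left, right); }
    }

    static final class UnionNode extends PlanNode
    {
        UnionNode(PlanNode... sources) { super(sources); }
    }

    static final class ExchangeNode extends PlanNode
    {
        ExchangeNode(PlanNode... sources) { super(sources); }
    }

    static boolean isExpandingPlanNode(PlanNode node)
    {
        return node instanceof JoinNode
                // union and exchange nodes that consume from multiple sources are considered
                // expanding since they produce more data than their children individually
                // (the two conditions below are an assumption about the elided code)
                || node instanceof UnionNode
                || (node instanceof ExchangeNode && node.sources.size() > 1);
    }
}
```

Under these assumptions, a single-source exchange is not treated as expanding, since it only forwards the rows of its one child.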
private DynamicFilterTimeout getBuildSideState(PlanNode planNode, Symbol dynamicFilterSymbol)
{
    if (isAtMostScalar(planNode)) {
A comment here would be nice, plus a mention in the commit message.
            .matches();
}

private static boolean isExpandingPlanNode(PlanNode node)
Description
Add a CBO rule to estimate whether a dynamic filter is worth waiting for. It focuses on small tables, which are most often used as dimension tables, and on nodes that generate a small number of distinct values.
The DeterminePreferredDynamicFilterTimeout rule could allow us to remove the forced 20s wait for every possible DF in JDBC connectors, provided the table exposes at least a row count statistic.
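The decision described here can be pictured as a function from a row count statistic to an optional wait time. The sketch below is only an assumption about that shape: the method name, the `smallBuildRowLimit` threshold, and the choice to return empty when the statistic is missing are hypothetical and not taken from the PR.

```java
import java.util.OptionalDouble;
import java.util.OptionalLong;

// Hypothetical decision helper; names and thresholds are illustrative, not from the PR.
final class PreferredTimeoutSketch
{
    private PreferredTimeoutSketch() {}

    // Returns a preferred wait (in milliseconds) only when the build side is known to be small.
    // An empty result means "no preferred timeout": either the statistic is missing
    // or the build side is estimated to be too large to be worth waiting for.
    static OptionalLong preferredTimeoutMillis(OptionalDouble estimatedBuildRows, double smallBuildRowLimit, long waitMillis)
    {
        if (estimatedBuildRows.isPresent() && estimatedBuildRows.getAsDouble() <= smallBuildRowLimit) {
            return OptionalLong.of(waitMillis);
        }
        return OptionalLong.empty();
    }
}
```

For example, `preferredTimeoutMillis(OptionalDouble.of(500), 10_000, 20_000)` yields a 20-second preference, while an empty statistic yields no preference, leaving the caller free to skip or shorten the wait.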
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text: