Support dynamic filtering with task retries#12152
Merged
raunaqmorarka merged 6 commits intotrinodb:masterfrom Jul 27, 2022
Merged
Support dynamic filtering with task retries#12152raunaqmorarka merged 6 commits intotrinodb:masterfrom
raunaqmorarka merged 6 commits intotrinodb:masterfrom
Conversation
Member
|
@arhimondr could you do first pass? |
a90224b to
1c8b8a8
Compare
48aa5bd to
16ac46c
Compare
losipiuk
reviewed
Apr 29, 2022
Member
losipiuk
left a comment
There was a problem hiding this comment.
@raunaqmorarka It is super hard to comprehend (for me). Can you put up a short writeup explaining in simple English what is the data passing flow for DF with task retries; and how does it differ from pipelined case.
We can also chat but without whiteboard it will not be easy to discuss I think :/
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/AddDynamicFilterSource.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/AddDynamicFilterSource.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/join/JoinUtils.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/LocalDynamicFilterConsumer.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/execution/DynamicFiltersCollector.java
Outdated
Show resolved
Hide resolved
407e39d to
e674b97
Compare
1471587 to
b9bf25b
Compare
a974493 to
a5dad9e
Compare
arhimondr
approved these changes
Jul 5, 2022
6b5f803 to
6e4aac7
Compare
728532b to
fc70f11
Compare
raunaqmorarka
commented
Jul 20, 2022
core/trino-main/src/main/java/io/trino/sql/planner/LocalDynamicFilterConsumer.java
Outdated
Show resolved
Hide resolved
87378b0 to
a05618f
Compare
Member
Author
|
TPC benchmark results Summary: |
Using partitioning from tpch connector can result in repartitioned joins on pre-partitioned columns like orders.orderkey avoiding remote exchange on the build side. Skipping this optmization in tests allows for easier testing of fault tolerant mode and doesn't affect testing of pipelined mode of execution.
For fault tolerant execution, dynamic filter collection from worker will take place after completion of task.
By default the number of splits is based on system CPU count which leads to variation in local and CI setup.
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
High level changes:
Add a new plan node DynamicFilterSourceNode for DF collection in build source stage.
Add a new optimizer rule AddDynamicFilterSource near the end in PlanOptimizers which matches for a join with DFs with remote exchange as right child and adds new plan node below the exchange.
Changes to LocalExecutionPlanner to add DynamicFilterSourceOperator based on DynamicFilterSourceNode.
Change to DynamicFilterSourceOperator to short-circuit dynamic filter collection in source stage for subsequent splits if the initial splits already exceeded collection thresholds.
Changes to dynamic filter collection on worker node to keep "final" dynamic filter for collection after successful completion of task.
Changes to DynamicFiltersCollector to track whether the current version of dynamic filter collected is the "final" one. This allows HttpRemoteTask on the coordinator to figure out whether DynamicFiltersFetcher needs to the fetch the collected DF from the worker after task completion.
Changes to DynamicFilterService to make it aware about task retry mode.
All the changes implement DF collection for task level retry mode while keeping existing pipelined mode execution as-is.
new feature
core query engine
Implements support for dynamic filtering in task retry mode of execution
Related issues, pull requests, and links
fixes #9935
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: