[PERF] Add physical plan optimizer and optimization #2557

Merged 13 commits on Jul 31, 2024

Vince7778
Contributor

Adds a physical plan optimizer to go along with the logical plan optimizer.

Also adds an optimization that makes repartitioning more lenient. Previously, a table partitioned on col("a"), col("b") was considered distinct from one partitioned on col("b"), col("a"), so a repartition would run. Now, we walk down the plan tree to establish a canonical ordering of the columns and, when applicable, reorder the columns in a repartition to conform to it. Then we remove redundant repartitions.
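
To illustrate the idea, here is a minimal Python sketch of the reorder-then-drop logic. It is not the Rust code in this PR: the function names and the representation of partition columns as lists of strings are assumptions made for the example.

```python
from typing import List, Sequence


def reorder_to_canonical(requested: Sequence[str], canonical: Sequence[str]) -> List[str]:
    """Hypothetical helper: if `requested` names exactly the same columns as
    `canonical` (in any order), return them in the canonical order.
    Otherwise return `requested` unchanged."""
    if sorted(requested) == sorted(canonical):
        return list(canonical)
    return list(requested)


def is_redundant_repartition(existing: Sequence[str], requested: Sequence[str]) -> bool:
    """Hypothetical helper: a repartition is redundant when the input is
    already partitioned on the same columns in the same order."""
    return list(existing) == list(requested)


# The input is already partitioned on (a, b); a later operator asks for (b, a).
existing = ["a", "b"]
requested = reorder_to_canonical(["b", "a"], canonical=existing)
drop = is_redundant_repartition(existing, requested)
```

In this sketch the request for (b, a) is first rewritten to (a, b), after which it matches the existing partitioning and the repartition can be dropped. A request on a different set of columns is left untouched and still runs.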

Also, hash joins are now more lenient with repartitioning. Consider the case where the left side is correctly partitioned with 20 partitions, but the right side is incorrectly partitioned (relative to the join) with 40 partitions. Previously, the partition count would always be set to the maximum, meaning both sides would be repartitioned to have 40 partitions. Now, only the right side is repartitioned, with its partition count reduced to 20. There is a configurable limit on how much the partition count can be reduced.
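
A rough Python sketch of that decision follows. The function name, the `max_reduction` parameter, and the exact form of the limit (a multiplicative factor) are assumptions for illustration; the actual config option and rule in this PR may differ.

```python
from typing import Tuple


def choose_join_partitions(
    left_n: int,
    left_ok: bool,
    right_n: int,
    right_ok: bool,
    max_reduction: float = 2.0,  # hypothetical limit: target may shrink by at most this factor
) -> Tuple[int, bool, bool]:
    """Return (target_partitions, repartition_left, repartition_right).

    `*_ok` means that side is already hash-partitioned on the join keys.
    """
    # Old behavior: always go to the larger partition count.
    target = max(left_n, right_n)

    # New behavior (sketch): reuse the count of a correctly partitioned side,
    # as long as that does not shrink the target by more than `max_reduction`.
    reusable = [
        n
        for n, ok in ((left_n, left_ok), (right_n, right_ok))
        if ok and n * max_reduction >= target
    ]
    if reusable:
        target = max(reusable)

    repartition_left = not (left_ok and left_n == target)
    repartition_right = not (right_ok and right_n == target)
    return target, repartition_left, repartition_right


# Example from the description: left is correct with 20, right is wrong with 40.
lenient = choose_join_partitions(20, True, 40, False, max_reduction=2.0)
strict = choose_join_partitions(20, True, 40, False, max_reduction=1.5)
```

With a limit of 2.0 the sketch keeps the left side as-is and repartitions only the right side down to 20. With a tighter limit of 1.5 the reduction from 40 to 20 is not allowed, so both sides are repartitioned to 40 as before.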

Aims to fix #2280
Supersedes #2543

codecov bot commented Jul 24, 2024

Codecov Report

Attention: Patch coverage is 91.29815% with 61 lines in your changes missing coverage. Please review.

Project coverage is 63.60%. Comparing base (4df3333) to head (de003be).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2557      +/-   ##
==========================================
+ Coverage   63.27%   63.60%   +0.32%     
==========================================
  Files         980      986       +6     
  Lines      109238   109848     +610     
==========================================
+ Hits        69125    69865     +740     
+ Misses      40113    39983     -130     
Files Coverage Δ
daft/context.py 76.42% <ø> (ø)
src/common/daft-config/src/lib.rs 90.90% <100.00%> (+0.21%) ⬆️
src/daft-dsl/src/expr.rs 74.87% <100.00%> (+0.15%) ⬆️
src/daft-dsl/src/lib.rs 100.00% <ø> (ø)
src/daft-plan/src/lib.rs 100.00% <ø> (ø)
src/daft-plan/src/physical_ops/project.rs 78.49% <100.00%> (+3.18%) ⬆️
...rc/physical_optimization/rules/drop_repartition.rs 100.00% <100.00%> (ø)
.../daft-plan/src/physical_optimization/rules/rule.rs 100.00% <100.00%> (ø)
src/daft-plan/src/physical_plan.rs 61.00% <100.00%> (+11.12%) ⬆️
src/daft-plan/src/physical_planner/mod.rs 95.00% <100.00%> (+0.55%) ⬆️
... and 5 more

... and 30 files with indirect coverage changes

@Vince7778 Vince7778 requested a review from samster25 July 24, 2024 20:13
@Vince7778 Vince7778 merged commit a75a72c into main Jul 31, 2024
44 checks passed
@Vince7778 Vince7778 deleted the conor/physical-plan-optimizer branch July 31, 2024 19:55
Successfully merging this pull request may close these issues.

[PERF] Join key reordering optimization rule
2 participants