Conversation

kevinzwang
Member

Changes Made

Continuation of the refactor, this time moving column binding above the daft-local-execution and daft-micropartition crates.

After this, we can start working on binding in the logical plan!!
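
For readers new to the refactor, here is a minimal sketch of what "binding" a column means. It is not Daft's actual code: `Expr`, `Schema`, and `bind` are simplified, hypothetical stand-ins, and only the names `BoundExpr` and `try_new` come from this PR. The assumption is that binding resolves column names to positions in a schema once, up front, so that later stages need no name lookups.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified expression type (not Daft's real `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Column referenced by name, as the user wrote it.
    UnboundColumn(String),
    /// Column resolved to a position in one specific schema.
    BoundColumn(usize),
    /// Integer literal.
    Literal(i64),
    /// Sum of two sub-expressions.
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical schema: a map from column name to column index.
pub struct Schema {
    index_of: HashMap<String, usize>,
}

impl Schema {
    pub fn new(names: &[&str]) -> Self {
        let index_of = names
            .iter()
            .enumerate()
            .map(|(i, n)| (n.to_string(), i))
            .collect();
        Self { index_of }
    }
}

/// Newtype that can only be built through `try_new`, so holding one
/// is evidence that every column inside has been resolved.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundExpr(Expr);

impl BoundExpr {
    pub fn try_new(expr: Expr, schema: &Schema) -> Result<Self, String> {
        bind(expr, schema).map(BoundExpr)
    }

    pub fn inner(&self) -> &Expr {
        &self.0
    }
}

/// Recursively replace name-based column references with index-based ones.
fn bind(expr: Expr, schema: &Schema) -> Result<Expr, String> {
    match expr {
        Expr::UnboundColumn(name) => schema
            .index_of
            .get(&name)
            .copied()
            .map(Expr::BoundColumn)
            .ok_or_else(|| format!("column not found: {name}")),
        Expr::Add(l, r) => Ok(Expr::Add(
            Box::new(bind(*l, schema)?),
            Box::new(bind(*r, schema)?),
        )),
        other => Ok(other),
    }
}
```

In this sketch an unknown column fails at bind time rather than at execution time, which is one reason to move binding earlier in the pipeline.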

Related Issues

#4270

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

codecov bot commented May 24, 2025

Codecov Report

Attention: Patch coverage is 90.97938% with 70 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@b606d6c). Learn more about missing BASE report.
Report is 14 commits behind head on main.

Files with missing lines                               Patch %  Lines
src/daft-recordbatch/src/python.rs                     3.84%    25 Missing ⚠️
src/daft-logical-plan/src/sink_info.rs                 75.00%   14 Missing ⚠️
src/daft-dsl/src/expr/bound_expr.rs                    74.46%   12 Missing ⚠️
...ecution/src/intermediate_ops/actor_pool_project.rs  73.33%   4 Missing ⚠️
src/daft-micropartition/src/python.rs                  89.28%   3 Missing ⚠️
...ft-physical-plan/src/physical_planner/translate.rs  98.40%   3 Missing ⚠️
daft/sql/sql_scan.py                                   0.00%    2 Missing ⚠️
src/daft-micropartition/src/ops/join.rs                77.77%   2 Missing ⚠️
daft/recordbatch/recordbatch_io.py                     50.00%   1 Missing ⚠️
...ft-local-execution/src/intermediate_ops/project.rs  98.73%   1 Missing ⚠️
... and 3 more
Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4425   +/-   ##
=======================================
  Coverage        ?   77.21%           
=======================================
  Files           ?      844           
  Lines           ?   112779           
  Branches        ?        0           
=======================================
  Hits            ?    87081           
  Misses          ?    25698           
  Partials        ?        0           
Files with missing lines                               Coverage Δ
daft/execution/execution_step.py                       87.27% <100.00%> (ø)
daft/execution/physical_plan.py                        93.69% <ø> (ø)
daft/execution/rust_physical_plan_shim.py              93.81% <ø> (ø)
src/daft-dsl/src/functions/python/mod.rs               96.44% <100.00%> (ø)
src/daft-local-execution/src/dispatcher.rs             97.70% <100.00%> (ø)
...ft-local-execution/src/intermediate_ops/explode.rs  76.92% <100.00%> (ø)
...aft-local-execution/src/intermediate_ops/filter.rs  81.25% <100.00%> (ø)
...tion/src/intermediate_ops/inner_hash_join_probe.rs  90.07% <100.00%> (ø)
...ft-local-execution/src/intermediate_ops/unpivot.rs  68.18% <100.00%> (ø)
src/daft-local-execution/src/sinks/aggregate.rs        82.35% <100.00%> (ø)
... and 43 more

@kevinzwang kevinzwang requested review from srilman and colin-ho May 27, 2025 18:29
$expr,
)?
};
}
Contributor

Love seeing this!

Contributor

Any chance you can port this to the existing one as well? Or out of scope?

Contributor

Actually, scratch that: why keep the original function? It looks like it's only used in src/daft-local-execution/src/sinks/aggregate.rs and src/daft-local-execution/src/sinks/grouped_aggregate.rs, both of which are updated in this PR.

Member Author

The original function is still used in the legacy (py+ray runner) physical plan translation. I'll make a subsequent PR (should be a small one) to use bound columns in the legacy physical plan too, and once I do that, the original function can be removed.

Contributor

@srilman srilman left a comment

Looks great, thanks! Leaving the approval to Colin just in case, but I have one question.

  pub root_dir: String,
  pub write_mode: WriteMode,
  pub file_format: FileFormat,
- pub partition_cols: Option<Vec<ExprRef>>,
+ pub partition_cols: Option<Vec<E>>,
Contributor

If we bind in the logical plan, does that mean this can become Option<Vec<BoundExpr>> and we don't need the generics?

Member Author

Yep, exactly. Once we finish the refactor, we should only allow this to hold a BoundExpr.
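
To illustrate the interim generic design discussed here, a rough sketch follows. It assumes a sink struct parameterized over its expression type, so that it can hold unbound expressions in one layer and bound ones in another while the refactor is in progress. `FileSinkInfo`, `UnboundExpr`, this simplified `BoundExpr`, and the `bind` method are hypothetical names for illustration; only the `root_dir` and `partition_cols` fields mirror the diff above.

```rust
/// Hypothetical stand-in: a column referenced by name.
#[derive(Debug, Clone, PartialEq)]
pub struct UnboundExpr(pub String);

/// Hypothetical stand-in: a column referenced by index in a schema.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundExpr(pub usize);

/// Hypothetical sink description, generic over the expression type `E`.
#[derive(Debug, Clone, PartialEq)]
pub struct FileSinkInfo<E> {
    pub root_dir: String,
    pub partition_cols: Option<Vec<E>>,
}

impl FileSinkInfo<UnboundExpr> {
    /// Convert name-based partition columns into index-based ones.
    /// `schema` is assumed to be an ordered list of column names.
    pub fn bind(self, schema: &[&str]) -> Result<FileSinkInfo<BoundExpr>, String> {
        let partition_cols = self
            .partition_cols
            .map(|cols| {
                cols.into_iter()
                    .map(|UnboundExpr(name)| {
                        schema
                            .iter()
                            .position(|c| *c == name)
                            .map(BoundExpr)
                            .ok_or_else(|| format!("column not found: {name}"))
                    })
                    .collect::<Result<Vec<_>, _>>()
            })
            // Option<Result<..>> -> Result<Option<..>> so `?` can propagate the error.
            .transpose()?;
        Ok(FileSinkInfo {
            root_dir: self.root_dir,
            partition_cols,
        })
    }
}
```

Once every producer hands over bound expressions, the type parameter in a design like this has only one instantiation left and can be replaced by the concrete type, which is the simplification the reply above describes.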

Comment on lines 40 to 47
    fn pyexpr_to_bound(&self, expr: PyExpr) -> DaftResult<BoundExpr> {
        BoundExpr::try_new(expr.into(), &self.inner.schema)
    }

    fn pyexprs_to_bound(&self, exprs: Vec<PyExpr>) -> DaftResult<Vec<BoundExpr>> {
        exprs.into_iter().map(|e| self.pyexpr_to_bound(e)).collect()
    }
}
Contributor

Seems weird that these are methods on PyMicroPartition. Could you just make the bind_all method take in anything that is Into<ExprRef>?

Member Author

Good idea, changed it to that
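
A sketch of the shape this suggestion points toward, under assumptions: the real `bind_all` signature is not shown in this conversation, so the one below is a guess for illustration, and `Expr`, `PyExprLike`, and the slice-based schema are hypothetical stand-ins. The idea is that a single generic function accepts any type convertible into an `ExprRef`, which removes the need for per-wrapper helper methods.

```rust
use std::sync::Arc;

/// Hypothetical, simplified expression: only named columns.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Column(String),
}

pub type ExprRef = Arc<Expr>;

/// Hypothetical stand-in for a Python-facing wrapper around an expression.
#[derive(Clone)]
pub struct PyExprLike {
    pub expr: ExprRef,
}

impl From<PyExprLike> for ExprRef {
    fn from(p: PyExprLike) -> Self {
        p.expr
    }
}

/// Hypothetical bound expression: a column index.
#[derive(Debug, PartialEq)]
pub struct BoundExpr(pub usize);

impl BoundExpr {
    /// Bind one expression against an ordered list of column names.
    pub fn try_new(expr: impl Into<ExprRef>, schema: &[&str]) -> Result<Self, String> {
        let expr: ExprRef = expr.into();
        match expr.as_ref() {
            Expr::Column(name) => schema
                .iter()
                .position(|c| *c == name.as_str())
                .map(BoundExpr)
                .ok_or_else(|| format!("column not found: {name}")),
        }
    }

    /// Bind many expressions; accepts wrappers and plain `ExprRef`s alike.
    pub fn bind_all<E: Into<ExprRef> + Clone>(
        exprs: &[E],
        schema: &[&str],
    ) -> Result<Vec<Self>, String> {
        exprs
            .iter()
            .cloned()
            .map(|e| Self::try_new(e, schema))
            .collect()
    }
}
```

With this shape, callers holding wrapper types and callers holding plain `ExprRef` values go through the same entry point, because `Into<ExprRef>` is implemented reflexively for `ExprRef` itself.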

@kevinzwang kevinzwang enabled auto-merge (squash) May 29, 2025 07:49
@kevinzwang kevinzwang merged commit 98d559f into main May 29, 2025
47 checks passed
@kevinzwang kevinzwang deleted the kevin/bound-execution branch May 29, 2025 08:15