Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix join errors with same key name joins (resolves #2649) #2877

Merged

Conversation

anmolsingh20
Copy link
Contributor

The issue fixed here had a workaround previously - aliasing the duplicate column name. This is not needed anymore as the aliasing is performed under the hood, taking care of uniqueness of individual column keys to avoid the duplicate issue.

@github-actions github-actions bot added the bug Something isn't working label Sep 21, 2024
Copy link

codspeed-hq bot commented Sep 21, 2024

CodSpeed Performance Report

Merging #2877 will not alter performance

Comparing anmolsingh20:fix_same_key_name_join_errors_2649 (1f661b4) with main (fe4553f)

Summary

✅ 17 untouched benchmarks

@anmolsingh20 anmolsingh20 force-pushed the fix_same_key_name_join_errors_2649 branch 3 times, most recently from 3fd90e2 to 569d611 Compare September 21, 2024 07:00
@samster25
Copy link
Member

Just assigned @kevinzwang and @colin-ho for the review!

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! I have some comments but overall looks like a great solution to the problem.

src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
@anmolsingh20 anmolsingh20 force-pushed the fix_same_key_name_join_errors_2649 branch 2 times, most recently from 53aa100 to 239e7b2 Compare September 26, 2024 02:08
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty happy with the new solution! Just a few small comments and it should be ready to merge!

src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
src/daft-plan/src/logical_ops/join.rs Outdated Show resolved Hide resolved
@anmolsingh20 anmolsingh20 force-pushed the fix_same_key_name_join_errors_2649 branch from 239e7b2 to a431fb9 Compare September 27, 2024 16:37
…#2649)

The issue fixed here had a workaround previously - aliasing the
duplicate column name. This is not needed anymore as the aliasing
is performed under the hood, taking care of uniqueness of individual
column keys to avoid the duplicate issue.
@anmolsingh20 anmolsingh20 force-pushed the fix_same_key_name_join_errors_2649 branch from a431fb9 to 111995e Compare September 27, 2024 17:08
Copy link

codecov bot commented Oct 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.65%. Comparing base (dba931f) to head (6dd44b7).
Report is 53 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2877      +/-   ##
==========================================
+ Coverage   65.98%   66.65%   +0.67%     
==========================================
  Files        1008     1011       +3     
  Lines      113417   115576    +2159     
==========================================
+ Hits        74836    77037    +2201     
+ Misses      38581    38539      -42     
Files with missing lines Coverage Δ
src/daft-plan/src/logical_ops/join.rs 93.14% <100.00%> (+1.59%) ⬆️

... and 270 files with indirect coverage changes

@anmolsingh20 anmolsingh20 force-pushed the fix_same_key_name_join_errors_2649 branch from 6dd44b7 to 653b43c Compare October 3, 2024 02:41
@kevinzwang kevinzwang self-requested a review October 3, 2024 04:29
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just one final comment

Comment on lines 224 to 232
(Expr::Column(_), Expr::Column(_)) if left_expr != right_expr => {
let unique_id = Uuid::new_v4().to_string();
let renamed_left_expr =
left_expr.alias(format!("{}_{}", left_expr.name(), unique_id));
let renamed_right_expr =
right_expr.alias(format!("{}_{}", right_expr.name(), unique_id));
(renamed_left_expr, renamed_right_expr)
}
_ => (left_expr, right_expr),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick comment: If the join keys are not column expressions, we may still want to rename them. The only time when we wouldn't want to rename is when they are both equal and column expressions. So I think the match arms would look something like this instead:

(Expr::Column(l_name), Expr::Column(r_name)) if l_name == r_name => (left_expr, right_expr),
_ => {
     // the stuff you have here
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, made a quick update!

Rename join keys except when they share the same key name
and are Column expression types; include original expression name
in the renamed expression.
@anmolsingh20 anmolsingh20 force-pushed the fix_same_key_name_join_errors_2649 branch from 653b43c to 1f661b4 Compare October 3, 2024 06:27
@kevinzwang
Copy link
Member

Looks good to me! @anmolsingh20 thanks again for taking your time to add this

@kevinzwang kevinzwang merged commit 62d0581 into Eventual-Inc:main Oct 3, 2024
36 checks passed
sagiahrac pushed a commit to sagiahrac/Daft that referenced this pull request Oct 7, 2024
…#2649) (Eventual-Inc#2877)

The issue fixed here had a workaround previously - aliasing the
duplicate column name. This is not needed anymore as the aliasing is
performed under the hood, taking care of uniqueness of individual column
keys to avoid the duplicate issue.

---------

Co-authored-by: AnmolS <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants