fix: Add keep_only_unique parameter to filter_relationship_one_to_one #67
@borchero @AndreasAlbertQC wdyt of this fix? One could argue that these functions are only to be used inside Happy to adjust the API if you have a better idea.
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff            @@
##              main       #67   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           41        41
  Lines         2207      2210    +3
=========================================
+ Hits          2207      2210    +3
drop_non_unique
parameter to filter_relationship_one_to_one
@delsner I think your change is good, but I wonder if we need to do something else to mitigate confusing results if the dataframes are not unique on the join keys already. If I don't set the new flag, I will get nonsensical results, right?
I am hesitating between 1.3 and 2.
Potentially, yes.
2 is easy, I can just update the docstring. Any thoughts @borchero?
@delsner I'd be happy to count option 1.3 as a bug fix because I think the current behavior generates incorrect output data.
I made the current choice deliberately as additional uniqueness validations are really expensive. Hence, my initial thought would be to go for (2). Nevertheless, it potentially makes sense to make this behavior opt-in, i.e. require the user to set some kind of flag to skip the additional validation step. However, I'm not sure I love the extension of the |
@borchero and I decided to rename these functions to |
Motivation
filter_relationship_one_to_one only checks for a 1:1 relationship if both data frames are unique w.r.t. the provided join key. Also, the docstrings are incorrect, as the join columns cannot be inferred.
Changes
Add keep_only_unique to allow filtering for 1:1 relationships when the join columns do not uniquely identify rows.
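
To make the motivation concrete, here is a minimal plain-Python sketch of the semantics discussed in this PR. It is not the library's implementation: `one_to_one_keys` is a hypothetical stand-in that works on lists of join keys rather than data frames, and the default branch assumes inner-join behavior (one output key per matching pair of rows), which this thread does not spell out.

```python
from collections import Counter


def one_to_one_keys(lhs_keys, rhs_keys, keep_only_unique=False):
    """Hypothetical stand-in for the filter, operating on plain lists of join keys."""
    lhs_counts, rhs_counts = Counter(lhs_keys), Counter(rhs_keys)
    if keep_only_unique:
        # Keep a key only if it occurs exactly once on each side.
        return [k for k in lhs_counts if lhs_counts[k] == 1 and rhs_counts[k] == 1]
    # Assumed default: inner-join semantics, so a key is emitted once per
    # matching pair of rows. Duplicated keys leak into the result.
    return [k for k in lhs_counts for _ in range(lhs_counts[k] * rhs_counts[k])]


lhs = [1, 2, 2, 3]      # key 2 is duplicated on the left
rhs = [1, 2, 3, 3, 4]   # key 3 is duplicated on the right

print(one_to_one_keys(lhs, rhs))                         # [1, 2, 2, 3, 3]
print(one_to_one_keys(lhs, rhs, keep_only_unique=True))  # [1]
```

Under these assumptions, only key 1 is in a true 1:1 relationship. Without the flag, keys 2 and 3 still appear in the output, which is the kind of confusing result raised in the review comments above.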