Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix comparison report when one column is all NAs #343

Merged
merged 6 commits into from
Oct 29, 2024

Conversation

yozzo
Copy link
Contributor

@yozzo yozzo commented Oct 25, 2024

With situations when one of the column in the comparison was all NA's then this would break the reporting. For some reason when the matching (boolean) of the actual and expected columns happened when there was a categorical value compared to a NA, the result was NA, rather than False, as it would happen for the other elements in the cols compared.

This has now been addressed at the intersect rows level which doesn't seem to break the reporting anymore.

With situations when one of the column in the comparison was
all NA's then this would break the reporting. For some reason
when the matching (boolean) of the actual and expected columns happened when there was a categorical value compared to a NA, the result was NA, rather than False, as it would happen for the other elements in the cols compared.
    
This has now been addressed at the intersect rows level which doesn't seem to break the reporting anymore.
@yozzo yozzo changed the title Fix compariosn report when one column is all NAs Fix comparison report when one column is all NAs Oct 25, 2024
Copy link
Member

@fdosani fdosani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks good. I'm wondering though if we should fillna(False) permanently on self.intersect_rows[column + "_match"] and store it.

yozzo added 2 commits October 28, 2024 17:05
Add test for fn column_equal to work with StringArrays containing pd.NA values not returning booleans when compared with other df's with rows of StringArrays
@yozzo yozzo marked this pull request as ready for review October 28, 2024 17:09
@yozzo
Copy link
Contributor Author

yozzo commented Oct 28, 2024

Generally this looks good. I'm wondering though if we should fillna(False) permanently on self.intersect_rows[column + "_match"] and store it.

tried a different implementation that fixes it at the source of the issue

yozzo added 3 commits October 29, 2024 09:15
Printing out the report would've been useful for this test, but looks like it makes the linter fail the build. This has now been fixed.
Fix column_equal to work with StringArrays with pd.NA values not returning booleans, and update formatting to match the linter expectation
Add test for fn column_equal to work with StringArrays containing pd.NA values not returning booleans when compared with other df's with rows of StringArrays, and format test to match the linter.
Copy link
Member

@fdosani fdosani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix here.

@fdosani fdosani merged commit 1013ca8 into capitalone:develop Oct 29, 2024
48 checks passed
@fdosani fdosani mentioned this pull request Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants