Fix comparison report when one column is all NAs #343

yozzo · 2024-10-25T11:14:00Z

When one of the columns in the comparison was all NAs, the report would break. When the actual and expected columns were matched element-wise (as booleans), comparing a categorical value with an NA returned NA rather than False, which is what the other elements in the compared columns produced. This happens because comparisons on pandas nullable string columns (StringArrays) propagate pd.NA instead of returning a boolean.

This is now addressed at the intersect-rows level, and the report no longer appears to break.
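A minimal sketch of the behaviour described above, assuming pandas' nullable "string" dtype. The sample values ("a", "b", "c") are made up for illustration, and the "both missing counts as a match" expression is an illustrative stand-in, not the library's actual comparison code.

```python
import pandas as pd

# Illustrative data: one populated column, one column that is entirely NA.
actual = pd.Series(["a", "b", "c"], dtype="string")
expected = pd.Series([pd.NA, pd.NA, pd.NA], dtype="string")

# On nullable string columns, `==` against pd.NA yields <NA>, not False.
eq = actual == expected

# Treating "both sides missing" as a match does not rescue it:
# under pandas' three-valued (Kleene) logic, <NA> | False is still <NA>.
match = eq | (actual.isna() & expected.isna())

print(eq.dtype)        # boolean
print(match.tolist())  # [<NA>, <NA>, <NA>]
```

Any downstream code that expects strict True/False values in `match` (counting, filtering, summarising) then has to cope with a third state.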

With situations when one of the column in the comparison was all NA's then this would break the reporting. For some reason when the matching (boolean) of the actual and expected columns happened when there was a categorical value compared to a NA, the result was NA, rather than False, as it would happen for the other elements in the cols compared. This has now been addressed at the intersect rows level which doesn't seem to break the reporting anymore.

fdosani

Generally this looks good. I'm wondering though if we should fillna(False) permanently on self.intersect_rows[column + "_match"] and store it.
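The suggestion could look roughly like the sketch below. `intersect_rows` here is a small hypothetical DataFrame standing in for the real one, and the `name_df1` / `name_df2` / `name_match` column names are illustrative assumptions. It shows why storing the filled column matters: while `<NA>` is present, a mismatch count based on `sum()` silently skips those rows.

```python
import pandas as pd

# Hypothetical stand-in for the intersect rows of a comparison.
intersect_rows = pd.DataFrame(
    {
        "name_df1": pd.array(["a", "b", "c"], dtype="string"),
        "name_df2": pd.array([pd.NA, pd.NA, pd.NA], dtype="string"),
    }
)
column = "name"
intersect_rows[column + "_match"] = (
    intersect_rows[column + "_df1"] == intersect_rows[column + "_df2"]
)

# With <NA> in the match column, ~match is also <NA>, and sum() skips it.
mismatches_before = int((~intersect_rows[column + "_match"]).sum())

# Fill once and store the result, so every later consumer sees plain booleans.
intersect_rows[column + "_match"] = (
    intersect_rows[column + "_match"].fillna(False).astype(bool)
)
mismatches_after = int((~intersect_rows[column + "_match"]).sum())

print(mismatches_before, mismatches_after)  # 0 3
```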


yozzo · 2024-10-28T17:10:08Z

> Generally this looks good. I'm wondering though if we should fillna(False) permanently on self.intersect_rows[column + "_match"] and store it.

I tried a different implementation that fixes the issue at its source, in column_equal.
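Fixing it at the source means the equality helper itself never returns `<NA>`. The function below, `columns_equal_sketch`, is a hypothetical and heavily simplified stand-in (no tolerances, whitespace or case handling); it is not datacompy's actual `columns_equal` code, only an illustration of resolving `<NA>` to False where the comparison is made.

```python
import pandas as pd


def columns_equal_sketch(col_1: pd.Series, col_2: pd.Series) -> pd.Series:
    """Hypothetical, simplified column-equality helper.

    Returns a plain bool Series: True where values match or both are
    missing, False otherwise (including when only one side is NA).
    """
    both_missing = col_1.isna() & col_2.isna()
    # `==` on nullable dtypes can yield <NA>; resolve it to False right here
    # so callers never receive a three-valued result.
    equal = (col_1 == col_2).fillna(False).astype(bool)
    return equal | both_missing


actual = pd.Series(["a", pd.NA, "c"], dtype="string")
expected = pd.Series([pd.NA, pd.NA, "c"], dtype="string")
result = columns_equal_sketch(actual, expected)
print(result.tolist())  # [False, True, True]
```

Because the helper already returns strict booleans, any later match column built from it needs no extra fill step.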


fdosani

LGTM. Thanks for the fix here.

yozzo changed the title ~~Fix compariosn report when one column is all NAs~~ Fix comparison report when one column is all NAs Oct 25, 2024

fdosani reviewed Oct 25, 2024


yozzo added 2 commits October 28, 2024 17:05

Fix column_equal to work with StringArrays with pd.NA values not returning booleans

b70f779

Add test for fn column_equal to work with StringArrays with pd.NA

6818665

Add test for fn column_equal to work with StringArrays containing pd.NA values not returning booleans when compared with other df's with rows of StringArrays

yozzo marked this pull request as ready for review October 28, 2024 17:09

yozzo requested review from ak-gupta, jdawang and gladysteh99 as code owners October 28, 2024 17:09

yozzo added 3 commits October 29, 2024 09:15

Fix linter error

2120a64

Printing out the report would've been useful for this test, but looks like it makes the linter fail the build. This has now been fixed.

Fix column_equal to work with StringArrays with pd.NA values

0b59256

Fix column_equal to work with StringArrays with pd.NA values not returning booleans, and update formatting to match the linter expectation

Add test for fn column_equal to work with StringArrays with pd.NA

3398988

Add test for fn column_equal to work with StringArrays containing pd.NA values not returning booleans when compared with other df's with rows of StringArrays, and format test to match the linter.

fdosani approved these changes Oct 29, 2024


fdosani merged commit 1013ca8 into capitalone:develop Oct 29, 2024
48 checks passed

fdosani mentioned this pull request Oct 30, 2024

Release v0.14.2 #345

Merged
