Skip to content

Conversation

@lajohn4747
Copy link
Contributor

CU-860pbbbng
resolves #286

Filter out columns from detection method that are not statistically modeled:
PII fields
Primary keys
Any other IDs
Text Data

@lajohn4747 lajohn4747 requested a review from a team as a code owner November 17, 2023 17:24
@sdv-team
Copy link
Contributor

@codecov-commenter
Copy link

codecov-commenter commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (87790f2) 77.12% compared to head (3dd9750) 77.30%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #525      +/-   ##
==========================================
+ Coverage   77.12%   77.30%   +0.18%     
==========================================
  Files          93       93              
  Lines        3488     3507      +19     
==========================================
+ Hits         2690     2711      +21     
+ Misses        798      796       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@R-Palazzo R-Palazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can do something like:

if sdtype not in [numerical, datetime, categorical]:
    drop_columns.append(column)

And:

if column in column_keys:
      drop_columns.append(column)

With column_keys a list of the primary, foreign and alternate keys

@lajohn4747 lajohn4747 marked this pull request as draft November 21, 2023 18:04
@lajohn4747 lajohn4747 marked this pull request as ready for review November 21, 2023 18:11
@lajohn4747
Copy link
Contributor Author

lajohn4747 commented Nov 21, 2023

if column in column_keys:
      drop_columns.append(column)

With column_keys a list of the primary, foreign and alternate keys

@R-Palazzo would the foreign keys and alternate keys be labeled in metadata?

@R-Palazzo
Copy link
Contributor

if column in column_keys:
      drop_columns.append(column)

With column_keys a list of the primary, foreign and alternate keys

@R-Palazzo would the foreign keys and alternate keys be labeled in metadata?

Yes if they exist you can access the alternate keys through metadata['alternate_keys'] and find the foreign key with the relationships

@lajohn4747 lajohn4747 requested a review from npatki November 22, 2023 00:02
Copy link
Contributor

@R-Palazzo R-Palazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just let one suggestion

@lajohn4747 lajohn4747 requested a review from R-Palazzo November 22, 2023 16:07
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@lajohn4747 lajohn4747 merged commit fa50cec into main Nov 22, 2023
@lajohn4747 lajohn4747 deleted the issue-286-filter-keys branch November 22, 2023 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Detection metrics should only use statistically modeled columns (filter out the rest)

6 participants