Filter out keys that cannot be statistically modeled #525
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main     #525      +/-   ##
==========================================
+ Coverage   77.12%   77.30%   +0.18%
==========================================
  Files          93       93
  Lines        3488     3507      +19
==========================================
+ Hits         2690     2711      +21
+ Misses        798      796       -2
R-Palazzo
left a comment
Maybe you can do something like:
    if sdtype not in ['numerical', 'datetime', 'categorical']:
        drop_columns.append(column)
And:
    if column in column_keys:
        drop_columns.append(column)
where column_keys is a list of the primary, foreign and alternate keys.
@R-Palazzo would the foreign keys and alternate keys be labeled in the metadata?
Yes, if they exist you can access the alternate keys through
R-Palazzo
left a comment
Looks good! Just left one suggestion.
amontanez24
left a comment
LGTM!
CU-860pbbbng
resolves #286
Filter out columns that are not statistically modeled from the detection method:
- PII fields
- Primary keys
- Any other IDs
- Text data
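
As a rough illustration of the filtering described above, here is a minimal sketch that combines the two checks suggested in review (sdtype allow-list plus key columns). It is not the code from this PR: the function name `columns_to_drop`, the `foreign_keys` argument, and the plain-dict metadata layout (`columns`, `primary_key`, `alternate_keys`) are assumptions made for the example.

```python
# Hypothetical sketch; names and the metadata layout are assumptions,
# not the actual implementation in this PR.

# sdtypes treated as statistically modeled, taken from the review suggestion.
MODELABLE_SDTYPES = {'numerical', 'datetime', 'categorical'}


def columns_to_drop(metadata, foreign_keys=()):
    """Return the columns to skip because they are not statistically modeled."""
    # Collect every key column: primary, alternate and (caller-supplied) foreign keys.
    key_columns = set(foreign_keys)
    primary_key = metadata.get('primary_key')
    if primary_key is not None:
        key_columns.add(primary_key)
    key_columns.update(metadata.get('alternate_keys', []))

    drop_columns = []
    for column, column_meta in metadata['columns'].items():
        sdtype = column_meta.get('sdtype')
        # Drop PII, ID and text columns (sdtype not in the allow-list)
        # as well as any column that is used as a key.
        if sdtype not in MODELABLE_SDTYPES or column in key_columns:
            drop_columns.append(column)
    return drop_columns


example_metadata = {
    'primary_key': 'user_id',
    'alternate_keys': ['badge_number'],
    'columns': {
        'user_id': {'sdtype': 'id'},
        'badge_number': {'sdtype': 'numerical'},
        'email': {'sdtype': 'email', 'pii': True},
        'bio': {'sdtype': 'text'},
        'age': {'sdtype': 'numerical'},
        'signup_date': {'sdtype': 'datetime'},
        'plan': {'sdtype': 'categorical'},
        'store_id': {'sdtype': 'numerical'},
    },
}

dropped = columns_to_drop(example_metadata, foreign_keys=['store_id'])
print(dropped)
```

In this example only `age`, `signup_date` and `plan` would remain for the detection method. Note that `badge_number` and `store_id` are dropped even though their sdtype is numerical, because they are listed as keys.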