Warning when split or fold does not contain at least 1 of each class #73

Open
aiqc opened this issue Apr 15, 2021 · 0 comments

Background

Splitsets and Foldsets divide the sample population into evenly distributed sets. However, in unbalanced datasets or datasets divided into many folds, there is no guarantee that infrequent classes will be present in every split/fold.
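
To illustrate why rare classes get dropped, here is a minimal counting sketch. It is not AIQC code; the function name `min_folds_missing` is made up for this example. It relies only on the fact that a class with fewer samples than there are folds cannot appear in every fold.

```python
from collections import Counter

def min_folds_missing(labels, fold_count):
    """Hypothetical helper: for each class with fewer samples than folds,
    return the minimum number of folds that cannot contain that class."""
    counts = Counter(labels)
    return {cls: fold_count - n for cls, n in counts.items() if n < fold_count}

# Made-up example data: 20 samples of one class, 2 of another, 5 folds.
example_labels = ["common"] * 20 + ["rare"] * 2
shortfall = min_folds_missing(example_labels, fold_count=5)
print(shortfall)
```

With only 2 samples of a class and 5 folds, at least 3 folds will lack that class no matter how the samples are assigned.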

Problem

If a split or fold does not contain at least 1 of each class, it can cause downstream problems during encoding and when producing metrics. AIQC attempts to prevent these scenarios by encoding on all samples.

Solution

Even so, it would be good to warn the user when their splits/folds are not representative of the larger population. For Splitsets and Foldsets with a categorical label, warn the user at the end of Splitset or Foldset creation if any of their splits or folds do not contain at least 1 of each class.
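
A rough sketch of what such a check could look like, using only the standard library. This is not AIQC's API: the function names and the `splits` structure (a dict mapping a split name to a list of sample indices) are assumptions made for illustration.

```python
import warnings

def find_missing_classes(labels, splits):
    """Hypothetical helper: map each split name to the sorted list of
    classes that appear in the full label set but not in that split."""
    all_classes = set(labels)
    missing = {}
    for name, indices in splits.items():
        present = {labels[i] for i in indices}
        absent = all_classes - present
        if absent:
            missing[name] = sorted(absent)
    return missing

def warn_if_unrepresentative(labels, splits):
    """Hypothetical helper: emit one UserWarning per split that lacks a class."""
    missing = find_missing_classes(labels, splits)
    for name, absent in missing.items():
        warnings.warn(
            f"Split '{name}' does not contain at least 1 sample of: {absent}",
            UserWarning,
        )
    return missing

# Made-up example: classes "b" and "c" each occur only once.
example_labels = ["a", "a", "a", "b", "a", "c"]
example_splits = {"train": [0, 1, 3, 5], "validation": [2], "test": [4]}
report = warn_if_unrepresentative(example_labels, example_splits)
```

Returning the dictionary as well as warning lets the caller decide whether to resample or simply log the result. The same check would apply to each fold of a Foldset by passing the fold's index lists as `splits`.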

@aiqc aiqc assigned aiqc and unassigned aiqc Apr 15, 2021