Background
Splitsets and Foldsets divide the sample population into evenly distributed sets. However, in imbalanced datasets, or in datasets divided into many folds, there is no guarantee that infrequent classes will be present in every split/fold.
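To illustrate why this happens, here is a minimal sketch in plain Python. It is not AIQC's splitting code: `stratified_fold_assignment` is a hypothetical helper that deals each class's samples round-robin across folds as a simplified stand-in for stratification. With 3 samples of a rare class and 5 folds, at least 2 folds cannot contain that class however the samples are distributed.

```python
from collections import defaultdict

def stratified_fold_assignment(labels, fold_count):
    """Deal each class's samples round-robin across folds.

    Hypothetical, simplified stand-in for stratified folding.
    Returns {fold_index: [sample indices]}.
    """
    folds = defaultdict(list)
    seen_per_class = defaultdict(int)
    for index, label in enumerate(labels):
        fold = seen_per_class[label] % fold_count
        folds[fold].append(index)
        seen_per_class[label] += 1
    return dict(folds)

# 20 samples of a common class, but only 3 of a rare class.
labels = ["common"] * 20 + ["rare"] * 3
folds = stratified_fold_assignment(labels, fold_count=5)

# Folds whose samples include no "rare" label at all.
folds_without_rare = [
    fold for fold, indices in sorted(folds.items())
    if not any(labels[i] == "rare" for i in indices)
]
print(folds_without_rare)  # [3, 4]
```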
Problem
If a split or fold does not contain at least 1 sample of each class, it can cause downstream problems during encoding and when producing metrics. AIQC attempts to prevent these scenarios by encoding on all samples.
Solution
However, it would be good to warn the user when their splits/folds are not representative of the larger population. For Splitsets with a categorical label, warn the user at the end of Splitset or Foldset creation if any of their splits or folds does not contain at least 1 sample of each class.
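A rough sketch of what such a check could look like, assuming the labels are available as a flat array and each split/fold is a mapping from a name to sample indices. The function names `find_missing_classes` and `warn_if_unrepresentative` and the `splits` dictionary layout are hypothetical, chosen for illustration, and are not part of AIQC's API.

```python
import warnings
import numpy as np

def find_missing_classes(labels, splits):
    """Return {split_name: sorted classes absent from that split}.

    `labels` holds the categorical label of every sample.
    `splits` maps a split/fold name to the sample indices it contains
    (hypothetical layout, assumed for this sketch).
    """
    labels = np.asarray(labels)
    all_classes = set(np.unique(labels).tolist())
    missing = {}
    for name, indices in splits.items():
        split_labels = labels[np.asarray(indices, dtype=int)]
        present = set(np.unique(split_labels).tolist())
        absent = sorted(all_classes - present)
        if absent:
            missing[name] = absent
    return missing

def warn_if_unrepresentative(labels, splits):
    """Emit one UserWarning per split that lacks at least one class."""
    missing = find_missing_classes(labels, splits)
    for name, absent in missing.items():
        warnings.warn(
            f"Split '{name}' does not contain any samples of classes: {absent}",
            UserWarning,
        )
    return missing

# Example: class "b" appears only once, so it cannot be in every split.
example_labels = ["a", "a", "a", "b", "a", "a"]
example_splits = {"train": [0, 1, 2, 3], "validation": [4], "test": [5]}
example_missing = warn_if_unrepresentative(example_labels, example_splits)
print(example_missing)  # {'validation': ['b'], 'test': ['b']}
```

Warning instead of raising an error lets Splitset or Foldset creation finish, which matches the intent of informing the user without blocking them.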