aiqc changed the title from "Downsampling majority classes in order to balance categorical labels prior to split creation" to "Splitset.make(max_imbalance:float)" on May 8, 2022
Background
`Splitset.make` is where the sample indices that make up the splits are defined.
Problem
When labels are not balanced, the network becomes biased toward the majority classes and performs poorly on minority classes.
We need a way to downsample the majority classes in order to balance categorical labels prior to split creation.
Solution
- Add a `max_imbalance:float` parameter (greater than 1.0) for `Splitset.make`
- See the Balance section in this gist: https://gist.github.com/aiqc/d8d4b5e74a8811b3d8657c65cb3c6e7f
- Programmatically figure out which class is the minority
- Downsample the majority classes by randomly selecting `count of minority class * max_imbalance` samples from each of those classes. Reset the index of labels and features if necessary, and discard the indices that were not selected.
- Reference the lines surrounding `np.issubdtype` related to `bin_count` for help determining categorical vs continuous labels
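The steps above could look roughly like the following. This is only a minimal sketch, not the existing `Splitset.make` code: the function name `downsample_indices`, the `seed` argument, the choice to reject float labels as continuous, and accepting exactly 1.0 are all assumptions made for illustration.

```python
import numpy as np


def downsample_indices(labels, max_imbalance=1.5, seed=0):
    """Return sorted sample indices in which no class has more than
    int(minority_count * max_imbalance) samples.

    Hypothetical helper for illustration; not part of the AIQC API.
    """
    labels = np.asarray(labels)
    # The issue asks for a value greater than 1.0; allowing exactly 1.0
    # (perfect balance) is an assumption of this sketch.
    if max_imbalance < 1.0:
        raise ValueError("max_imbalance must be at least 1.0")
    # Assumption: float labels are continuous, so class balancing
    # does not apply to them.
    if np.issubdtype(labels.dtype, np.floating):
        raise ValueError("balancing only applies to categorical labels")

    # Find every class and its count; the smallest count is the minority.
    classes, counts = np.unique(labels, return_counts=True)
    cap = int(counts.min() * max_imbalance)

    rng = np.random.default_rng(seed)
    kept = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == cls)
        if count > cap:
            # Randomly keep `cap` samples of this majority class.
            cls_idx = rng.choice(cls_idx, size=cap, replace=False)
        kept.append(cls_idx)

    # Indices that were not selected are simply left out.
    return np.sort(np.concatenate(kept))


labels = np.array([0] * 10 + [1] * 4 + [2] * 3)
keep = downsample_indices(labels, max_imbalance=2.0)
balanced_labels = labels[keep]
```

The returned indices can be used to subset both features and labels before the splits are created, so the two stay aligned without resetting any index.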