`Splitset.make(max_imbalance:float)` #106

aiqc · 2022-05-08T15:31:03Z

Background

Splitset.make is where the sample indices that make up the splits are defined.

Problem

When labels are not balanced, the network becomes biased toward the majority classes and performs poorly on minority classes.

We need a way to downsample majority classes in order to balance categorical labels prior to split creation.

Solution

Add a new argument, max_imbalance:float (greater than 1.0), to Splitset.make
Reference the Balance section in this gist: https://gist.github.com/aiqc/d8d4b5e74a8811b3d8657c65cb3c6e7f
Programmatically determine which class is the minority
Downsample the majority classes by randomly selecting (count of minority class * max_imbalance) samples from each of those classes. Reset the index of labels and features if necessary. Discard the indices that were not selected
Reference the lines surrounding np.issubdtype related to bin_count for help determining categorical vs continuous labels
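
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the AIQC implementation: `downsample_indices` is a hypothetical helper name, the cap is applied per class, and a float dtype is treated as "continuous" via `np.issubdtype`, which may differ from the check that Splitset.make actually uses.

```python
import numpy as np


def downsample_indices(labels, max_imbalance, seed=None):
    """Return sorted sample indices with every majority class capped.

    Hypothetical helper for illustration; not part of the AIQC API.
    """
    if max_imbalance <= 1.0:
        raise ValueError("max_imbalance must be greater than 1.0")
    labels = np.asarray(labels)
    # Assumption: float labels are continuous and out of scope here.
    if np.issubdtype(labels.dtype, np.floating):
        raise TypeError("expected categorical labels, got a float dtype")

    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    # The minority class is the one with the smallest count.
    cap = int(counts.min() * max_imbalance)

    kept = []
    for cls, count in zip(classes, counts):
        class_idx = np.flatnonzero(labels == cls)
        if count > cap:
            # Randomly keep `cap` samples of this class, without replacement.
            class_idx = rng.choice(class_idx, size=cap, replace=False)
        kept.append(class_idx)
    # Indices that were not selected are simply absent from the result.
    return np.sort(np.concatenate(kept))
```

The returned indices could then be used to subset both labels and features before the splits are drawn, so that the two stay aligned.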


aiqc · 2022-05-08T15:42:15Z

A separate follow-up issue would be to use bins to downsample continuous labels.
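
For that follow-up, one possible approach is to map each continuous label to an equal-width bin and then treat the bin ids as categorical classes. The sketch below assumes equal-width bins are acceptable; `bin_continuous_labels` is a hypothetical name, and nothing here is taken from the AIQC codebase.

```python
import numpy as np


def bin_continuous_labels(labels, bin_count):
    """Map each continuous label to an integer bin id in [0, bin_count).

    Hypothetical helper for illustration; not part of the AIQC API.
    """
    labels = np.asarray(labels, dtype=float)
    # Equal-width edges over the label range; keep only the interior edges
    # so the minimum falls in bin 0 and the maximum in bin `bin_count - 1`.
    edges = np.histogram_bin_edges(labels, bins=bin_count)[1:-1]
    return np.digitize(labels, edges)
```

The resulting bin ids are integers, so sparse bins play the role of minority classes and crowded bins can be downsampled the same way as majority classes.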

aiqc self-assigned this May 8, 2022

aiqc removed their assignment May 8, 2022

aiqc changed the title ~~Downsampling majority classes in order to balance categorical labels prior to split creation~~ Splitset.make(max_imbalance:float) May 8, 2022

aiqc changed the title ~~Splitset.make(max_imbalance:float)~~ Splitset.make(max_imbalance:float) May 8, 2022
