Splitset.make(max_imbalance:float) #106

Open
aiqc opened this issue May 8, 2022 · 1 comment

aiqc commented May 8, 2022

Background

Splitset.make is where the sample indices that make up the splits are defined.

Problem

When labels are not balanced, the network becomes biased toward the majority classes and performs poorly on minority classes.

We need a way to downsample majority classes in order to balance categorical labels prior to split creation.

Solution

  • New argument max_imbalance:float (greater than 1.0) for Splitset.make
  • Reference the Balance section in this gist: https://gist.github.com/aiqc/d8d4b5e74a8811b3d8657c65cb3c6e7f
  • Programmatically determine which class is the minority
  • Downsample each majority class by randomly selecting (count of minority class * max_imbalance) samples from it. Reset the index of labels and features if necessary. Discard the indices that are not in that random selection
  • Reference the lines surrounding np.issubdtype related to bin_count for help determining categorical vs continuous labels
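
A minimal sketch of the downsampling step described above, assuming the labels are already known to be categorical and are available as a 1-D array. The function name `downsample_indices` and the `seed` parameter are hypothetical and not part of the aiqc API. The issue specifies max_imbalance as greater than 1.0; accepting exactly 1.0 (perfect balance) is an assumption made here.

```python
import numpy as np


def downsample_indices(labels, max_imbalance, seed=None):
    """Return sorted sample indices with every class capped at
    (minority class count * max_imbalance) samples.

    Hypothetical helper for illustration; not an aiqc function.
    """
    if max_imbalance < 1.0:
        raise ValueError("max_imbalance must be at least 1.0")

    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    # Find the size of the minority class programmatically.
    classes, counts = np.unique(labels, return_counts=True)
    cap = int(counts.min() * max_imbalance)

    kept = []
    for cls, count in zip(classes, counts):
        cls_indices = np.flatnonzero(labels == cls)
        if count > cap:
            # Randomly keep `cap` samples of this majority class.
            cls_indices = rng.choice(cls_indices, size=cap, replace=False)
        kept.append(cls_indices)

    # Indices not returned here are the ones that get discarded.
    return np.sort(np.concatenate(kept))


labels = np.array([0] * 100 + [1] * 10 + [2] * 15)
kept = downsample_indices(labels, max_imbalance=2.0, seed=0)
print(np.unique(labels[kept], return_counts=True))
# class counts after downsampling: 20, 10, 15
```

The returned indices refer to positions in the original arrays, so labels and features can be filtered with the same index array before the splits are created.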
@aiqc aiqc self-assigned this May 8, 2022

aiqc commented May 8, 2022

A separate follow-up issue would be to use bins to downsample continuous labels.
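
One way this follow-up could look, as a rough sketch only: place continuous labels into bins, then cap each bin the same way as a class. Equal-width bins, the function name `downsample_continuous`, and the `bin_count` and `seed` parameters are all assumptions made for illustration; they are not taken from aiqc.

```python
import numpy as np


def downsample_continuous(labels, max_imbalance, bin_count=5, seed=None):
    """Bin continuous labels, then cap each populated bin at
    (smallest populated bin count * max_imbalance) samples.

    Hypothetical helper for illustration; not an aiqc function.
    """
    labels = np.asarray(labels, dtype=float)
    rng = np.random.default_rng(seed)

    # Interior edges of equal-width bins, so bin ids run 0..bin_count-1.
    edges = np.linspace(labels.min(), labels.max(), bin_count + 1)[1:-1]
    bin_ids = np.digitize(labels, edges)

    # Empty bins never appear here, so they cannot become the minority.
    ids, counts = np.unique(bin_ids, return_counts=True)
    cap = int(counts.min() * max_imbalance)

    kept = []
    for bin_id, count in zip(ids, counts):
        bin_indices = np.flatnonzero(bin_ids == bin_id)
        if count > cap:
            bin_indices = rng.choice(bin_indices, size=cap, replace=False)
        kept.append(bin_indices)

    return np.sort(np.concatenate(kept)), bin_ids


values = np.concatenate([
    np.linspace(0.0, 1.0, 100, endpoint=False),  # dense region
    np.linspace(1.0, 2.0, 10),                   # sparse region
])
kept, bin_ids = downsample_continuous(values, max_imbalance=3.0, bin_count=2, seed=0)
print(np.unique(bin_ids[kept], return_counts=True))
```

Equal-width bins are used here because quantile-based bins are roughly balanced by construction, which would leave nothing to downsample.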

@aiqc aiqc removed their assignment May 8, 2022
@aiqc aiqc changed the title Downsampling majority classes in order to balance categorical labels prior to split creation Splitset.make(max_imbalance:float) May 8, 2022