Description
Oversampling modules sometimes return a truncated target array in the multi-class case. Apologies if this is a user error. The example below feeds in a multi-label matrix as the target; I am unsure whether this has implications for the algorithm (if so, feel free to correct my understanding! :)).
Steps/Code to Reproduce
import numpy as np
from imblearn.over_sampling import BorderlineSMOTE

bl = BorderlineSMOTE(random_state=0, n_jobs=8, k_neighbors=1)
# 1000 samples with 5 integer features
x = np.random.randint(5, size=5000).reshape(1000, 5)
# 1000 x 10 binary target matrix
y = np.random.randint(2, size=10000).reshape(1000, 10)
bl_x, bl_y = bl.fit_resample(x, y)
bl_y.shape
Expected Results
An array with the same number of columns as the input target:
(1000, 10)
Actual Results
One of the target columns is randomly dropped during calls to fit_resample and fit_sample. I have re-run the cell in my notebook repeatedly to look for a pattern and found none. The truncated result appears in roughly 1 in 4 runs, even with random_state fixed when creating the sampler.
(1000, 9)
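A possible way to narrow this down, offered as a sketch rather than a diagnosis: the random_state passed to the sampler does not seed np.random.randint, so x and y differ on every run of the cell. The snippet below seeds the global NumPy RNG to pin the input, then checks which columns would never be selected if each row were collapsed to a single label with argmax. That collapse is only my assumption about how a 2D binary target might be handled internally; I have not confirmed it from the imbalanced-learn source. The helper name missing_argmax_classes is hypothetical.

```python
import numpy as np


def missing_argmax_classes(y):
    """Return the column indices that are never the row-wise argmax of y."""
    # Assumed collapse: one label per row, taken as the first maximal column.
    collapsed = np.argmax(y, axis=1)
    return sorted(set(range(y.shape[1])) - set(collapsed.tolist()))


# Seed the global NumPy RNG so the *input* is identical on every run;
# the sampler's random_state does not control np.random.randint.
np.random.seed(0)
y = np.random.randint(2, size=10000).reshape(1000, 10)

print(y.shape)
print(missing_argmax_classes(y))
```

If the list printed for a given input is non-empty on the runs where the output is truncated, and empty otherwise, that would point at the target handling rather than at the oversampling itself.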
Versions
Linux-4.4.0-134-generic-x86_64-with-debian-stretch-sid
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]
NumPy 1.15.2
SciPy 1.1.0
Scikit-Learn 0.20.0
Imbalanced-Learn 0.4.1