Oversampling modules return a truncated array in the multi-class instance #489

@samhardyhey

Description

Oversampling modules sometimes return a truncated array in the multi-class instance. Apologies if this is a user error. The example below feeds in a multi-label matrix; I'm unsure whether this has implications for the algorithm (if so, feel free to correct my understanding! :)).

Steps/Code to Reproduce

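import numpy as np
from imblearn.over_sampling import BorderlineSMOTE

bl = BorderlineSMOTE(random_state=0, n_jobs=8, k_neighbors=1)

# 1000 samples with 5 integer features, and a 1000 x 10 binary label matrix
x = np.random.randint(5, size=5000).reshape(1000, 5)
y = np.random.randint(2, size=10000).reshape(1000, 10)

# resample and inspect the shape of the returned targets
bl_x, bl_y = bl.fit_resample(x, y)
bl_y.shape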
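
A note on reproducibility: the snippet above draws x and y from NumPy's global random generator, and the random_state=0 passed to the sampler does not seed that generator, so every run feeds the sampler different data. Below is a minimal sketch of generating the same inputs on every run. The helper name make_inputs and the seed value are hypothetical, not part of the original report.

```python
import numpy as np

def make_inputs(seed=0):
    """Build the same (x, y) pair on every call for a given seed.

    Hypothetical helper: the name and the default seed are illustrative only.
    """
    rng = np.random.RandomState(seed)     # local generator; global state is untouched
    x = rng.randint(5, size=(1000, 5))    # 1000 samples, 5 integer features in [0, 5)
    y = rng.randint(2, size=(1000, 10))   # 1000 x 10 binary label matrix
    return x, y

x, y = make_inputs(seed=0)
x_again, y_again = make_inputs(seed=0)

# Both comparisons hold because the generator is re-created from the same seed.
print(np.array_equal(x, x_again), np.array_equal(y, y_again))
```

With fixed inputs, a given seed should either always or never show the truncation, which makes the failing cases easier to share.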

Expected Results

An array with the same number of columns as the input.

(1000, 10)

Actual Results

One of the columns is dropped, seemingly at random, during calls to fit_resample and fit_sample. I re-ran the notebook cell repeatedly looking for a pattern and found none. The truncated result appears in roughly 1 in 4 runs, even with random_state fixed when creating the instance.

(1000, 9)
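
One possible explanation, offered purely as an assumption since nothing in this report confirms how the sampler handles a 2-D target: if the label matrix were collapsed to a single class per row (for example with argmax) and binarized again afterwards, any column that is never the first 1 in its row would vanish from the output. The sketch below only checks which columns would survive such a round trip; columns_surviving_argmax is a hypothetical helper, not an imbalanced-learn function.

```python
import numpy as np

def columns_surviving_argmax(y):
    """Return the column indices that are the argmax of at least one row.

    Assumption: models a collapse of a 2-D label matrix to one class per row.
    This is a guess at the cause, not the library's documented behavior.
    """
    return np.unique(np.asarray(y).argmax(axis=1))

# Small multi-label matrix: column 2 is set in two rows,
# but it is never the first 1 in any row.
y_small = np.array([[1, 0, 1],
                    [0, 1, 1],
                    [1, 1, 0]])

kept = columns_surviving_argmax(y_small)
print(len(kept), "of", y_small.shape[1], "columns would survive:", kept)
```

Running the same check on the y that produced a (…, 9) result would show whether a column was already missing before resampling under this assumption.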

Versions

Linux-4.4.0-134-generic-x86_64-with-debian-stretch-sid
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]
NumPy 1.15.2
SciPy 1.1.0
Scikit-Learn 0.20.0
Imbalanced-Learn 0.4.1

Labels

Type: Bug (indicates an unexpected problem or unintended behavior)