MODL Optimal Binning Does not work with large arrays? #28

Sinansi · 2020-05-15T08:48:41Z

With 100 elements per class it works, but with 1000 elements per class it never completes.

This works:

using Discretizers

data = [randn(100); randn(100)]
labels = [fill(:cat, 100); fill(:dog, 100)]
integer_labels = encode(CategoricalDiscretizer([:cat, :dog]), labels)
edges = binedges(DiscretizeMODL_Optimal(), data, integer_labels)

This never finishes:

data = [randn(1000); randn(1000)]
labels = [fill(:cat, 1000); fill(:dog, 1000)]
integer_labels = encode(CategoricalDiscretizer([:cat, :dog]), labels)
edges = binedges(DiscretizeMODL_Optimal(), data, integer_labels)

Is there a maximum array size for MODL optimal supervised binning?

Thank you!

tawheeler · 2020-05-15T15:27:28Z

MODL is quadratic in the sample count, so increasing the sample count by a factor of 10 theoretically increases runtime by a factor of 100. In the paper we used MODL on a dataset with 1372 samples, so it is possible; it just takes a while. On top of this, if the extra samples push you into swap memory, the program is simply going to be super slow.
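
As a rough worked example, taking the quadratic scaling above at face value: the two snippets in the question have 200 and 2000 samples in total. Writing $T(n)$ for the runtime on $n$ samples and $c$ for an unknown constant (both are notation introduced here only for illustration), the expected slowdown is roughly:

```latex
T(n) \approx c\,n^{2}
\quad\Longrightarrow\quad
\frac{T(2000)}{T(200)} \approx \left(\frac{2000}{200}\right)^{2} = 100
```

So a run that takes a few seconds at 200 samples could plausibly take several minutes at 2000, before any memory effects.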

Optimal binning is inherently expensive. Perhaps suboptimal binning is sufficient for your application? You could base your bin edges on a subsampling of the data.
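
A minimal sketch of the subsampling idea, reusing the calls from the question. The helper `subsampled_modl_edges` and the subsample size of 200 are assumptions made for illustration; they are not part of Discretizers.jl. Only `binedges`, `encode`, `CategoricalDiscretizer`, and `DiscretizeMODL_Optimal` come from the package.

```julia
using Random
using Discretizers

# Hypothetical helper (not part of Discretizers.jl): compute MODL bin edges
# on a random subsample of at most `m` points instead of on the full data.
function subsampled_modl_edges(data, integer_labels, m; rng = Random.default_rng())
    n = length(data)
    # Draw indices without replacement so each sample is used at most once.
    idx = randperm(rng, n)[1:min(m, n)]
    return binedges(DiscretizeMODL_Optimal(), data[idx], integer_labels[idx])
end

# Same setup as the slow case in the question: 2000 samples in total.
data = [randn(1000); randn(1000)]
labels = [fill(:cat, 1000); fill(:dog, 1000)]
integer_labels = encode(CategoricalDiscretizer([:cat, :dog]), labels)

# Edges are estimated from 200 randomly chosen samples (an arbitrary choice).
edges = subsampled_modl_edges(data, integer_labels, 200)
```

The resulting edges are no longer guaranteed to be optimal for the full data, and values in the full data that fall outside the subsample's range may need separate handling. Repeating this with a few different subsample sizes is one way to check whether the edges are stable enough for your application.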

It is entirely possible that the algorithm could be more effectively implemented in Julia. Feel free to contribute a PR!

Sinansi · 2020-05-17T20:26:18Z

@tawheeler Thanks for your reply. I applied optimal binning by subsampling and it worked well for my case. All good :)

Sinansi closed this as completed May 17, 2020
