MODL is quadratic in the sample count, so increasing the number of samples by a factor of 10 theoretically increases the runtime by a factor of 100. In the paper we used MODL on a dataset with 1372 samples, so it must be possible; it just takes a while. On top of that, if using more samples pushes you into swap memory, the program will simply be very slow.
Optimal binning is inherently expensive. Perhaps suboptimal binning is sufficient for your application? You could base your bin edges on a subsampling of the data.
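Below is a minimal sketch of that subsampling idea, not something the package prescribes: the subsample size of 500 is an arbitrary example, and using LinearDiscretizer to apply the learned edges back to the full dataset is my own choice.

using Discretizers
using Random

# Large dataset on which exact (optimal) MODL would be very slow
data = [randn(10_000); randn(10_000) .+ 3.0]
labels = [fill(:cat, 10_000); fill(:dog, 10_000)]
integer_labels = encode(CategoricalDiscretizer([:cat, :dog]), labels)

# Run optimal MODL on a random subsample only
n_sub = 500  # subsample size; tune to your runtime budget
idx = randperm(length(data))[1:n_sub]
edges = binedges(DiscretizeMODL_Optimal(), data[idx], integer_labels[idx])

# Reuse the edges found on the subsample to discretize the full dataset.
# By default LinearDiscretizer maps values outside the edge range to the
# nearest extreme bin, so full-data outliers still get assigned a bin.
lindisc = LinearDiscretizer(edges)
binned = encode(lindisc, data)

If your version of the package also provides the greedy MODL variants (DiscretizeMODL_Greedy and DiscretizeMODL_PostGreedy), those can be swapped in for DiscretizeMODL_Optimal as a cheaper, suboptimal alternative on the full data.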
It is entirely possible that the algorithm could be implemented more efficiently in Julia. Feel free to contribute a PR!
If you use an array of 100 elements, it works, but with an array of 1000 elements, it never completes.

This works:
using Discretizers

data = [randn(100); randn(100)]
labels = [fill(:cat, 100); fill(:dog, 100)]
integer_labels = encode(CategoricalDiscretizer([:cat, :dog]), labels)
edges = binedges(DiscretizeMODL_Optimal(), data, integer_labels)
This will not work:
data = [randn(1000); randn(1000)]
labels = [fill(:cat, 1000); fill(:dog, 1000)]
integer_labels = encode(CategoricalDiscretizer([:cat, :dog]), labels)
edges = binedges(DiscretizeMODL_Optimal(), data, integer_labels)
Is there a maximum array size for MODL Optimal Supervised Binning?
Thank you!