This repository contains research work on parallel Dirichlet process mixture models and clustering in Julia by Ekin Akyürek, under the supervision of John W. Fischer III.
Demo:
gm = GridMixture(2)
X, clabels = rand_with_label(gm,100000)
fit(X; ncpu=3) # runs parallel split-merge algorithm
Visual Demo (requires OpenGL):
gm = GridMixture(2)
X, clabels = rand_with_label(gm,100000)
scene = setup_scene(X)
fit(X; ncpu=3, scene=scene) # visualize parallel split-merge algorithm
For details, please see the function documentation.
- Collapsed Gibbs Sampler
labels = fit(X; algorithm=CollapsedAlgorithm) # serial collapsed
- Quasi-Collapsed Gibbs Sampler
labels = fit(X; algorithm=CollapsedAlgorithm, quasi=true) # quasi & serial collapsed
labels = fit(X; algorithm=CollapsedAlgorithm, quasi=true, ncpu=4) # quasi & parallel collapsed
- Direct Gibbs Sampler
labels = fit(X; algorithm=DirectAlgorithm) # direct
labels = fit(X; algorithm=DirectAlgorithm, ncpu=4) # parallel direct
- Quasi-Direct Gibbs Sampler
labels = fit(X; algorithm=DirectAlgorithm, quasi=true) # quasi direct gibbs algorithm
labels = fit(X; algorithm=DirectAlgorithm, quasi=true, ncpu=4) # quasi & parallel direct gibbs
- Split-Merge Gibbs Sampler
labels = fit(X; algorithm=SplitMergeAlgorithm) # split-merge
labels = fit(X; algorithm=SplitMergeAlgorithm, ncpu=4) # parallel split-merge
Run the command below:
julia --project test/parallel_benchmark.jl --N 1000000 --K 6 --Kinit 1 --ncpu 4
- Results-I: Time (sec) to run 100 DP-GMM iterations for d=2, N=1e6, K=6.
Code | ncpu=1 | ncpu=2 | ncpu=4 | ncpu=8 |
---|---|---|---|---|
C++ | 76.94 | 40.57 | 22.23 | 13.01 |
DPMM.jl | 75.71 | 41.54 | 20.86 | 12.77 |
Julia-BNP | 1101.97 | 572.50 | 345.58 | 172.30 |
- Results-II: Time (sec) to run 100 DP-MNMM iterations for d=100, N=1e6, K=6.
Code | ncpu=1 | ncpu=2 | ncpu=4 | ncpu=8 |
---|---|---|---|---|
C++ | 134.25 | 77.55 | 40.97 | 23.60 |
DPMM.jl | 113.131 | 68.46 | 45.55 | 30.79 |
Julia-BNP | 234.40 | 136.43 | 87.34 | 55.10 |
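
To make the scaling in the tables easier to read, the timings can be converted into speedup, S(p) = T(1) / T(p), and parallel efficiency, E(p) = S(p) / p. The sketch below does this for the Results-I rows using only Base Julia; the numbers are copied from the table above, and the helper names `speedup` and `efficiency` are illustrative and not part of the package.

```julia
# Timings (seconds) copied from the Results-I table, for ncpu = 1, 2, 4, 8.
ncpu = [1, 2, 4, 8]
timings = Dict(
    "C++"       => [76.94, 40.57, 22.23, 13.01],
    "DPMM.jl"   => [75.71, 41.54, 20.86, 12.77],
    "Julia-BNP" => [1101.97, 572.50, 345.58, 172.30],
)

# Speedup relative to the single-process run: S(p) = T(1) / T(p).
# (Hypothetical helper, not a DPMM.jl function.)
speedup(t) = t[1] ./ t

# Parallel efficiency: E(p) = S(p) / p.
# (Hypothetical helper, not a DPMM.jl function.)
efficiency(t, p) = speedup(t) ./ p

for (name, t) in timings
    println(name, ": speedup = ", round.(speedup(t); digits=2),
            ", efficiency = ", round.(efficiency(t, ncpu); digits=2))
end
```

The same two helpers can be applied to the Results-II rows by swapping in those timings.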