kmedoids returns empty cluster lists for version 0.10.1 #659
Comments
Hello @laurenleesc, K-Medoids was changed to align with the following paper: I have reviewed the code and found the problem in the C++ implementation (which is used by default). I will release 0.10.1.1 with the hotfix as soon as possible. Thank you for reporting this, I really appreciate it. As a workaround, you can use option |
Hi @laurenleesc, I have just released the library (version |
Unfortunately, it is still an issue. I have attached the code and the simulated dataset I used... Thank you very much! Lauren |
Thanks for the collaboration! I will investigate the issue on your dataset. |
Hello @laurenleesc, I have corrected the issue. The correction is available in For your code, the behavior is as follows:

import warnings

import nltk
import numpy as np
import pandas as pd
from pyclustering.cluster.kmedoids import kmedoids

warnings.filterwarnings("ignore")


def split(word):
    # Split a string into a list of its characters.
    return [char for char in word]


def dist_matrix(data):
    # Fill the global Matrix with pairwise Jaccard distances between character sets.
    for k in range(0, len(data)):
        # print(split(data[k]))
        for m in range(0, len(data)):
            Matrix[m, k] = nltk.jaccard_distance(set(split(data[m])), set(split(data[k])))


klist = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

df = pd.read_csv('df_train_sim13_exp0.csv')
df['true_index'] = df.index
df2 = df['String'].values

l = (len(df), len(df))
# np.float was removed in NumPy 1.24; the builtin float is equivalent.
Matrix = np.zeros(l, dtype=float)
dist_matrix(df2)
# print(Matrix)
np.save('jaccard_exp0_simulated_set13.npy', Matrix)

data = np.load('jaccard_exp0_simulated_set13.npy')

for k in klist:
    # Use the first k points as initial medoids.
    initial_medoids = list(range(0, k))
    kmedoids_instance = kmedoids(data, initial_medoids, data_type='distance_matrix')
    kmedoids_instance.process()
    clusters = kmedoids_instance.get_clusters()
    medoids = kmedoids_instance.get_medoids()
    print("- Data length: %d" % len(data))
    print("- Amount clusters: %d" % len(clusters))
    print(clusters)

Output is the following:
|
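For readers who want to reproduce the distance matrix without the `nltk` dependency, here is a minimal sketch of the same character-set Jaccard distance. The helper names `jaccard_distance` and `jaccard_matrix` and the sample strings are made up for illustration and are not part of pyclustering or the attached dataset. Treating two empty sets as distance 0 is an assumption of this sketch.

```python
import numpy as np


def jaccard_distance(a, b):
    """Jaccard distance between two sets: (|a ∪ b| - |a ∩ b|) / |a ∪ b|."""
    union = a | b
    if not union:
        # Assumption of this sketch: two empty sets count as identical.
        return 0.0
    return (len(union) - len(a & b)) / len(union)


def jaccard_matrix(strings):
    """Symmetric matrix of Jaccard distances between the character sets of strings."""
    char_sets = [set(s) for s in strings]
    n = len(char_sets)
    matrix = np.zeros((n, n), dtype=float)
    # The distance is symmetric, so only the upper triangle is computed.
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(char_sets[i], char_sets[j])
            matrix[i, j] = d
            matrix[j, i] = d
    return matrix


if __name__ == "__main__":
    words = ["abc", "abd", "xyz"]  # made-up sample strings
    print(jaccard_matrix(words))
```

The resulting array can be passed to `kmedoids` with `data_type='distance_matrix'`, as in the code above.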
Thanks! This works great. I've attempted to read your documentation on the k-medoids but would you mind mentioning the paper you're referencing? |
@laurenleesc, all algorithms are accompanied by references to the corresponding papers. You have to check the namespace documentation (I should probably duplicate the references for classes as well). Here is an example from the current documentation: Make sure that you are reading the latest documentation: https://pyclustering.github.io/docs/0.10.1/html/ |
Hi,
Previously, my code ran on one server with version 0.9.3.1 and worked as expected. However, the same code run on a different server with version 0.10.1 returned some empty clusters for the same dataset and initial medoids.
initial_medoids=[0,1,2,3]
kmedoids_instance=kmedoids(df2,initial_medoids,metric=metric)
kmedoids_instance.process()
clusters=kmedoids_instance.get_clusters()
medoids=kmedoids_instance.get_medoids()
print(clusters)
The above returns indices for clusters 0 and 1 but empty lists for clusters 2 and 3, even though there are no missing values in my data df2. I would expect, at the very least, the medoids themselves to be in clusters 2 and 3.
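The expectation above (every medoid belongs to its own cluster, so no cluster is empty) can be checked independently of the library. The following is a rough sketch in plain NumPy, not pyclustering's implementation: `assign_to_medoids` and `empty_clusters` are hypothetical helpers, the four-point dataset is made up, and the medoid indices are assumed to be distinct.

```python
import numpy as np


def assign_to_medoids(distance_matrix, medoids):
    """Assign every point to its nearest medoid and return one index list per medoid."""
    dist = np.asarray(distance_matrix, dtype=float)
    # Column j of dist[:, medoids] holds the distance from each point to medoid j.
    nearest = np.argmin(dist[:, medoids], axis=1)
    # Pin each medoid to its own cluster so that ties or duplicate points
    # cannot leave a cluster empty (assumes distinct medoid indices).
    nearest[medoids] = np.arange(len(medoids))
    return [np.flatnonzero(nearest == j).tolist() for j in range(len(medoids))]


def empty_clusters(clusters):
    """Return the positions of clusters that contain no points."""
    return [i for i, cluster in enumerate(clusters) if len(cluster) == 0]


if __name__ == "__main__":
    # Made-up example: four points on a line, distances are absolute differences.
    points = np.array([0.0, 1.0, 10.0, 11.0])
    dist = np.abs(points[:, None] - points[None, :])
    clusters = assign_to_medoids(dist, [0, 2])
    print(clusters)
    print(empty_clusters(clusters))
```

Running `empty_clusters` on the output of `get_clusters()` is a quick way to flag the behavior reported here.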
Thank you, this is a great package, I really appreciate it.
Lauren