ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() #570
Hi, @nabilEM, it looks like you have tried to use the new API with an old version of the library. There is a new version, 0.9.3, that contains a lot of changes related to BIRCH, and I would strongly recommend using it. Here is the documentation with an example: https://pyclustering.github.io/docs/0.9.3/html/d6/d00/classpyclustering_1_1cluster_1_1birch_1_1birch.html
An example from the documentation:
from pyclustering.cluster.birch import birch
from pyclustering.cluster import cluster_visualizer
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FAMOUS_SAMPLES
# Sample for cluster analysis (represented by list)
sample = read_sample(FAMOUS_SAMPLES.SAMPLE_OLD_FAITHFUL)
# Create BIRCH algorithm
birch_instance = birch(sample, 2, diameter=3.0)
# Cluster analysis
birch_instance.process()
# Obtain results of clustering
clusters = birch_instance.get_clusters()
# Visualize allocated clusters
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()
Regarding your last question about making number_clusters optional:
I will think about it. But in this case it seems to me that using the CF-tree directly is much more logical, because BIRCH stores the data in a CF-tree during the first phase (rescaling it if required) and then applies a hierarchical algorithm. Also you can use
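As background to the point above: each CF-tree entry summarises its points with three values (the count N, the linear sum LS and the sum of squared norms SS), and two entries merge by simply adding those values. The sketch below is a hypothetical CFEntry class written for illustration, not pyclustering's cfentry API. It computes the standard BIRCH average-pairwise-distance diameter, sqrt((2*N*SS - 2*|LS|^2) / (N*(N - 1))), which is a single float, so a threshold test such as diameter() > threshold yields exactly one boolean.

```python
import numpy as np


class CFEntry:
    """Clustering feature (N, LS, SS) summarising a set of points.

    Hypothetical helper for illustration; not pyclustering's cfentry class.
    """

    def __init__(self, n, linear_sum, square_sum):
        self.n = n                                             # number of points
        self.linear_sum = np.asarray(linear_sum, dtype=float)  # LS: vector sum of points
        self.square_sum = float(square_sum)                    # SS: scalar sum of squared norms

    @classmethod
    def from_point(cls, point):
        p = np.asarray(point, dtype=float)
        return cls(1, p, float(np.dot(p, p)))

    def merge(self, other):
        # CF additivity: merging two subclusters just adds their features.
        return CFEntry(self.n + other.n,
                       self.linear_sum + other.linear_sum,
                       self.square_sum + other.square_sum)

    def centroid(self):
        return self.linear_sum / self.n

    def diameter(self):
        # Average pairwise distance: sqrt((2*N*SS - 2*|LS|^2) / (N*(N-1))).
        if self.n < 2:
            return 0.0
        ls_norm_sq = float(np.dot(self.linear_sum, self.linear_sum))
        value = (2 * self.n * self.square_sum - 2 * ls_norm_sq) / (self.n * (self.n - 1))
        # Clamp tiny negative values caused by floating-point round-off.
        return float(np.sqrt(max(value, 0.0)))


a = CFEntry.from_point([0.0, 0.0])
b = CFEntry.from_point([3.0, 4.0])
merged = a.merge(b)
print(merged.n, merged.centroid(), merged.diameter())
```

For example, merging single-point entries at (0, 0) and (3, 4) gives N = 2, centroid (1.5, 2.0) and diameter 5.0, which is exactly the distance between the two points.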
Thank you for your response. I finally installed pyclustering 0.9.3, but when I ran the BIRCH algorithm I got this error:
lib\site-packages\pyclustering\container\cftree.py", line 878, in insert
I installed version 0.9.3.1, but I got the same error as in my first question of this issue, raised from:
birch_instance.process()
It seems that the diameter_part variable is of type <class 'numpy.ndarray'>. In my case it contains these values when I run BIRCH: [-484852. -467572. -540116. -463808. -526004. -506580. -466084. -532588.
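The error message itself is standard NumPy behaviour: comparing an ndarray with a number produces an elementwise boolean array, and Python's if statement cannot reduce a multi-element array to a single True or False. The sketch below reproduces that with the values reported above and adds a hypothetical guard, exceeds_threshold (not part of pyclustering), that only accepts a scalar diameter. It illustrates the mechanism only; it is not the fix that went into the pyclustering hotfix.

```python
import numpy as np


def exceeds_threshold(diameter_value, threshold):
    """Hypothetical guard (not pyclustering code): compare only scalar diameters.

    Passing an array would otherwise make NumPy raise the
    ambiguous-truth-value error inside an `if`.
    """
    value = np.asarray(diameter_value, dtype=float)
    if value.ndim != 0:
        raise TypeError(f"expected a scalar diameter, got an array of shape {value.shape}")
    return bool(value > threshold)


# Reproduce the reported error: an `if` on a multi-element ndarray.
diameter_part = np.array([-484852.0, -467572.0, -540116.0])
try:
    if diameter_part > 3.0:  # elementwise comparison yields an array of booleans
        pass
except ValueError as error:
    message = str(error)
    print(message)
```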
@nabilEM, I have just uploaded a hotfix to PyPI. You can upgrade, but it only fixes the second problem. For the first one, I need to see your code to understand the problem. Could you please show how you use the algorithm, and what kind of data you use?
Below is my code. I used the pendigits data downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/pendigits/
@nabilEM, you have to convert
Thank you very much, it worked! Thank you again for your wonderful library!
It would be useful if you could expose the indices of the points contained in each entry of the leaf nodes. This would let users manipulate the micro-clusters directly, in addition to using aggregated values such as the linear sum.
@nabilEM, there was such a feature, but it was useless: you shouldn't rely on these indexes, because the clustering results would be wrong. It is much better to calculate the distance to the CF-entries and choose the shortest one (apply K-Means, X-Means or G-Means). This is the reason why BIRCH performs cluster analysis at the end.
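The alternative suggested above, assigning every point to the closest CF-entry instead of trusting stored membership, can be sketched with NumPy. This assumes you already have one centroid per entry (for example LS / N); the function name assign_to_nearest_entry is hypothetical, and this is not how pyclustering implements it internally.

```python
import numpy as np


def assign_to_nearest_entry(points, centroids):
    """Label each point with the index of its nearest CF-entry centroid.

    Hypothetical helper. `points` has shape (n_points, dim) and `centroids`
    has shape (n_entries, dim), e.g. LS / N for each CF entry.
    """
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise differences via broadcasting: shape (n_points, n_entries, dim).
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared Euclidean distances: shape (n_points, n_entries).
    distances = (diff ** 2).sum(axis=2)
    return np.argmin(distances, axis=1)


labels = assign_to_nearest_entry(
    [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]],
    [[0.0, 0.0], [5.0, 5.0]],
)
print(labels)
```

The same nearest-centroid step is what K-Means repeats on each iteration, which is why the maintainer groups it with K-Means, X-Means and G-Means.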
@nabilEM, but if you need it, I can provide you with a patch containing these changes.
@annoviko, thanks for your help. I would be interested to know why using the point indexes stored in the entries, instead of the entries' aggregated values (LS, SS), distorts the clustering. Perhaps not taking the individual points into account leads to outliers not being identified correctly.
Hi, @nabilEM, if it is still relevant, here is a patch for the '0.9.3.rel' branch that makes it possible to get indexes from CF-entries:
birch_instance.process()
cf_entries = birch_instance.get_cf_entries()
for entry in cf_entries:
    print(entry.indexes)
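One plausible reading of the concern about indexes: BIRCH inserts points one at a time, so an entry's centroid keeps moving after a point has been absorbed, and the entry that stores a point is not necessarily the entry closest to it at the end. The hypothetical helper below (not part of pyclustering or of the patch) takes per-entry index lists, shaped like those printed by the loop above, and reports the points whose nearest entry centroid differs from the entry that stores them. It assumes every point appears in exactly one entry.

```python
import numpy as np


def index_mismatches(points, entry_indexes):
    """Return indices of points whose nearest entry centroid is not their stored entry.

    Hypothetical helper. `entry_indexes` is a list of index lists, one per
    CF entry; every point is assumed to appear in exactly one list.
    """
    points = np.asarray(points, dtype=float)
    # Centroid of each entry = mean of its member points (equivalently LS / N).
    centroids = np.array([points[idx].mean(axis=0) for idx in entry_indexes])
    # Which entry stores each point according to the index lists.
    owner = np.empty(len(points), dtype=int)
    for entry_id, idx in enumerate(entry_indexes):
        owner[idx] = entry_id
    # Which entry centroid is actually nearest to each point.
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    nearest = np.argmin((diff ** 2).sum(axis=2), axis=1)
    return np.flatnonzero(nearest != owner).tolist()


points = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [3.0, 0.0]]
print(index_mismatches(points, [[0, 1, 2], [3]]))
```

With these points and entries [0, 1, 2] and [3], point 2 is reported: it lies 1 unit from the second entry's centroid but about 2.33 units from the centroid of the entry that holds it.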
Thank you for your library; it is very useful for me and for the data mining community. I wanted to run the BIRCH algorithm, but I got this error from cftree.py:
if (merged_entry.get_diameter() > self.__threshold):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Also, when I try to pass the diameter parameter while instantiating the BIRCH algorithm, I get this error:
birch_instance = birch(x, 3, diameter=0.1)
TypeError: __init__() got an unexpected keyword argument 'diameter'
One last question: would it be possible to make the number_clusters parameter optional, so that users can apply other clustering algorithms in the last step of BIRCH instead of the hierarchical method?
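Until such an option exists, one way to plug in a different final step, assuming you can obtain each CF-entry's centroid (LS / N) and point count N from the tree, is to cluster the entries yourself. The sketch below swaps in a plain weighted k-means (Lloyd iterations, with each entry weighted by its N). It is a generic technique written for illustration; weighted_kmeans is a hypothetical name, and this is neither pyclustering's kmeans class nor the hierarchical step BIRCH uses.

```python
import numpy as np


def weighted_kmeans(centroids, weights, k, iterations=100, seed=0):
    """Cluster CF-entry centroids with k-means, weighting each entry by its point count N.

    Hypothetical helper: generic weighted Lloyd iterations, not pyclustering code.
    Returns (labels, centers).
    """
    x = np.asarray(centroids, dtype=float)
    w = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers with k distinct entries chosen at random.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()

    def nearest(c):
        diff = x[:, np.newaxis, :] - c[np.newaxis, :, :]
        return np.argmin((diff ** 2).sum(axis=2), axis=1)

    for _ in range(iterations):
        labels = nearest(centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if w[mask].sum() > 0:
                # Weighted mean: entries holding more points pull the center harder.
                new_centers[j] = np.average(x[mask], axis=0, weights=w[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    return nearest(centers), centers


labels, centers = weighted_kmeans(
    [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.5, 10.0]],
    weights=[10, 1, 1, 10],
    k=2,
)
print(labels, centers)
```

Weighting by N matters because a CF-entry can stand for many points: an unweighted k-means over centroids would treat a single-point entry and a thousand-point entry as equally important.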