ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() #570

nabilEM · 2019-12-22T22:12:12Z

Thank you for your library, it is very useful for me and the data mining community. I wanted to run the BIRCH algorithm, but I got this error from cftree.py: if (merged_entry.get_diameter() > self.__threshold): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

Also, when I pass the diameter parameter while instantiating the BIRCH algorithm, I get this error: birch_instance = birch(x,3,diameter=0.1)
TypeError: __init__() got an unexpected keyword argument 'diameter'.

One last question, would it be possible to leave the parameter number_clusters optional to let the user use other clustering algorithms in the last step of birch instead of the hierarchical method?

annoviko · 2019-12-23T10:31:48Z

Hi, @nabilEM ,

It looks like you have tried to use the new API on an old version of the library.

There is a new version, 0.9.3, that contains many changes related to BIRCH, and I strongly recommend using it. Here is the documentation with an example: https://pyclustering.github.io/docs/0.9.3/html/d6/d00/classpyclustering_1_1cluster_1_1birch_1_1birch.html
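
Because an unexpected-keyword TypeError like the one above is typical of calling a newer API on an older install, it can help to confirm which version is actually present in the active environment. A minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version found in the active environment, or None if pyclustering is missing.
print(installed_version("pyclustering"))
```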

An example from the documentation:

from pyclustering.cluster.birch import birch
from pyclustering.cluster import cluster_visualizer
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FAMOUS_SAMPLES

# Sample for cluster analysis (represented by list)
sample = read_sample(FAMOUS_SAMPLES.SAMPLE_OLD_FAITHFUL)

# Create BIRCH algorithm
birch_instance = birch(sample, 2, diameter=3.0)

# Cluster analysis
birch_instance.process()

# Obtain results of clustering
clusters = birch_instance.get_clusters()

# Visualize allocated clusters
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()

Regarding:

One last question, would it be possible to leave the parameter number_clusters optional to let the user use other clustering algorithms in the last step of birch instead of the hierarchical method?

I will think about it. But in this case, using the CF-tree directly seems much more logical to me, because BIRCH stores the data in a CF-tree (rescaling it if required) in the first phase and then applies a hierarchical algorithm. You can also use the get_cf_entries() method to get all CF-entries and cluster them with another algorithm. If you need an example of how to apply another algorithm to CF-entries, I can provide one.
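
One way to picture this suggestion is a sketch that runs a different algorithm over the CF-entry centroids. The code below is not pyclustering code: kmeans is a hypothetical pure-Python Lloyd's k-means, and cf_centroids is made-up data standing in for the centroids collected from get_cf_entries().

```python
import math


def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means on a list of points; the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Stand-in for [entry.get_centroid() for entry in birch_instance.get_cf_entries()].
cf_centroids = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.1, 4.9], [0.9, 1.1]]

labels, centers = kmeans(cf_centroids, 2)
print(labels)
```

Each label groups CF-entries rather than original points, so the number of items being clustered is the number of entries in the tree, not the size of the data set.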

nabilEM · 2019-12-23T14:36:29Z

Thank you for your response. I finally installed pyclustering version 0.9.3. But when I ran the Birch algorithm, I got this error: lib\site-packages\pyclustering\container\cftree.py", line 878, in insert
node = leaf_node(entry, None, [entry], None)
TypeError: __init__() takes 4 positional arguments but 5 were given

nabilEM · 2019-12-23T16:02:08Z

I installed the version 0.9.3.1 but I got the same error as in my first question of this issue: birch_instance.process()
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\cluster\birch.py", line 160, in process
self.__insert_data()
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\cluster\birch.py", line 279, in __insert_data
self.__tree.insert_point(point)
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\container\cftree.py", line 866, in insert_point
self.insert(entry)
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\container\cftree.py", line 888, in insert
child_node_updation = self.__recursive_insert(entry, self.__root)
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\container\cftree.py", line 938, in __recursive_insert
return self.__insert_for_leaf_node(entry, search_node)
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\container\cftree.py", line 960, in __insert_for_leaf_node
if merged_entry.get_diameter() > self.__threshold:
File "C:\ProgramData\Anaconda3\envs\ha\lib\site-packages\pyclustering\container\cftree.py", line 292, in get_diameter
if diameter_part < 0.000000001:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

It seems that the diameter_part variable is of type <class 'numpy.ndarray'>. In my case it contains this value when I run BIRCH: [-484852. -467572. -540116. -463808. -526004. -506580. -466084. -532588.
-514772. -541428. -541428. -537008. -527316. -509644. -488884. -463012.]
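
The failure can be reproduced outside the library. A minimal sketch, assuming only NumPy: comparing an array with a scalar yields an array of booleans, and an if statement cannot reduce that to a single truth value. The values are the first three from the array above; mask and raised are illustrative names.

```python
import numpy as np

diameter_part = np.array([-484852.0, -467572.0, -540116.0])
threshold = 0.000000001

# The comparison is element-wise, so the result is a boolean array, not a single bool.
mask = diameter_part < threshold
print(mask)

# Using that array in a boolean context raises the ValueError from the traceback.
try:
    if mask:
        pass
    raised = False
except ValueError as error:
    raised = True
    print(error)
```

The check itself expects a scalar, so an array arriving here suggests that the array input changed the arithmetic further upstream.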

annoviko · 2019-12-23T16:09:08Z

@nabilEM , I have just uploaded a hotfix to PyPI. You can upgrade, but it only helps with the second problem. For the first one, I need to see your code to understand the problem. Could you please show how you use the algorithm, and what kind of data you use?

nabilEM · 2019-12-23T16:18:57Z

Below is my code. I used the pendigits data downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/pendigits/

from pyclustering.cluster.birch import birch
import numpy as np
import pandas as ps

def load_data():
    data1=ps.read_csv("datasets/pendigits.tes",sep=",",header=None)
    data2=ps.read_csv("datasets/pendigits.tra",sep=",",header=None)
    data=ps.concat([data1,data2])
    #print(data)
    labels = data.iloc[:,-1]
    data.drop(data.columns[len(data.columns)-1], axis=1, inplace=True)
    x=np.array(data)
    y=np.array(labels)
    return x,y
x,y=load_data()
# Create BIRCH algorithm
birch_instance = birch(x,3,diameter=0.1)
# Cluster analysis
birch_instance.process()

# Obtain results of clustering
clusters = birch_instance.get_clusters()

# Obtain information about how the data is encoded in the CF-tree.
cf_entries = birch_instance.get_cf_entries()
cf_clusters = birch_instance.get_cf_cluster()

cf_centroids = [entry.get_centroid() for entry in cf_entries]

# Visualize allocated clusters
visualizer = cluster_visualizer(2, 2, titles=["Encoded data by CF-entries", "Data clusters"])
visualizer.append_clusters(cf_clusters, cf_centroids, canvas=0)
visualizer.append_clusters(clusters, sample, canvas=1)
visualizer.show()

annoviko · 2019-12-23T16:27:00Z

@nabilEM , you have to convert x to a list; numpy.array is not supported by BIRCH.

[in] data (list): Input data represented as a list of points (objects), where each point is represented by a list of coordinates.
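
For the code above, that conversion can be done with ndarray.tolist(), which returns nested Python lists of plain Python scalars. A minimal sketch, with a small made-up array standing in for the pendigits feature matrix:

```python
import numpy as np

# Made-up stand-in for the feature matrix returned by load_data().
x = np.array([[47, 100, 27], [0, 89, 27], [0, 57, 31]])

# tolist() converts recursively: the outer array and every row become Python lists.
points = x.tolist()

print(type(points), type(points[0]), type(points[0][0]))
# The list can then be passed instead of the array, e.g. birch(points, 3, diameter=0.1).
```

Note that list(x) converts only the outer level and leaves each row as a NumPy array, so tolist() is the safer choice here.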

nabilEM · 2019-12-23T18:06:18Z

A very big thank you, it worked. Thank you again for your wonderful library!

nabilEM · 2019-12-23T18:13:02Z

It would be interesting if you could add the point indices contained in each entry of the leaf nodes. This would allow users to manipulate the micro-clusters directly, in addition to the aggregated values such as the linear sum.

annoviko · 2019-12-24T08:22:06Z

@nabilEM ,

There was such a feature, but it was useless: you shouldn't rely on these indexes, because the clustering results would be wrong. It is much better to calculate the distance to the CF-entries and choose the shortest one (apply K-Means, X-Means or G-Means). This is the reason why BIRCH performs cluster analysis at the end.
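
The distance-based assignment described here can be sketched in a few lines. This is an illustration, not library code: nearest_entry is a hypothetical helper, and the centroids are made-up values standing in for the CF-entry centroids of a built tree.

```python
import math


def nearest_entry(point, centroids):
    """Return the index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical centroids, standing in for the CF-entry centroids of a built tree.
centroids = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, 2.0], [9.0, 8.5], [4.0, 4.0], [6.0, 7.0]]

assignment = [nearest_entry(p, centroids) for p in points]
print(assignment)
```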

annoviko · 2019-12-24T08:49:35Z

@nabilEM ,

But if you need it, I can provide you a patch with these changes.

nabilEM · 2019-12-24T09:40:11Z

@annoviko Thanks for your help. I would be interested to know why using the point indexes contained in the entries, instead of the aggregated values of the entries (LS, SS), distorts the clustering. Perhaps not taking the points into account individually leads to outliers not being identified correctly.
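
For readers unfamiliar with the aggregated values mentioned here, the following toy sketch follows the standard BIRCH definitions of a clustering feature (N, LS, SS). It is not pyclustering's implementation; the class CFEntry and its method names are hypothetical.

```python
import math


class CFEntry:
    """Toy clustering feature: point count N, linear sum LS, and square sum SS."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = sum(v * v for v in point)

    def merge(self, other):
        # Clustering features are additive, so merging never revisits the points.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [v / self.n for v in self.ls]

    def diameter(self):
        # Root of the mean squared pairwise distance:
        # sqrt((2*N*SS - 2*|LS|^2) / (N*(N-1))).
        if self.n < 2:
            return 0.0
        ls_norm_sq = sum(v * v for v in self.ls)
        value = (2.0 * self.n * self.ss - 2.0 * ls_norm_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(value, 0.0))


entry = CFEntry([0.0, 0.0])
entry.merge(CFEntry([3.0, 4.0]))
print(entry.centroid(), entry.diameter())
```

The centroid and diameter are derived from the three aggregates alone, which is why an entry does not need to keep its member points.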

annoviko · 2020-01-08T13:35:20Z

Hi, @nabilEM ,

If it is still relevant, here is a patch for the '0.9.3.rel' branch that makes it possible to get indexes from CF-entries:

birch_instance.process()
cf_entries = birch_instance.get_cf_entries()

for entry in cf_entries:
    print(entry.indexes)

cf_tree_index_patch_for_private_usage.zip

nabilEM · 2020-01-20T19:48:54Z

Thank you @annoviko !

annoviko self-assigned this Dec 23, 2019

annoviko added the "Question" label (tasks that are questions from users) Dec 23, 2019

annoviko added a commit that referenced this issue Dec 23, 2019

#570: Hotfix for CF-tree.

b70f1cf

annoviko added the "Bug" label (tasks related to found bugs) Dec 23, 2019

annoviko closed this as completed Jan 9, 2020
