This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

ValueError: setting an array element with a sequence. Using pysparnn.matrix_distance.SlowEuclideanDistance #27

Open
liuchenbaidu opened this issue Aug 14, 2019 · 1 comment

Comments

@liuchenbaidu

import pysparnn.cluster_index as ci

from sklearn.feature_extraction.text import TfidfVectorizer
import pysparnn
data = [
'hello world',
'oh hello there',
'Play it',
'Play it again Sam',
]
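data = ['你在干什么',  # "What are you doing?"
'你在干啥子',  # "What are you doing?" (dialect)
'你在做什么',  # "What are you doing?"
'你好啊',  # "Hello!"
'我喜欢吃香蕉']  # "I like eating bananas"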

tv = TfidfVectorizer()
tv.fit(data)

features_vec = tv.transform(data)
print(type(features_vec), features_vec.shape)

# build the search index!

cp = ci.MultiClusterIndex(features_vec, data, pysparnn.matrix_distance.SlowEuclideanDistance)

# search the index with a sparse matrix

search_data = [
'oh there',
'Play it again Frank'
]

search_data = [
'你在干啥', '我喜欢吃香蕉'  # "What are you doing?", "I like eating bananas"
]
search_features_vec = tv.transform(search_data)

res = cp.search(search_features_vec, k=3, k_clusters=3, return_distance=False)

print(res)

@kchaliki

kchaliki commented Feb 4, 2020

@liuchenbaidu Indeed, that code doesn't work with sparse matrices. The test actually uses dense matrices, which is why this went unnoticed. I did implement this separately somewhere using scikit-learn's Euclidean distance, but it is so much slower than cosine that it raises the question of whether you need it.
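
One way to see why Euclidean distance may add little here: TfidfVectorizer L2-normalizes each row by default (norm='l2'), and for unit-length vectors the squared Euclidean distance equals twice the cosine distance, so both metrics rank neighbours identically. The sketch below checks this with plain Python dicts standing in for sparse rows. The helper names and the toy weights are hypothetical and are not part of pysparnn or of the snippet above. The identity does not hold for all-zero rows, such as a query with no known vocabulary.

```python
import math

def l2_normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine_distance(a, b):
    """1 - cosine similarity; both inputs are assumed to be unit length."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())

def euclidean_distance(a, b):
    """Plain Euclidean distance over the union of non-zero terms."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))

# Hypothetical toy weights, not real TF-IDF output.
docs = {
    "hello world": l2_normalize({"hello": 1.0, "world": 1.0}),
    "oh hello there": l2_normalize({"oh": 1.0, "hello": 1.0, "there": 1.0}),
    "play it again sam": l2_normalize(
        {"play": 1.0, "it": 1.0, "again": 1.0, "sam": 1.0}
    ),
}
query = l2_normalize({"hello": 1.0, "there": 1.0, "play": 0.5})

# Rank the documents under each metric.
by_cosine = sorted(docs, key=lambda d: cosine_distance(query, docs[d]))
by_euclidean = sorted(docs, key=lambda d: euclidean_distance(query, docs[d]))

for name, vec in docs.items():
    # For unit vectors: euclidean**2 == 2 * cosine_distance (up to rounding).
    print(name, euclidean_distance(query, vec) ** 2, 2 * cosine_distance(query, vec))
print(by_cosine == by_euclidean)
```

If the features are normalized this way, the existing cosine index should return the same neighbours, and a Euclidean distance can be recovered from a cosine distance d as sqrt(2 * d).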
