This repository has been archived by the owner on Aug 31, 2021. It is now read-only.
@liuchenbaidu Indeed, that code doesn't work with sparse matrices. The test actually uses dense matrices, which is why this went unnoticed. I did implement this separately somewhere using scikit-learn's Euclidean distance, but it is so much slower than cosine that it raises the question of whether you need it.
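On whether Euclidean distance is needed here: `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), and for unit-length vectors the squared Euclidean distance equals twice the cosine distance, so both metrics should rank neighbours in the same order. Below is a minimal pure-Python sketch of that identity. The dict-based sparse vectors, the toy weights, and the helper names are made up for illustration; none of them come from pysparnn or scikit-learn.

```python
import math


def l2_normalize(vec):
    """Scale a sparse vector (dict of feature index -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {i: w / norm for i, w in vec.items()}


def cosine_distance(a, b):
    """1 - cosine similarity; for unit vectors the similarity is the dot product."""
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    return 1.0 - dot


def euclidean_distance(a, b):
    """Plain Euclidean distance over the union of non-zero features."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in keys))


# Hypothetical toy "documents" and query, normalized the way TF-IDF rows are by default.
docs = [
    l2_normalize({0: 1.0, 1: 2.0}),
    l2_normalize({1: 1.0, 2: 3.0}),
    l2_normalize({0: 2.0, 3: 1.0}),
    l2_normalize({4: 1.0}),
]
query = l2_normalize({0: 1.0, 1: 1.0, 2: 0.5})

cos = [cosine_distance(query, d) for d in docs]
euc = [euclidean_distance(query, d) for d in docs]

# For unit vectors: ||a - b||^2 == 2 * (1 - a.b) == 2 * cosine_distance(a, b)
identity_holds = all(
    math.isclose(e * e, 2.0 * c, abs_tol=1e-12) for e, c in zip(euc, cos)
)

# Because one distance is a monotonic function of the other, the rankings agree.
rank_by_cosine = sorted(range(len(docs)), key=lambda i: cos[i])
rank_by_euclidean = sorted(range(len(docs)), key=lambda i: euc[i])

print(identity_holds, rank_by_cosine == rank_by_euclidean)
```

If the features are normalized this way, searching with the faster cosine distance should return the same neighbours in the same order; only the reported distance values differ.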
import pysparnn.cluster_index as ci
from sklearn.feature_extraction.text import TfidfVectorizer
import pysparnn
data = [
'hello world',
'oh hello there',
'Play it',
'Play it again Sam',
]
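# Chinese sample data (this assignment overrides the English list above)
data = ['你在干什么',  # "What are you doing?"
'你在干啥子',  # "What are you doing?" (colloquial)
'你在做什么',  # "What are you doing?"
'你好啊',  # "Hello!"
'我喜欢吃香蕉']  # "I like eating bananas"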
tv = TfidfVectorizer()
tv.fit(data)
features_vec = tv.transform(data)
print(type(features_vec),features_vec.shape)
# build the search index!
cp = ci.MultiClusterIndex(features_vec, data,pysparnn.matrix_distance.SlowEuclideanDistance)
# search the index with a sparse matrix
search_data = [
'oh there',
'Play it again Frank'
]
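# Chinese queries (this assignment overrides the English list above)
search_data = [
'你在干啥',  # "What are you up to?"
'我喜欢吃香蕉',  # "I like eating bananas"
]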
search_features_vec = tv.transform(search_data)
res=cp.search(search_features_vec, k=3, k_clusters=3, return_distance=False)
print(res)