This repository has been archived by the owner on Aug 31, 2021. It is now read-only.
Thanks for providing this convenient toolkit.
I extract BM25 and TF-IDF sparse vectors from a Lucene index (via pyserini)
and use this project to build a sparse index for search.
I find that these indexes cannot beat the original Lucene search results.
(The gap has little effect on tiny datasets or on datasets whose documents are semantically well separated,
but it becomes hard to ignore as the dataset grows, which is exactly the situation this project is meant for.)
I don't think this is a problem with your clustering search algorithm; it comes from the sparse features themselves.
And if I use SVD to reduce the dimensionality of the sparse data, only topic-level features are preserved.
So I don't understand what sparse features are really useful for, other than computing search scores such as BM25,
because they seem weaker than both a true lexical score (BM25) and dense semantic similarity based on BERT
sentence embeddings (e.g. Sentence-Transformers).
Can you point me to some good reference material on constructing sparse text features that would make
suitable use of this project?
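
To make the comparison concrete, here is a minimal sketch of what "BM25 as a sparse vector" can mean. It uses the textbook BM25 formula in pure Python, not Lucene's exact implementation, and every name in it (`bm25_vectors`, `dot`, `cosine`, `rank`, the toy corpus) is hypothetical. The defaults `k1=0.9, b=0.4` are assumed to match pyserini's. The point it illustrates: with a binary query vector, the dot product against a BM25-weighted document vector reproduces the BM25 score, whereas a length-normalized similarity such as cosine is a different scoring function and may rank documents differently. Whether that accounts for the gap described above is only a guess.

```python
import math
from collections import Counter


def bm25_vectors(docs, k1=0.9, b=0.4):
    """Return one {term: BM25 weight} dict per tokenized document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization term
        vectors.append({t: idf[t] * f * (k1 + 1) / (f + norm) for t, f in tf.items()})
    return vectors


def dot(query_terms, vec):
    """Binary query vector dotted with a document vector (equals the BM25 score)."""
    return sum(vec.get(t, 0.0) for t in set(query_terms))


def cosine(query_terms, vec):
    """Cosine similarity between the binary query vector and a document vector."""
    q_norm = math.sqrt(len(set(query_terms)))
    v_norm = math.sqrt(sum(w * w for w in vec.values()))
    return dot(query_terms, vec) / (q_norm * v_norm) if q_norm and v_norm else 0.0


def rank(score, query_terms, vectors):
    """Document indices sorted from best to worst under the given score."""
    return sorted(range(len(vectors)),
                  key=lambda i: score(query_terms, vectors[i]), reverse=True)


# Toy corpus, purely illustrative.
docs = [
    "sparse vector search".split(),
    "sparse vector search with lucene index and bm25 term weights for large collections".split(),
    "dense sentence embedding search".split(),
]
query = "sparse search".split()
vectors = bm25_vectors(docs)

print("dot (BM25) ranking:", rank(dot, query, vectors))
print("cosine ranking:    ", rank(cosine, query, vectors))
```

If the two rankings differ on real data, the similarity function used by the sparse index, rather than the sparse features alone, may be part of the difference from Lucene.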