How do the sparse search results compare with the Lucene searcher? #34

svjack · 2021-04-01T10:04:48Z

Thanks for providing this convenient toolkit.
I retrieved BM25 and TF-IDF sparse vectors from a Lucene index (provided by pyserini)
and used this project to build a sparse index to search over them.
I find that these indexes cannot beat the original Lucene search results.
(The problem seems to have little effect on tiny datasets or semantically dispersed datasets,
but as the dataset grows larger, which is the situation this project is meant for, the shortfall can no longer be ignored.)
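
A minimal sketch of the kind of sparse vectors described above, assuming whitespace-tokenized documents and a textbook BM25 weighting. The function names `bm25_vectors` and `score`, the toy documents, and the `k1` and `b` values are illustrative assumptions, not the exact scoring used by Lucene or pyserini. Each document becomes a `{term: weight}` dict, and summing the weights of the query terms gives a BM25-style score.

```python
import math
from collections import Counter


def bm25_vectors(docs, k1=0.9, b=0.4):
    """Return one sparse {term: weight} dict per tokenized document.

    Textbook BM25 term weights; k1 and b are illustrative values.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Length normalization factor for this document.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        vec = {}
        for term, f in tf.items():
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            vec[term] = idf * f * (k1 + 1) / (f + norm)
        vectors.append(vec)
    return vectors


def score(query_terms, vec):
    """Dot product of a binary query vector with a document vector."""
    return sum(vec.get(t, 0.0) for t in query_terms)


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "sparse vectors for lexical search".split(),
    "dense sentence embeddings from bert".split(),
    "bm25 is a lexical ranking function".split(),
]
vectors = bm25_vectors(docs)
query = "lexical search".split()

# Rank document indices by descending score.
ranking = sorted(range(len(docs)),
                 key=lambda i: score(query, vectors[i]),
                 reverse=True)
print(ranking)
```

Note that this ranks by a plain dot product. An index that ranks by cosine similarity normalizes each vector by its length, so its ordering may differ from this one.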

This is not a problem with your clustering search algorithm, but with the sparse features themselves.
And if I use SVD to reduce the dimensionality of the sparse data, it only preserves topic-level features.
So I don't understand the real use of sparse features other than computing search scores (like BM25),
because they seem weaker than both a true lexical score (BM25) and dense semantic similarity based on BERT
sentence embeddings (like Sentence-Transformers).

Can you point to some good reference materials on constructing sparse text features, so that this project can be used in
a suitable way?
