
SetSimilaritySearch on Scale #5

Open
variux opened this issue Jan 3, 2020 · 1 comment


variux commented Jan 3, 2020

Is there any possibility of integrating Redis or Cassandra as a storage backend, as MinHash LSH already has?

Owner

ekzhu commented Jan 17, 2020

Integrating with Redis or another external storage layer is definitely possible. However, I would consider the I/O cost of external storage: the original sets and the posting lists (the data structure used in this library) can be much bigger than MinHash signatures and LSH indexes, so a Python compute layer on top of a Redis/Cassandra storage layer may be inefficient due to the large number of I/Os. A more efficient implementation needs to take these costs into account, which adds a lot of complexity. I do have an algorithm to solve this problem (JOSIE, VLDB 2019, Github), but I haven't had time to write a production-ready library for it.
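To make the I/O concern concrete, here is a rough sketch in plain Python of how posting-list reads add up for one query. It is an illustration under assumptions, not this library's code: `build_posting_lists` and `query_io_cost` are hypothetical helper names, the in-memory dict stands in for an external key-value store, and the query naively probes every token, whereas a real implementation would prune some of those probes.

```python
from collections import defaultdict


def build_posting_lists(sets):
    """Map each token to the list of ids of the sets that contain it."""
    index = defaultdict(list)
    for set_id, tokens in enumerate(sets):
        for token in set(tokens):
            index[token].append(set_id)
    return dict(index)


def query_io_cost(index, query):
    """Return (lookups, entries_read) when every query token is probed.

    With an external store, each lookup would be one round trip (or one
    key in a batched request), and entries_read is the number of set ids
    that have to be transferred back to the Python process.
    """
    tokens = set(query)
    lookups = len(tokens)
    entries_read = sum(len(index.get(token, [])) for token in tokens)
    return lookups, entries_read


# Toy data: four small sets sharing a few frequent tokens.
sets = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "e"],
    ["a", "b", "c", "d", "e"],
]
index = build_posting_lists(sets)
lookups, entries_read = query_io_cost(index, ["a", "b", "c"])
print(lookups, entries_read)
```

In this sketch the number of lookups grows with the query size and the data transferred grows with token frequency. An LSH index, by contrast, probes a fixed number of buckets per query regardless of set size, which is why the same external-storage design can be much cheaper there.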
