-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reusing Hashes for faster lookups #60
Comments
Hello ✋ Thank for using our library! First, we won't add any new function not related to the associated papers of the filters. public filter_has_in(element: HashableInput, bfs: BloomFilter[]): BloomFilter[] {
// compute once the indexes, be aware that all your filters must have same _size, _nbhashes and seed
// You may have undesirable side-effects if it is not the case
const indexes = bfs[0]._hashing.getIndexes(element, bfs[0]._size, bfs[0]._nbHashes, bfs[0].seed);
const filtered = [];
for (let i = 0; i < bfs.length; i++) {
let ok = true;
// similar to bfs[i].has() we need to know if all indexes are in the internal structure
for (let j = 0; j < indexes.length; j++) {
if (!bfs[i]._filter.has(indexes[j])) {
ok = false;
break; // no need to continue, one false index means it is definitely not there, save us many loops
}
}
if (ok) {
filtered.push(bfs[i]);
}
}
// return the filters which may have the key.
return filtered;
} I did not test it locally but it should work. Give it a try and let us know if it fits your needs. (CC @Callidon ) |
Hi @folkvir So I will now switch to XOR-filters so that I can rebuild the filter with a different seed each time and the server only has to do 3 hashes per lookup. Also I found out there is this new type of XOR-filter called "binary fuse filter" which has a smaller space size and is faster to be build. link. Is this something that could be added to this library? |
Hi! Yes the binary fuse filter is definitely something we can add in the library. (CC @Callidon ) Yes this is dangerous to keep the seed in your filters if those are public so you need to hide it someway. |
Does anyone know if XOR-Filters are also prune to polution attacks, like bloom filters are? |
Is your feature request related to a problem? Please describe.
My Problem: I replicate bloom filters from many clients (up to 1000) to my server.
The server then has to lookup a given key "foobar" and check which of the replicated filters contains the key.
My bloom filters are pretty big and need about 13 hashes.
To check which replicated filter has the key, the server would then have to do 13*1000 hash operations.
Describe the solution you'd like
A way to precalculate the hash once and then lookup the existence in multiple bloom filters without having to hash again.
Acceptation criterias
For testing we could run a test that ensures that lookups for pre-calculated hashes are exactly equal to normal key lookups.
Additional context
Also a question: Is it possible to use a different hash function? My plan is to run the hashing inside of WebAssembly for better performance, so maybe even an
async
hash function would be useful.The text was updated successfully, but these errors were encountered: