Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Todo list #8

Open
5 tasks done
folkvir opened this issue Sep 12, 2019 · 7 comments
Open
5 tasks done

Todo list #8

folkvir opened this issue Sep 12, 2019 · 7 comments

Comments

@folkvir
Copy link
Collaborator

folkvir commented Sep 12, 2019

Feel free to modify this list or add suggestions in comments.

@Chat-Wane
Copy link

Chat-Wane commented Dec 20, 2019

I don't need it. I didn't try it. But for the sake of completeness: https://lemire.me/blog/2019/12/19/xor-filters-faster-and-smaller-than-bloom-filters/
Links to the paper and implementations (no js ones) are available there.

( Hello btw! :3 )

@Callidon
Copy link
Owner

Callidon commented Apr 9, 2020

Some update on this topic:

  • The HyperLogLog has been implemented in release v1.1.0.
  • The TopK has been implemented in release v1.2.0. It uses a Count Min Sketch for frequency estimation and a Minheap for implementing a sliding window of values.

@Callidon
Copy link
Owner

Callidon commented Apr 22, 2020

Some update: The MinHash has been released in v1.3.0.

@ShisiJu
Copy link

ShisiJu commented Nov 24, 2020

Thanks for your jobs.

This was referenced May 9, 2021
@Callidon
Copy link
Owner

Update: XOR filters has been released in the 2.0.0 version of the npm package, following #52 merge.

@folkvir
Copy link
Collaborator Author

folkvir commented Mar 31, 2022

Update: the Scalable Bloom Filter with optimizations and bug fixes have been released in version 3.0.0

@adevine
Copy link

adevine commented Aug 21, 2022

Hi @Callidon and @folkvir - thanks for a really useful library!

I had a suggestion for an additional data structure, as well as a question/suggestion for improvement on the CountingBloomFilter.

I have a need for a Time-To-Live, or "expiring", CountingBloomFilter. That is, when entries are added to the CountingBloomFilter, they will automatically be removed after some number of milliseconds. In addition, upon creation of the CountingBloomFilter, the creator can optionally specify a "dispose" callback function that will get called when the entry expires. The idea is generally to merge the concepts of a TTL Cache (e.g. https://github.com/isaacs/ttlcache) with a CountingBloomFilter. Obviously this could just be built as a separate data structure on top of CountingBloomFilter that manages the timeouts, but I think there is a lot of utility to having it in one single data structure. I am going to implement this for a project I need, but please let me know if you would like me to contribute this back to your repository.

My other question relates to the implementation of the _filter member of the CountingBloomFilter class. This Array stores a tuple pair at each index that is [bit 0 or 1, count of items at this index]. But that first bit is always set to 1 if count of items is > 0, and set to 0 if count of items === 0. So I might be missing something but why not just make _filter an array of numbers where each entry is the count of items at that index, and then the has method would just look at whether that value is > 0? Seems like having each value in the _filter member storing an Array object with 2 numbers is a lot of unnecessary memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants