Skip to content

Python bindings for xorfilter(faster and smaller than bloom and cuckoo filters)

License

Notifications You must be signed in to change notification settings

glitzflitz/pyxorfilter

Repository files navigation

pyxorfilter

Python bindings for C implementation of Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters and of Binary Fuse Filters: Fast and Smaller Than Xor Filters.

Installation

pip install pyxorfilter

From Source

git clone --recurse-submodules https://github.com/glitzflitz/pyxorfilter
cd pyxorfilter
python setup.py build_ext
python setup.py install

Usage

>>> from pyxorfilter import Xor8, Xor16, Fuse8, Fuse16
>>> filter = Xor8(5)	#or Xor16(size)
>>> #Supports unicode strings and heterogeneous types
>>> test_str = ["あ","अ", 51, 0.0, 12.3]
>>> filter.populate(test_str)
True
>>> filter.contains("अ")
True
>>> filter[51]  #You can use __getitem__ instead of contains
True
>>> filter["か"]
False
>>> filter.contains(150)
False
>>> filter.size_in_bytes()
60

You can serialize a filter with the serialize() method which returns a buffer, and you can recover the filter with the deserialize(buffer) method, which returns a filter:

> f = open('/tmp/output', 'wb')
> f.write(filter.serialize())
> f.close()
> recoverfilter = Xor8.deserialize(open('/tmp/output', 'rb').read())

Caveats

Accuracy

For more accuracy(less false positives) use larger but more accurate Xor16 for Fuse16.

For large sets (contain millions of keys), Fuse8/Fuse16 filters are faster and smaller than Xor8/Xor16.

>>> filter = Xor8(1000000)
>>> filter.size_in_bytes()
1230054
>>> filter = Fuse8(1000000)
>>> filter.size_in_bytes()
1130536

TODO

  • Add unit tests
  • Add CI support for distributing pyxorfilter with PyPI.

Links

About

Python bindings for xorfilter(faster and smaller than bloom and cuckoo filters)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages