Fuzzy is a Python library that implements common phonetic algorithms quickly. They are typically used in string-similarity exercises, but they are quite versatile.
It uses C extensions (via Cython) for speed.
The algorithms are:
- Soundex
- NYSIIS
- Double Metaphone, based on Maurice Aubrey's C code from his Perl implementation
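To show what a phonetic code actually is, here is a minimal pure-Python sketch of Soundex using the common American rules: keep the first letter, map the remaining consonants to digits, drop vowels, skip H and W, collapse adjacent repeated digits, and pad or truncate to a fixed length. `soundex_sketch` is a hypothetical name, not part of Fuzzy's API, and Fuzzy's C implementation may handle edge cases (such as H and W) differently.

```python
# Hypothetical pure-Python Soundex sketch (American rules); not Fuzzy's C code.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex_sketch(word: str, size: int = 4) -> str:
    """Return a Soundex code of length `size` for `word`."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    out = [first]
    # The first letter's digit still counts for collapsing repeats (e.g. "Pfister").
    last = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate equal digits.
            continue
        code = _CODES.get(ch, "")
        if code and code != last:
            out.append(code)
        # Vowels map to "" and reset `last`, so a digit repeated across a vowel is kept.
        last = code
    return "".join(out)[:size].ljust(size, "0")
```

With these rules `soundex_sketch('fuzzy')` gives `'F200'`, the same value the `fuzzy.Soundex(4)` session shows.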
The functions are quite easy to use!
>>> import fuzzy
>>> soundex = fuzzy.Soundex(4)
>>> soundex('fuzzy')
'F200'
>>> dmeta = fuzzy.DMetaphone()
>>> dmeta('fuzzy')
['FS', None]
>>> fuzzy.nysiis('fuzzy')
'FASY'
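In string-similarity work, a common pattern is to treat two strings as a possible match when their phonetic codes agree. The helper below is a minimal sketch under that assumption; `sounds_alike` and `_as_codes` are hypothetical names, not part of Fuzzy. It accepts any encoder callable, such as `fuzzy.Soundex(4)`, `fuzzy.nysiis`, or `fuzzy.DMetaphone()`. Because the Double Metaphone call returns a list of codes whose second entry may be `None` (as in `['FS', None]`), each result is normalized into a set of non-empty codes before comparing.

```python
from typing import Any, Callable, Set


def _as_codes(result: Any) -> Set[Any]:
    """Normalize one encoder result into a set of non-empty codes."""
    if result is None:
        return set()
    if isinstance(result, (str, bytes)):
        # Soundex / NYSIIS style: a single code string.
        return {result} if result else set()
    # Double Metaphone style: a list of codes, some of which may be None.
    return {code for code in result if code}


def sounds_alike(a: str, b: str, encoder: Callable[[str], Any]) -> bool:
    """Return True if `a` and `b` share at least one phonetic code under `encoder`."""
    return bool(_as_codes(encoder(a)) & _as_codes(encoder(b)))
```

For example, `sounds_alike('Robert', 'Rupert', fuzzy.Soundex(4))` compares the two Soundex codes, while passing `fuzzy.DMetaphone()` also counts a match on the alternate code.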
In some recent testing, Fuzzy's Double Metaphone was about 10 times faster than Andrew Collins's pure Python implementation. Soundex and NYSIIS should show similar speedups over pure Python versions. Using IPython's timeit:
In [3]: timeit soundex('fuzzy')
1000000 loops, best of 3: 326 ns per loop

In [4]: timeit dmeta('fuzzy')
100000 loops, best of 3: 2.18 us per loop

In [5]: timeit fuzzy.nysiis('fuzzy')
100000 loops, best of 3: 13.7 us per loop
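Outside IPython, a comparable best-of-3 figure can be taken with the standard library's `timeit` module. This is a minimal sketch; `best_per_call` is a hypothetical helper that runs the call `number` times, repeats that `repeat` times, and reports the fastest total divided by `number`. Absolute numbers will depend on the machine and Python version.

```python
import timeit
from typing import Any, Callable


def best_per_call(func: Callable[[Any], Any], arg: Any,
                  number: int = 100_000, repeat: int = 3) -> float:
    """Return the best-of-`repeat` average time per call of func(arg), in seconds."""
    timer = timeit.Timer(lambda: func(arg))
    totals = timer.repeat(repeat=repeat, number=number)
    # The minimum is the least noisy estimate; divide by the loop count for per-call time.
    return min(totals) / number
```

For instance, `best_per_call(fuzzy.nysiis, 'fuzzy') * 1e6` gives microseconds per call, and the same helper can time a pure Python implementation for comparison.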
We recommend the Python-Levenshtein module for fast, C-based string distance and similarity metrics. Among other functions, it includes:
- Levenshtein edit distance
- Jaro distance
- Jaro-Winkler distance
- Hamming distance
In testing, it has been several times faster than comparable pure Python implementations of those algorithms.
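For reference, here is what two of those metrics compute, as a minimal pure-Python sketch of the kind of implementation such C modules are compared against. The names `levenshtein` and `hamming` here are illustrative and are not Python-Levenshtein's API (which provides C versions as `Levenshtein.distance` and `Levenshtein.hamming`). The edit distance uses the standard two-row dynamic-programming table.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row is as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))
```

The nested loop makes the edit distance O(len(a) * len(b)) in pure Python, which is where a C implementation pays off.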