How to get a score of 100 #164
Hey Max,

First off, thanks for creating/maintaining this awesome library!

I've been trying to find a way to get a score of 100 when comparing the strings

>>> fuzz.partial_ratio('simplee','simple')
100.0
>>> fuzz.partial_ratio('worde','word')
100.0

By reading the docs, I thought that

>>> fuzz.partial_token_set_ratio('simplee worde','very simple big word')
69.23076923076923

I assume that's because there are no perfectly similar words between both? (if i put

Also, I found out that by adding letters after the

>>> fuzz.partial_token_ratio('simpleeadassdsdl worde','very simple big word')
55.172413793103445
>>> fuzz.partial_token_set_ratio('simplee wordeeaaasasdasd','very simple big word')
75.0

In case you're wondering, the other fuzz modules performed worse. This is some strange behaviour that I don't understand because I don't know the underlying algorithms you're using. Maybe you can shed some light here, and tell why, in this case, it's not possible to get a score of 100.
The underlying algorithms in rapidfuzz.fuzz come from fuzzywuzzy/thefuzz.

fuzz.partial_ratio
fuzz.partial_ratio simply searches for the alignment with the highest fuzz.ratio. So e.g. for the strings ab <-> abcd it will calculate the fuzz.ratio for each possible alignment and return the highest similarity. In this case 100 for the alignment ab <-> ab.

fuzz.token_sort_ratio / fuzz.partial_token_sort_ratio
Sorts the words in the string and then calculates the fuzz.ratio / fuzz.partial_ratio of the sorted string.

fuzz.token_set_ratio / fuzz.partial_token_set_ratio
Splits the strings into words. Afterwards it creates three lists. These lists are then sorted and joined.

fuzz.token_ratio / fuzz.partial_token_ratio
This returns max(token_sort_ratio, token_set_ratio). I use this internally in the WRatio implementation and I just made it available, since it is faster than manually calculating the max of the two ratios.
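
To make the alignment search and the word sorting a bit more concrete, here is a rough pure-Python sketch. It is not the rapidfuzz implementation. The helper names (indel_ratio, naive_partial_ratio, naive_token_sort_ratio) are made up for illustration. It assumes fuzz.ratio behaves like a normalized Indel similarity, and that only windows of the shorter string's length are tried, so scores may differ from what the library returns.

```python
def indel_ratio(a: str, b: str) -> float:
    """Assumed stand-in for fuzz.ratio: 100 * 2 * LCS(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 100.0
    # Row-by-row dynamic programming for the longest common subsequence length.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return 100.0 * 2 * prev[-1] / (len(a) + len(b))


def naive_partial_ratio(a: str, b: str) -> float:
    """Slide the shorter string over the longer one and keep the best ratio.

    Simplification (assumption): only full-length windows are compared.
    """
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return indel_ratio(shorter, longer)
    n = len(shorter)
    return max(
        indel_ratio(shorter, longer[i:i + n])
        for i in range(len(longer) - n + 1)
    )


def naive_token_sort_ratio(a: str, b: str) -> float:
    """Sort the whitespace-separated words of each string, then compare."""
    return indel_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


if __name__ == "__main__":
    # The window "ab" of "abcd" matches "ab" exactly.
    print(naive_partial_ratio("ab", "abcd"))        # 100.0
    print(naive_partial_ratio("simplee", "simple"))  # 100.0
    # No window of the longer string equals 'simplee worde', so this stays below 100.
    print(naive_partial_ratio("simplee worde", "very simple big word"))
    print(naive_token_sort_ratio("big word", "word big"))  # 100.0
```

Under these assumptions, a partial score of 100 needs the whole shorter string to appear verbatim somewhere in the longer one. That holds for 'simple' inside 'simplee', but not for 'simplee worde' inside 'very simple big word'.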