How to get a score of 100 #164

samayala22 · 2021-11-19T18:46:52Z


Hey Max,

First off, thanks for creating and maintaining this awesome library!

I've been trying to find a way to get a score of 100 when comparing the strings 'simplee worde' and 'very simple big word', knowing that:

>>> fuzz.partial_ratio('simplee','simple')
100.0
>>> fuzz.partial_ratio('worde','word')
100.0

By reading the docs, I thought that partial_token_set_ratio would work but it didn't.

>>> fuzz.partial_token_set_ratio('simplee worde','very simple big word')
69.23076923076923

I assume that's because the two strings share no identical words? (If I use simple instead of simplee, I get 100.)

Also, I found that adding letters after worde increased the score up to a cap of 75.00, while adding letters to the end of simplee made the score drop to a lower limit of 55.17.

>>> fuzz.partial_token_ratio('simpleeadassdsdl worde','very simple big word')
55.172413793103445
>>> fuzz.partial_token_set_ratio('simplee wordeeaaasasdasd','very simple big word')
75.0

In case you're wondering, the other fuzz modules performed worse.

This behaviour seems strange to me, but I don't know the underlying algorithms you're using.

Maybe you can shed some light here and explain why, in this case, it's not possible to get a score of 100.

Answered by maxbachmann


maxbachmann · 2021-11-23T11:47:39Z

Maintainer

The underlying algorithms in rapidfuzz.fuzz come from fuzzywuzzy/thefuzz.

fuzz.partial_ratio

fuzz.partial_ratio simply searches for the alignment with the highest fuzz.ratio. For example, for the strings ab <-> abcd it will calculate the fuzz.ratio for the following alignments:

ab <-> a
ab <-> ab
ab <-> bc
ab <-> cd
ab <-> d

and return the highest similarity. In this case 100 for the alignment ab <-> ab.
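The alignment search described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the helper names (lcs_length, ratio_sketch, candidate_windows, partial_ratio_sketch) are made up for this example, and the sketch assumes that fuzz.ratio behaves like a normalized longest-common-subsequence similarity. The optimized rapidfuzz code may differ in details and edge cases.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(a: str, b: str) -> float:
    """Assumed stand-in for fuzz.ratio: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def candidate_windows(short: str, long: str) -> list[str]:
    """Slide a window of len(short) over long, including partial overlaps at both ends."""
    m = len(short)
    return [long[max(0, i): i + m] for i in range(1 - m, len(long))]


def partial_ratio_sketch(s1: str, s2: str) -> float:
    """Best ratio_sketch between the shorter string and any window of the longer one."""
    short, long = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not short:
        return 100.0 if not long else 0.0
    return max(ratio_sketch(short, w) for w in candidate_windows(short, long))


print(candidate_windows("ab", "abcd"))
print(partial_ratio_sketch("ab", "abcd"))
```

For ab <-> abcd the windows are exactly the five alignments listed above, and the best one is ab <-> ab.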

fuzz.token_sort_ratio / fuzz.partial_token_sort_ratio

Sorts the words in each string and then calculates the fuzz.ratio/fuzz.partial_ratio of the sorted strings.
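The sorting step can be illustrated with a small helper. The name sort_tokens is hypothetical, and the sketch assumes whitespace tokenization and ignores any preprocessing the library may apply:

```python
def sort_tokens(text: str) -> str:
    """Split on whitespace, sort the words, and join them back with single spaces."""
    return " ".join(sorted(text.split()))


# Both strings from the question, after sorting their words.
print(sort_tokens("simplee worde"))
print(sort_tokens("very simple big word"))
```

The ratio is then computed on these sorted strings, so word order in the input no longer matters.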

fuzz.token_set_ratio / fuzz.partial_token_set_ratio

Splits the strings into words. Afterwards it creates three lists:

unique1 holds the words that appear only in string 1
unique2 holds the words that appear only in string 2
common holds the words that appear in both string 1 and string 2

Afterwards these lists are sorted and joined as " ".join(sorted(list)), and the fuzz.ratio/fuzz.partial_ratio is calculated for the following combinations:

common_joined <-> common_joined + unique1_joined
common_joined <-> common_joined + unique2_joined
common_joined + unique1_joined <-> common_joined + unique2_joined
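To see which strings actually get compared, here is a minimal sketch that builds the three pairs. The function name token_set_pairs is made up for illustration, and the sketch follows the fuzzywuzzy formulation, in which the third pair compares the two combined strings against each other. It assumes whitespace tokenization and skips preprocessing; the rapidfuzz internals may be organized differently.

```python
def token_set_pairs(s1: str, s2: str) -> list[tuple[str, str]]:
    """Return the three string pairs that are scored against each other."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())

    common_joined = " ".join(sorted(tokens1 & tokens2))
    unique1_joined = " ".join(sorted(tokens1 - tokens2))
    unique2_joined = " ".join(sorted(tokens2 - tokens1))

    # strip() removes the stray space when one of the parts is empty.
    combined1 = f"{common_joined} {unique1_joined}".strip()
    combined2 = f"{common_joined} {unique2_joined}".strip()

    return [
        (common_joined, combined1),
        (common_joined, combined2),
        (combined1, combined2),
    ]


for left, right in token_set_pairs("simplee worde", "very simple big word"):
    print(repr(left), "<->", repr(right))
```

For the strings from the question, no word is shared exactly, so common_joined is empty and only the third pair ('simplee worde' <-> 'big simple very word') carries any information. Replacing simplee with simple makes common_joined equal to 'simple', which changes all three pairs.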

fuzz.token_ratio / fuzz.partial_token_ratio

This returns max(token_sort_ratio, token_set_ratio). I use it internally in the WRatio implementation and made it available because it is faster than manually calculating the max of the two ratios.

1 reply

samayala22 Dec 4, 2021
Author

Okay, thanks for the explanation, it was very insightful!
