Weight is incorrect, not a proper DL, not metric #1
Hi ywg, thanks for reporting this. You're right that this does not implement a proper Damerau-Levenshtein distance: it implements the optimal string alignment algorithm, as stated in the first sentence of the README. I may have misread the Wikipedia article as saying that this is also called the restricted Damerau-Levenshtein distance. Re-reading the article, it's still not clear to me whether that is accepted usage, since it talks about a "true" Damerau-Levenshtein algorithm, which presumably would be pointless if there weren't a restricted version. Perhaps you could shed some light on the terminology and point to a more credible source than Wikipedia? :)
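For reference, here is a minimal Python sketch of the optimal string alignment (restricted edit) distance that the README describes, following the textbook dynamic-programming recurrence. The function name `osa_distance` is hypothetical, and this is not the project's own code. The transposition branch only fires when the last two characters of both prefixes are the same pair in swapped order, so OSA cannot combine a swap with any other edit on the same characters.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted edit distance).

    Levenshtein operations plus transposition of two adjacent characters,
    with the restriction that no substring is edited more than once.
    """
    m, n = len(a), len(b)
    # d[i][j] is the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert every character of b[:j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
            # Transposition: only when the last two characters of each
            # prefix are the same pair in swapped order.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


print(osa_distance("beak", "water"))  # 5
print(osa_distance("ca", "abc"))      # 3
```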
Hi cbaatz,
Good sources on this algorithm are hard to find, and a lot of resources actually describe the restricted edit distance (OSA) while calling it DL. Even the French Wikipedia article is simply wrong and is illustrated with an OSA implementation.
I've found these two blog posts, which are (IMHO) quite clear on the question:
Sadly, the original article by Fred Damerau is behind a paywall, but this one isn't and can be considered more authoritative than blog posts: "A.1 Proof of Metric Properties"
If you want, I've written some tests with JSSpec.
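For comparison, here is a sketch of the unrestricted ("true") Damerau-Levenshtein distance, assuming the Lowrance-Wagner style algorithm with a per-character "last row seen" table, as presented in the Wikipedia article. The names (`damerau_levenshtein`, `last_row`, `last_match_col`) are illustrative only. Unlike OSA, a transposed pair may have characters deleted or inserted between its two letters, so "ca" to "abc" costs 2 (swap to "ac", then insert b) instead of 3.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (adjacent transpositions)."""
    m, n = len(a), len(b)
    big = m + n  # larger than any achievable distance
    # h[i + 1][j + 1] is the distance between a[:i] and b[:j]; row 0 and
    # column 0 are a sentinel border filled with `big`.
    h = [[big] * (n + 2) for _ in range(m + 2)]
    for i in range(m + 1):
        h[i + 1][1] = i
    for j in range(n + 1):
        h[1][j + 1] = j

    last_row = {}  # char -> last 1-based row of `a` containing it
    for i in range(1, m + 1):
        last_match_col = 0  # last 1-based column of `b` matching a[i-1]
        for j in range(1, n + 1):
            k = last_row.get(b[j - 1], 0)  # row where b[j-1] last appeared in a
            col = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            h[i + 1][j + 1] = min(
                h[i][j] + cost,      # substitution (free on a match)
                h[i + 1][j] + 1,     # insertion
                h[i][j + 1] + 1,     # deletion
                # transpose a[k-1] with a[i-1], deleting the i-k-1 characters
                # between them and inserting the j-col-1 characters of b
                h[k][col] + (i - k - 1) + 1 + (j - col - 1),
            )
        last_row[a[i - 1]] = i
    return h[m + 1][n + 1]


print(damerau_levenshtein("beak", "water"))  # 4
print(damerau_levenshtein("ca", "abc"))      # 2
```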
Hi ywg, I'd be interested in the tests. I can't promise that I'll rewrite this as a true DL, but I'll consider it, and then the tests would be great. CB
This implementation seems to be wrong.
Although I haven't dug deep enough into the code to spot where the bug is, I've isolated a simple test case:
DL("beak", "water") returns 5 instead of 4
step 1: beak / water => substitution (b by w, cost = 1)
step 2: weak / water => transposition (e and a, cost = 1)
step 3: waek / water => deletion of t from "water", equivalent to inserting t into the source (cost = 1)
step 4: waek / waer => substitution (r by k, cost = 1)
result: waek = waek, reached in 4 operations of cost 1
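As a quick check on the arithmetic, the sketch below replays the same four operations as edits applied to "beak" alone (deleting t from "water" in step 3 is the same as inserting t into the source). The `steps` list and `replay` helper are hypothetical. Note that step 3 inserts a character between the two letters swapped in step 2; the optimal string alignment variant forbids editing a substring more than once, so it cannot use this path, which is consistent with the 5 reported above.

```python
# Each step is (description, transformation) applied to the running string.
steps = [
    ("substitute b -> w",   lambda s: "w" + s[1:]),                 # beak  -> weak
    ("transpose e <-> a",   lambda s: s[0] + s[2] + s[1] + s[3:]),  # weak  -> waek
    ("insert t after 'wa'", lambda s: s[:2] + "t" + s[2:]),         # waek  -> watek
    ("substitute k -> r",   lambda s: s[:-1] + "r"),                # watek -> water
]


def replay(source, steps):
    """Apply the steps in order, printing each intermediate string."""
    current = source
    for description, transform in steps:
        current = transform(current)
        print(f"{description:<20} {current}")
    return current


final = replay("beak", steps)
print(final == "water", "in", len(steps), "unit-cost operations")
```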
This bug causes the algorithm to lose its metric properties (the triangle inequality no longer holds), which can have a lot of implications. If you use it to build indexes that rely on the distance being a metric, it can easily break your search engine. If you only use it for simple string comparison, it should be good enough (even if it is not a real Damerau-Levenshtein distance).
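To make the metric-property concern concrete, here is a small check using the well-known counterexample "ca", "ac", "abc": under OSA the two short hops each cost 1, but the direct distance is 3, so d(x, z) ≤ d(x, y) + d(y, z) fails. A metric index such as a BK-tree prunes its search using exactly that inequality, so with OSA it can miss matches. The `osa_distance` and `triangle_holds` functions are an illustrative sketch, not the project's API.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted edit) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def triangle_holds(dist, x: str, y: str, z: str) -> bool:
    """Check dist(x, z) <= dist(x, y) + dist(y, z) for one triple."""
    return dist(x, z) <= dist(x, y) + dist(y, z)


x, y, z = "ca", "ac", "abc"
print(osa_distance(x, y), osa_distance(y, z), osa_distance(x, z))  # 1 1 3
print(triangle_holds(osa_distance, x, y, z))  # False
```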