Supporting comparisons using byte[]
#35
Comments
Thanks for the suggestion! One problem off the top of my head is that I'd say if you care about codepoints, the best option today is probably to use Adding support for comparing UTF-8 codepoints in byte arrays is not in scope for this project at the moment unless the upstream java-string-similarity adds support for it. We could, however, keep it on the roadmap in a modular way that maintains full upstream compatibility while adding methods of our own.
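To make the codepoint concern concrete, here is a minimal sketch (not part of the library; `Utf8Caveat` and its methods are hypothetical names) that counts differing positions between two equal-length inputs, once over UTF-16 chars and once over the UTF-8 bytes of the same text. It assumes `ReadOnlySpan<T>` is available, which on .NET Standard 2.0 means referencing the System.Memory package.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not the library's API.
public static class Utf8Caveat
{
    // Counts the positions at which two equal-length sequences differ.
    public static int CountDifferences<T>(ReadOnlySpan<T> a, ReadOnlySpan<T> b)
        where T : IEquatable<T>
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Inputs must have the same length.");

        int differences = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (!a[i].Equals(b[i])) differences++;
        }
        return differences;
    }

    // Compares the strings as sequences of UTF-16 chars.
    public static int CharDifferences(string a, string b) =>
        CountDifferences<char>(a.AsSpan(), b.AsSpan());

    // Compares the same strings as sequences of UTF-8 bytes.
    public static int Utf8ByteDifferences(string a, string b) =>
        CountDifferences<byte>(Encoding.UTF8.GetBytes(a), Encoding.UTF8.GetBytes(b));
}
```

For "\u20AC" (UTF-8 bytes E2 82 AC) and "\u3042" (UTF-8 bytes E3 81 82), the char-level count is 1 while the byte-level count is 3, so a byte-based measure can weight a single character change several times over.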
Thanks for the reply @paulirwin! Actually we have a system comparing two Your concern about UTF8-encoding is completely correct, and any ByteSimilarity implementation would have to make those caveats clear. Another thing I realized is the library targets .NET Standard 2.0, which would make a lot of this challenging. .NET 7 introduced lots of support for generic math, which would make a That doesn't lend itself well to multi-targeting, since you would basically have two versions of the same code, at which point it might as well be two libraries.
That makes sense, thank you. Perhaps we might start with just a single algorithm rather than doing it for all of them upfront. Do you have a specific similarity algorithm that you'd use that could be a good first test case for this?
We default to the Thanks for considering this!
@RenderMichael Can you please pull the issue/35 branch and see how that works for you? It's a rough draft of how I think this could work. As a side bonus, it adds support for i.e. With this, I'm giving up a little bit of my strict adherence to upstream code, but it remains close enough I think to be able to keep in step with it when it changes.
@paulirwin I like it! One thing though, is there any way the The usual way to test ROS content equality is The previous implementation using string required an explicit conversion to a Thanks for the effort!
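On the point about ROS content equality: one standard .NET option is the `MemoryExtensions.SequenceEqual` extension method, whereas `==` on spans only checks that both refer to the same memory and have the same length. The sketch below is illustrative only; `SpanEquality` and its method names are hypothetical and are not taken from the branch or the PR.

```csharp
using System;

// Hypothetical helper for illustration only; not code from the PR.
public static class SpanEquality
{
    // True when both spans hold the same elements in the same order.
    public static bool SameContents(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b) =>
        a.SequenceEqual(b);

    // True only when both spans point at the same memory with the same length.
    public static bool SameMemory(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b) =>
        a == b;
}
```

Two separately allocated arrays with identical bytes satisfy `SameContents` but not `SameMemory`, which is why an early-exit equality check inside a similarity algorithm would normally want the former.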
Great catches all around, thank you. I think I've fixed all of them. Take a look at the branch or the PR for it and let me know what you think. #36
@paulirwin Sorry for the long delay, kids are kids. The contents of the PR look really nice! I tested
@paulirwin The work in the associated PR looks great, any chance it could be merged?
Resolves #35 for comparison using byte[], or any scenarios where i.e. ReadOnlySpan might be preferred.
@RenderMichael This has been merged. Let me know if you have any issues with that. We'll publish a NuGet release soon. I might make this a major version bump to v6.0 due to the changed interfaces.
@RenderMichael v6.0.0 has been released to NuGet. Please let me know if you have any issues. Thanks!
I want to use this library but I have data as byte[]. Would it be possible to add support for ReadOnlySpan<byte> as well as string? In fact, there's probably room for tons of optimizations if the algorithms take ReadOnlySpan<T>, which would be ROS for string inputs.
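As a rough illustration of the request, the sketch below shows how a single generic implementation over `ReadOnlySpan<T>` could serve both `string` and `byte[]` callers. It is a plain two-row Levenshtein distance written for this example; the class name `SpanLevenshtein` and its overloads are hypothetical and do not reflect the library's actual interfaces. It assumes `ReadOnlySpan<T>` is available (on .NET Standard 2.0, via the System.Memory package).

```csharp
using System;

// Hypothetical sketch for illustration only; not the library's API.
public static class SpanLevenshtein
{
    // Edit distance (insert, delete, substitute) over any equatable element type.
    public static int Distance<T>(ReadOnlySpan<T> a, ReadOnlySpan<T> b)
        where T : IEquatable<T>
    {
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        // Only two rows of the dynamic-programming table are kept at a time.
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1].Equals(b[j - 1]) ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // Swap rows so the row just computed becomes "previous".
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }

    // String inputs are viewed as ReadOnlySpan<char> without copying.
    public static int Distance(string a, string b) =>
        Distance(a.AsSpan(), b.AsSpan());

    // Byte arrays convert implicitly to ReadOnlySpan<byte>.
    public static int Distance(byte[] a, byte[] b) =>
        Distance<byte>(a, b);
}
```

Under these assumptions, `SpanLevenshtein.Distance("kitten", "sitting")` and `SpanLevenshtein.Distance(new byte[] { 1, 2, 3 }, new byte[] { 1, 3 })` both run through the same generic code path, with the string overload acting as a thin convenience wrapper.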