Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: TensorPrimitives.HammingDistance #102980

Closed
stephentoub opened this issue Jun 3, 2024 · 2 comments · Fixed by #103305
Closed

[API Proposal]: TensorPrimitives.HammingDistance #102980

stephentoub opened this issue Jun 3, 2024 · 2 comments · Fixed by #103305
Assignees
Labels
api-approved API was approved in API review, it can be implemented api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Numerics.Tensors in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Jun 3, 2024

Background and motivation

Given two vectors/lists/strings, Hamming distance is the number of positions at which the two vectors differ. This has a variety of uses (https://en.wikipedia.org/wiki/Hamming_distance), but when focusing on sequences of bits, it's commonly used with embedding vectors that have been quantized down to a single bit per element, with the Hamming distance between two embeddings then used as the similarity/distance metric. When used as such, Hamming distance is computable as the popcount of the xor of the bits.

API Proposal

namespace System.Numerics.Tensors;

public static class TensorPrimitives
{
    // Existing
    public static void PopCount<T>(ReadOnlySpan<T> x, Span<T> destination);
    
    // New
+   public static long PopCount<T>(ReadOnlySpan<T> x) where T : IBinaryInteger<T>;
+   public static int HammingDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y); // computes number of differing elements
+   public static long HammingBitDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y) where T : IBinaryInteger<T>; // computes number of differing bits
}

(I've included the new PopCount overload as it felt strange to have a method that effectively did the same thing for the xor of two inputs but not for a single input. It's also useful in its own right.)

API Usage

ReadOnlySpan<byte> singleBitEmbedding1 = ...;
ReadOnlySpan<byte> singleBitEmbedding2 = ...;
long distance = TensorPrimitives.HammingBitDistance(singleBitEmbedding1, singleBitEmbedding2);

Alternative Designs

  • Different names to distinguish bits from elements?
  • OperationStatus / Try-based PopCount that could accomodate continugous memory with more than 2^63 bits.
  • Another HammingDistance overload accepting an IEqualityComparer (or a generic constrained to one).

Risks

n/a

@stephentoub stephentoub added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Numerics.Tensors labels Jun 3, 2024
@stephentoub stephentoub added this to the 9.0.0 milestone Jun 3, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 3, 2024
@stephentoub stephentoub self-assigned this Jun 3, 2024
@terrajobst
Copy link
Member

terrajobst commented Jun 11, 2024

Video

  • Looks good as proposed
namespace System.Numerics.Tensors;

public static class TensorPrimitives
{
    // Existing
    // public static void PopCount<T>(ReadOnlySpan<T> x, Span<T> destination);
    
    // New
    public static long PopCount<T>(ReadOnlySpan<T> x) where T : IBinaryInteger<T>;
    public static int HammingDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y);
    public static long HammingBitDistance<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y) where T : IBinaryInteger<T>;
}

@terrajobst terrajobst added the api-approved API was approved in API review, it can be implemented label Jun 11, 2024
@stephentoub stephentoub added the in-pr There is an active PR which will close this issue when it is merged label Jun 11, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Numerics.Tensors in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants