Combine multiple str.minhash() APIs into one call #18168

Draft: wants to merge 3 commits into base branch-25.04


davidwendt
Contributor

Description

The following cudf strings minhash APIs are combined into a single method call:

minhash() - substring with 32-bit hash
minhash64() - substring with 64-bit hash
minhash_ngrams() - ngrams of strings list with 32-bit hash
minhash64_ngrams() - ngrams of strings list with 64-bit hash

The single API is minhash(self, seed, a, b, width). The dtype of seed/a/b selects the 32-bit or 64-bit hash, and the type of self selects the mode: a strings column is hashed by substring, and a list-of-strings column is hashed by ngrams.
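The dispatch rule described above can be sketched in plain Python. This is only an illustration of the mapping from the combined call back to the four original API names, not the code in this PR. The helper names `hash_width` and `legacy_equivalent` are hypothetical, and the sketch assumes the dtypes are the unsigned `uint32` and `uint64` and that seed, a, and b must all share one dtype.

```python
# Hypothetical sketch of the dispatch rule; not cuDF's actual implementation.

# (is_list_input, hash width in bits) -> name of the original API
LEGACY_NAMES = {
    (False, 32): "minhash",
    (False, 64): "minhash64",
    (True, 32): "minhash_ngrams",
    (True, 64): "minhash64_ngrams",
}

# Assumption: dtypes are given by their NumPy-style names.
WIDTHS = {"uint32": 32, "uint64": 64}


def hash_width(seed_dtype: str, a_dtype: str, b_dtype: str) -> int:
    """Return 32 or 64; assumes seed, a, and b must agree on dtype."""
    dtypes = {seed_dtype, a_dtype, b_dtype}
    if len(dtypes) != 1:
        raise TypeError(f"seed, a and b must share one dtype, got {sorted(dtypes)}")
    dtype = dtypes.pop()
    if dtype not in WIDTHS:
        raise TypeError(f"unsupported dtype: {dtype}")
    return WIDTHS[dtype]


def legacy_equivalent(
    is_list_input: bool, seed_dtype: str, a_dtype: str, b_dtype: str
) -> str:
    """Name the original API that the combined minhash() call corresponds to."""
    return LEGACY_NAMES[(is_list_input, hash_width(seed_dtype, a_dtype, b_dtype))]
```

For example, a strings column with `uint64` parameters corresponds to the old `minhash64()`, while a list column with `uint32` parameters corresponds to `minhash_ngrams()`.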

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

davidwendt added the labels 2 - In Progress (Currently a work in progress), Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) on Mar 5, 2025
davidwendt self-assigned this on Mar 5, 2025

copy-pr-bot bot commented Mar 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@davidwendt
Contributor Author

/ok to test

davidwendt added the label 3 - Ready for Review (Ready for review by team) and removed the label 2 - In Progress (Currently a work in progress) on Mar 7, 2025