-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move add_sequence and add_protein to SigsTrait #1057
Labels
Comments
8 tasks
This is implemented in #1223 |
This was referenced Oct 22, 2020
luizirber
added a commit
that referenced
this issue
Oct 31, 2020
Implement a HyperLogLog sketch based on the `khmer` implementation but using the estimator from ["New cardinality estimation algorithms for HyperLogLog sketches"](http://oertl.github.io/hyperloglog-sketch-estimation-paper/paper/paper.pdf) (also implemented in `dashing`). This PR also moves `add_sequence` and `add_protein` to `SigsTrait`, closing #1057. The encoding data and methods (`hp`, `dayhoff`, `aa` and `HashFunctions`) was in the MinHash source file, and since it is more general-purpose it was moved to a new module `encodings`, which is then used by `SigsTrait`. (these changes are both spun off #1201)
4 tasks
Fixed in #1223 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Move add_sequence and add_protein to SigsTrait, with a default implementation. It only requires ksize() and add_word(), and would avoid replicating that logic in many places. As an example, it is currently implemented in both
KmerMinHash
andKmerMinHashBTree
, but it is the same impl.These methods are generic for any sketch, but they can choose to specialize if there are performance benefits. One example is supporting rolling hashes (like
nthash
), that can avoid theadd_word
call and use.add_hash
directly (since they don't need to recalculate the wholeword
).(punted from #1045 (comment))
The text was updated successfully, but these errors were encountered: