Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move add_sequence and add_protein to SigsTrait #1057

Closed
luizirber opened this issue Jun 27, 2020 · 2 comments
Closed

Move add_sequence and add_protein to SigsTrait #1057

luizirber opened this issue Jun 27, 2020 · 2 comments
Labels

Comments

@luizirber
Copy link
Member

Move add_sequence and add_protein to SigsTrait, with a default implementation. It only requires ksize() and add_word(), and would avoid replicating that logic in many places. As an example, it is currently implemented in both KmerMinHash and KmerMinHashBTree, but it is the same impl.

These methods are generic for any sketch, but they can choose to specialize if there are performance benefits. One example is supporting rolling hashes (like nthash), that can avoid the add_word call and use .add_hash directly (since they don't need to recalculate the whole word).

(punted from #1045 (comment))

@luizirber
Copy link
Member Author

luizirber commented Oct 22, 2020

This is implemented in #1223

luizirber added a commit that referenced this issue Oct 31, 2020
Implement a HyperLogLog sketch based on the `khmer` implementation but using the estimator from ["New cardinality estimation algorithms for HyperLogLog sketches"](http://oertl.github.io/hyperloglog-sketch-estimation-paper/paper/paper.pdf) (also implemented in `dashing`).

This PR also moves `add_sequence` and `add_protein` to `SigsTrait`, closing #1057.

The encoding data and methods (`hp`, `dayhoff`, `aa` and `HashFunctions`) was in the MinHash source file, and since it is more general-purpose it was moved to a new module `encodings`, which is then used by `SigsTrait`.

(these changes are both spun off #1201)
@luizirber
Copy link
Member Author

Fixed in #1223

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant