Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] fix Signature.minhash API during sourmash sketch #2329

Merged
merged 8 commits into from
Oct 15, 2022

Conversation

ctb
Copy link
Contributor

@ctb ctb commented Oct 14, 2022

Addresses #1167 by fixing signature_first_mh in the signature FFI to support both KmerMinHash and KmerMinHashBTree; this way, sketches generated by the Python _signatures_for_sketch_factory (which are BTree sketches in Rust) can be passed back to
Python.

Specifically, this PR:

  • implements From<&KmerMinHashBTree> for KmerMinHash;
  • adds an arm to the match statement in the FFI function to catch KmerMinHashBTree structs and convert them into KmerMinHash structs suitable for returning to Python;
  • provides a default match arm to catch (e.g.) HyperLogLog and raise an error back to Python reporting that it found an unsupported sketch type;
  • adds a little to the Python tests to prevent backsliding.

Note that I don't think this does any more copying of sketches than the current codebase does.

And, on that halcyonic day when the FFI properly wraps KmerMinHashBTree, this code can be replaced by returning the proper Rust object without conversion.

Fixes #1167.

@codecov
Copy link

codecov bot commented Oct 14, 2022

Codecov Report

Merging #2329 (9808004) into latest (5abc988) will decrease coverage by 0.12%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           latest    #2329      +/-   ##
==========================================
- Coverage   84.09%   83.97%   -0.13%     
==========================================
  Files         129      129              
  Lines       14951    14967      +16     
  Branches     2191     2191              
==========================================
- Hits        12573    12568       -5     
- Misses       2083     2104      +21     
  Partials      295      295              
Flag Coverage Δ
python 92.10% <ø> (-0.05%) ⬇️
rust 57.78% <0.00%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/core/src/ffi/signature.rs 0.00% <0.00%> (ø)
src/core/src/sketch/minhash.rs 86.75% <0.00%> (-1.27%) ⬇️
src/sourmash/__init__.py 97.05% <0.00%> (-2.95%) ⬇️
src/sourmash/sourmash_args.py 93.06% <0.00%> (-0.61%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ctb
Copy link
Contributor Author

ctb commented Oct 14, 2022

It may not in fact be bad to use copy the KmerMinHashBTree into a KmerMinHash, since we're already doing a clone in the KmerMinHash situation - compare

        Some(Sketch::MinHash(mh)) => {
            Ok(SourmashKmerMinHash::from_rust(mh.clone()))
        },

with

        Some(Sketch::LargeMinHash(mh_btree)) => {
            let mh = KmerMinHash::from(mh_btree.clone());
            Ok(SourmashKmerMinHash::from_rust(mh))

There's already an mh.clone() in the top code; it seems like if we could get KmerMinHash::from to work directly on mh_btree without the clone, the code and performance would be comparable?

@ctb
Copy link
Contributor Author

ctb commented Oct 14, 2022

Well, that's what 388b250 does! I implemented From<&KmerMinHashBTree> for KmerMinHash.

An alternate approach would be to figure out if there's an equivalent structure to SourmashKmerMinHash that would work to encapsulate a KmerMinHashBTree. I'll root around in the code for that.

@ctb
Copy link
Contributor Author

ctb commented Oct 14, 2022

An alternate approach would be to figure out if there's an equivalent structure to SourmashKmerMinHash that would work to encapsulate a KmerMinHashBTree. I'll root around in the code for that.

OK, I think that would mean one of two things -

First, implementing a whole new conversion a la /src/core/src/ffi/minhash.rs that would provide the implementation underlying this trait :

impl ForeignObject for SourmashKmerMinHash {
    type RustObject = KmerMinHash;
}

but for KmerMinHashBTree. Python presumably wouldn't care what was underneath as long as the same methods existed on top.

OR... hmm, I don't think this will work because object sizes are different, but I wonder if the key methods on KmerMinHash could be extracted to a trait, and we could make sure that trait was implemented on both KmerMinHash and KmerMinHashBTree, and support SourmashKmerMinHash for anything with that trait?

@ctb
Copy link
Contributor Author

ctb commented Oct 14, 2022

ok, changed error message for HyperLogLog and others to:

Traceback (most recent call last):
  File "rs-crash.py", line 7, in <module>
    sig.minhash
  File "/Users/t/dev/sourmash/src/sourmash/signature.py", line 47, in minhash
    self._methodcall(lib.signature_first_mh)
  File "/Users/t/dev/sourmash/src/sourmash/utils.py", line 25, in _methodcall
    return rustcall(func, self._get_objptr(), *args)
  File "/Users/t/dev/sourmash/src/sourmash/utils.py", line 78, in rustcall
    raise exc
sourmash.exceptions.Internal: internal error: "found unsupported sketch type"

which at least seems explanatory ;)

@ctb
Copy link
Contributor Author

ctb commented Oct 14, 2022

@luizirber ready for your inspection when you get a chance!

@luizirber
Copy link
Member

OR... hmm, I don't think this will work because object sizes are different, but I wonder if the key methods on KmerMinHash could be extracted to a trait, and we could make sure that trait was implemented on both KmerMinHash and KmerMinHashBTree, and support SourmashKmerMinHash for anything with that trait?

There is a bunch of this happening in the mastiff PR:
https://github.com/sourmash-bio/sourmash/pull/2230/files#diff-bc491fe63b2be4f29e46267cea94f70a446ac294157dd5a9629fc20317dd66bf

I started with an Enum on the FFI, without changing too much of the Rust side yet (just moving methods around to comply with the new traits in https://github.com/sourmash-bio/sourmash/pull/2230/files#diff-acf89d8b0a76cd89eccdabe5ff0fe4233ffe9b94983d50b34d1a1290a053094a

@ctb
Copy link
Contributor Author

ctb commented Oct 14, 2022

There is a bunch of this happening in the mastiff PR:
https://github.com/sourmash-bio/sourmash/pull/2230/files#diff-bc491fe63b2be4f29e46267cea94f70a446ac294157dd5a9629fc20317dd66bf

I was guessing that, since you referred to it elsewhere!

If you think there's value in merging this in now, lmk. But I'm not wedded to it and happy to wait, or have any useful code swiped for the mastiff PR.

@luizirber luizirber self-requested a review October 15, 2022 18:55
Copy link
Member

@luizirber luizirber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other approaches for fixing this, but this one works now =]

@ctb ctb changed the title [EXP] fix Signature.minhash API during sourmash sketch [MRG] fix Signature.minhash API during sourmash sketch Oct 15, 2022
@ctb ctb merged commit 2198b32 into latest Oct 15, 2022
@ctb ctb deleted the fix/sig_minhash_api_sketch branch October 15, 2022 23:41
ctb added a commit that referenced this pull request Dec 3, 2022
…and released in v4.6.0 (#2391)

This PR fixes a bug introduced in #2329 and released in v4.6.0. It also
adds tests so that this bug does not trouble us again...

The easiest fix involves serializing _every_ signature into JSON format,
and then deserializing it. This may add a slight performance regression
that I will work to fix in a different PR, once I get v4.6.1 out.

Fixes #2390.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fix rust Signature API for minhashes method.
2 participants