[MRG] report both weighted and unweighted % recovered in gather #2301

Merged
ctb merged 8 commits into latest from report_flat_and_weighted
Sep 28, 2022


ctb commented Sep 28, 2022

This PR changes sourmash gather to report both abundance-weighted and unweighted k-mer identification, per #1818. The changes in #2249 made this a pretty easy fix, and the conversation in dib-lab/genome-grist#197 made it obvious that we should support it for genome-grist purposes!

This PR also updates the docs on % recovered in response to #2300 (comment).

example output

% sourmash gather SRR606249-abund-10k.sig.zip  podar-ref.zip 
...
found 64 matches total;
the recovered matches hit 94.0% of the abundance-weighted query.
the recovered matches hit 45.6% of the query k-mers (unweighted).

If --ignore-abundance is specified or abundances are not available, only the unweighted number is reported.
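
To make the difference between the two numbers concrete, here is a minimal sketch of how a weighted and an unweighted fraction could be computed from a query's hash abundances. This is an illustration only, not the code in this PR: `percent_recovered`, its arguments, and the toy data are hypothetical and do not reflect sourmash's internal API.

```python
def percent_recovered(query_abunds, matched_hashes, ignore_abundance=False):
    """Return (weighted, unweighted) fractions of the query that were matched.

    Hypothetical helper for illustration; not part of sourmash.
    query_abunds: dict mapping each query hash to its abundance.
    matched_hashes: iterable of hashes found across all gather matches.
    """
    if not query_abunds:
        raise ValueError("query has no hashes")

    # Only hashes that are actually in the query count as recovered.
    matched = query_abunds.keys() & set(matched_hashes)

    # Unweighted: every distinct query hash counts once.
    unweighted = len(matched) / len(query_abunds)

    if ignore_abundance:
        # Mirrors the behavior described above: only the unweighted number.
        return None, unweighted

    # Weighted: every query hash counts in proportion to its abundance.
    total = sum(query_abunds.values())
    weighted = sum(query_abunds[h] for h in matched) / total
    return weighted, unweighted


# Toy query: two high-abundance hashes and four low-abundance ones.
query = {1: 10, 2: 10, 3: 1, 4: 1, 5: 1, 6: 1}
weighted, unweighted = percent_recovered(query, {1, 2, 3})
print(f"weighted: {weighted:.1%}, unweighted: {unweighted:.1%}")
# prints "weighted: 87.5%, unweighted: 50.0%"
```

In the toy data, matching the high-abundance hashes recovers most of the weighted query while covering only half of the distinct hashes, which is the same pattern as the 94.0% vs 45.6% in the example output above.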

Fixes #1818.



codecov bot commented Sep 28, 2022

Codecov Report

Merging #2301 (8373c5c) into latest (719e7d5) will increase coverage by 7.31%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #2301      +/-   ##
==========================================
+ Coverage   84.83%   92.14%   +7.31%     
==========================================
  Files         131      100      -31     
  Lines       15676    11414    -4262     
  Branches     2252     2253       +1     
==========================================
- Hits        13298    10517    -2781     
+ Misses       2083      602    -1481     
  Partials      295      295              
| Flag | Coverage Δ |
|---|---|
| python | 92.14% <100.00%> (+<0.01%) ⬆️ |
| rust | ? |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| src/sourmash/commands.py | 90.25% <100.00%> (+0.12%) ⬆️ |
| src/sourmash/search.py | 97.95% <100.00%> (-0.01%) ⬇️ |
| src/core/src/errors.rs | |
| src/core/src/signature.rs | |
| src/core/src/sketch/hyperloglog/estimators.rs | |
| src/core/src/sketch/minhash.rs | |
| src/core/src/ffi/hyperloglog.rs | |
| src/core/src/cmd.rs | |
| src/core/src/ffi/index/mod.rs | |
| src/core/tests/storage.rs | |
| ... and 23 more | |


ctb changed the title from "[WIP] report both weighted and unweighted % recovered in gather" to "[MRG] report both weighted and unweighted % recovered in gather" Sep 28, 2022

ctb commented Sep 28, 2022

Ready for review & merge @sourmash-bio/devs

bluegenes left a comment

lgtm! It'll be pretty helpful to have both weighted and unweighted results printed.

ctb merged commit 2246e71 into latest Sep 28, 2022
ctb deleted the report_flat_and_weighted branch September 28, 2022 23:37
Development

Successfully merging this pull request may close these issues.

provide both abundance-weighted coverage & flattened coverage in sourmash gather output?
2 participants