
Optionally weight lca summarize output by hashval abundance. #1022

Merged
merged 20 commits into from
Jun 20, 2020

ctb (Contributor) commented Jun 13, 2020

Adds --with-abundance to sourmash lca summarize to weight output by hashval abundance in query signatures, per #1011 and #634.

This flag is off by default in 3.x but will be on by default in 4.0.
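To make the difference between the two modes concrete, here is a minimal sketch of unweighted versus abundance-weighted summarization. It assumes each query hashval has already been resolved to a single lineage (for example, its LCA) and that abundances are available as a dict from hashval to count. The function name summarize_lineages and these data structures are hypothetical illustrations, not the sourmash lca_utils API.

```python
from collections import Counter


def summarize_lineages(assignments, abundances=None):
    """Tally hashvals per lineage prefix, optionally weighted by abundance.

    assignments: dict mapping hashval -> lineage tuple, root first
        (hypothetical structure; one lineage per hashval).
    abundances: optional dict mapping hashval -> abundance. If given,
        each hashval contributes its abundance instead of 1.
    Returns a dict mapping each lineage prefix -> fraction of total weight.
    """
    counts = Counter()
    total = 0
    for hashval, lineage in assignments.items():
        # Unweighted: every distinct hashval counts once.
        # Weighted: it counts as many times as it was seen in the query.
        weight = abundances.get(hashval, 1) if abundances else 1
        total += weight
        # Credit the lineage and all of its ancestors, so that a
        # summary is available at every taxonomic rank.
        for i in range(1, len(lineage) + 1):
            counts[lineage[:i]] += weight
    return {prefix: n / total for prefix, n in counts.items()}


assignments = {
    1: ("Bacteria", "Proteobacteria"),
    2: ("Bacteria", "Firmicutes"),
    3: ("Bacteria", "Firmicutes"),
}
abundances = {1: 8, 2: 1, 3: 1}

unweighted = summarize_lineages(assignments)
weighted = summarize_lineages(assignments, abundances)
print(unweighted[("Bacteria", "Firmicutes")])  # 2 of 3 distinct hashvals
print(weighted[("Bacteria", "Firmicutes")])    # 2 of 10 abundance units
```

In this toy query, Firmicutes holds two of the three distinct hashvals (about 67%) but only 2 of 10 abundance units (20%). That is the kind of shift that weighting by hashval abundance is meant to expose.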

Also adds many unit tests for relevant functions in lca_utils.

Fixes #1011
Fixes #634

Still TODO:

  • add new tests with hmp-sigs
  • do deprecation tagging etc for 4.0
  • build some simulated data and make sure it works end-to-end
  • document

  • Is it mergeable?
  • make test: Did it pass the tests?
  • make coverage: Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

ctb (Contributor, Author) commented Jun 13, 2020

This is exposing some "interesting" design considerations in the unweighted code... Looks like lca_utils.gather_assignments and count_lca_for_assignments may flatten abundances in unintended ways.

codecov bot commented Jun 13, 2020

Codecov Report

Merging #1022 into master will increase coverage by 0.02%.
The diff coverage is 95.65%.


@@            Coverage Diff             @@
##           master    #1022      +/-   ##
==========================================
+ Coverage   92.34%   92.37%   +0.02%     
==========================================
  Files          72       72              
  Lines        5421     5454      +33     
==========================================
+ Hits         5006     5038      +32     
- Misses        415      416       +1     

Impacted Files                        Coverage Δ
sourmash/lca/lca_utils.py             95.83% <90.90%> (-0.72%) ⬇️
sourmash/lca/command_summarize.py     89.10% <97.05%> (+1.88%) ⬆️
sourmash/cli/lca/summarize.py         100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update df2bac0...5fa323b.
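The headline coverage figures follow from the Lines and Hits rows of the Coverage Diff table, assuming Codecov reports line coverage as hits divided by coverable lines with no partially covered lines (the table lists none). A quick check using only those numbers; coverage_pct is a hypothetical helper, not part of Codecov:

```python
def coverage_pct(hits, lines):
    """Line coverage: executed lines as a percentage of coverable lines."""
    return 100.0 * hits / lines


# (lines, hits) copied from the Coverage Diff table
reports = {"master": (5421, 5006), "#1022": (5454, 5038)}

for name, (lines, hits) in reports.items():
    misses = lines - hits  # every coverable line is either a hit or a miss
    print(f"{name}: {coverage_pct(hits, lines):.2f}% ({misses} misses)")
```

This prints 92.34% with 415 misses for master and 92.37% with 416 misses for the PR branch, matching the Coverage and Misses rows.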

ctb (Contributor, Author) commented Jun 17, 2020

ready for review @bluegenes @taylorreiter

ctb (Contributor, Author) commented Jun 17, 2020

If we follow the pandas deprecation policy per #655, I think what we want to do is:

  • for the next release (3.4?), add deprecation warnings to the default (unweighted) lca summarize, warning that the default behavior will change to --with-abundance in 4.0;
  • for 4.0, change that warning to tell people that the default behavior is now --with-abundance;
  • for 5.0, remove that warning.

yes?
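The first step of that plan, warning only when a user relies on the soon-to-change default, could look roughly like the sketch below. It uses the standard argparse and warnings modules; the parser setup, the resolve_with_abundance helper, and the message text are hypothetical, not sourmash's actual implementation.

```python
import argparse
import warnings

parser = argparse.ArgumentParser(prog="summarize")
# default=None (instead of False) lets us tell "flag not given" apart
# from "flag given", so we only warn when the user relies on the default.
parser.add_argument("--with-abundance", action="store_true", default=None)


def resolve_with_abundance(args):
    """Return the effective setting, warning if the 3.x default is used."""
    if args.with_abundance is None:
        warnings.warn(
            "the default for 'lca summarize' will change to "
            "--with-abundance in 4.0; pass --with-abundance to opt in now",
            FutureWarning,
            stacklevel=2,
        )
        return False  # 3.x default: unweighted output
    return args.with_abundance


print(resolve_with_abundance(parser.parse_args(["--with-abundance"])))
```

FutureWarning is chosen here because Python's documentation describes it as the category for deprecation warnings intended for end users of applications, and it is shown by default, whereas DeprecationWarning is hidden by default outside __main__.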

…#1027)

* add test

* simpler test to showcase

* improve test description

* comments & add test

Co-authored-by: C. Titus Brown <[email protected]>
bluegenes left a comment


lgtm!

ctb merged commit 97d35e6 into master Jun 20, 2020
ctb deleted the lca_summarize_weighted branch June 20, 2020 15:29

ctb (Contributor, Author) commented Jun 20, 2020

🥳 🎉
