perf: use estimated counts instead of TPM #170

Open
wants to merge 4 commits into
base: dev

balajtimate (Collaborator) commented Oct 24, 2024

Description

  • Use Kallisto's estimated counts (`est_counts`) of transcript abundance, instead of TPM, for library source inference
  • Update `transcripts.fasta.gz` to include additional organisms and the orthologue transcripts that were previously removed; it now includes transcripts from 387 organisms
  • Mask repetitive sequences in transcripts using RepeatMasker
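
As a rough illustration of the first point (not the code in this PR), the sketch below sums `est_counts` per read source from a Kallisto `abundance.tsv` and reports each source's share of the total, sorted from largest to smallest. The column names are those Kallisto writes; the function name `source_percentages`, the example rows, and the `id|source|taxon` layout of `target_id` are assumptions made purely for illustration.

```python
import csv
import io
from collections import defaultdict


def source_percentages(abundance_tsv: str) -> list[tuple[str, float]]:
    """Sum est_counts per source and return percentages, largest first."""
    totals: dict[str, float] = defaultdict(float)
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    for row in reader:
        # ASSUMPTION: the source label is the second '|'-separated field
        # of target_id; the real transcript ID layout may differ.
        source = row["target_id"].split("|")[1]
        totals[source] += float(row["est_counts"])

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No reads assigned anywhere: report 0% for every source.
        return [(source, 0.0) for source in sorted(totals)]

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(source, 100.0 * count / grand_total) for source, count in ranked]


# Hypothetical abundance table with Kallisto's standard columns.
EXAMPLE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1|hsapiens|9606\t1000\t800\t30\t10.0\n"
    "tx2|hsapiens|9606\t500\t300\t45\t40.0\n"
    "tx3|mmusculus|10090\t2000\t1800\t25\t5.0\n"
)

if __name__ == "__main__":
    for source, percent in source_percentages(EXAMPLE):
        print(f"{source}\t{percent:.1f}%")
```

Unlike TPM, which is normalized by effective transcript length, `est_counts` tracks the estimated number of reads assigned to each transcript, so the percentages here are shares of total estimated counts rather than of expression.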

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

Please carefully read these items and tick them off if the statements are true
or do not apply.

  • I have performed a self-review of my own code
  • My code follows the existing coding style, lints and generates no new
    warnings
  • I have added type annotations to all function/method signatures, and I
    have added type annotations for any local variables that are non-trivial,
    potentially ambiguous or might otherwise benefit from explicit typing.
  • I have commented my code in hard-to-understand areas
  • I have added Google-style docstrings to all new modules, classes,
    methods/functions or updated previously existing ones
  • I have added tests that prove my fix is effective or that my feature
    works
  • New and existing unit tests pass locally with my changes and I have not
    reduced the code coverage relative to the previous state
  • I have updated any sections of the app's documentation that are affected
    by the proposed changes

If for some reason you are unable to tick off all boxes, please leave a
comment explaining the issue you are facing so that we can work on it
together.

codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (a081c35) to head (a536351).

Additional details and impacted files
@@            Coverage Diff            @@
##               dev      #170   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           13        13           
  Lines         1164      1164           
=========================================
  Hits          1164      1164           


Comment on lines +258 to +259
`est_counts`, signifying the percentages of total expression
per read source. The data frame is sorted by total expression
Member

I'm not sure it is accurate to speak of "expression" when we use counts. Perhaps best to refer to "total estimated counts".
