Generate signatures from 10x single cell bam file #539
Hi @olgabot! This is pretty cool =] I created a PR for your branch to fix the tests and avoid a direct dependency on pysam. When you write the tests for Another thing: did you try bamnostic? It's a pure Python SAM/BAM reader, and avoids problems like pysam not working on Python 3.7 at the moment... |
Glad you like it! @jamestwebber helped me get started with the code :)
Great! I'll add that change.
I hadn't heard of |
almost there, we are missing an |
Cool, tests passing! \o/ On the performance side, you're building one minhash per barcode, and then saving all the barcodes on a signature: |
Yay passing tests!! I can work on the multiprocessing in a separate PR so
this functionality is at least there for people willing to wait, hah.
---
Olga Botvinnik, PhD
olgabotvinnik.com <http://www.olgabotvinnik.com>
On Tue, Sep 4, 2018 at 5:30 PM Luiz Irber wrote:
Cool, tests passing! \o/
On the performance side, you're building one minhash per barcode, and then
saving all the barcodes on a signature:
https://github.com/dib-lab/sourmash/blob/ef25bcf36a2db9c55ef6b5812abd6d2128c5ad8c/sourmash/commands.py#L250L266
I think you can rewrite this block and use a multiprocessing.pool
<https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool>
to process them in parallel, but I wouldn't necessarily require that for
accepting this PR
|
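A minimal sketch of the multiprocessing.Pool idea suggested above, not the sourmash implementation. `toy_minhash`, `sketch_barcode`, `sketch_all`, and the sample barcodes and sequences are hypothetical stand-ins: the toy function just keeps the smallest k-mer hashes, whereas the PR builds a real MinHash per barcode.

```python
import hashlib
from multiprocessing import Pool


def toy_minhash(sequence, ksize=5, num=4):
    """Hypothetical stand-in for a MinHash: keep the `num` smallest k-mer hashes."""
    kmers = {sequence[i:i + ksize] for i in range(len(sequence) - ksize + 1)}
    hashes = sorted(
        int.from_bytes(hashlib.md5(kmer.encode()).digest()[:8], "big")
        for kmer in kmers
    )
    return hashes[:num]


def sketch_barcode(item):
    """Worker function: takes a picklable (barcode, sequence) tuple."""
    barcode, sequence = item
    return barcode, toy_minhash(sequence)


def sketch_all(barcode_to_sequence, processes=2):
    """Sketch every barcode in parallel, one task per barcode."""
    with Pool(processes=processes) as pool:
        return dict(pool.map(sketch_barcode, list(barcode_to_sequence.items())))


if __name__ == "__main__":
    # Made-up barcodes and sequences, for illustration only.
    cells = {
        "AAACCTGAGAAACCAT-1": "ACGTACGTTAGCTAGCTAGGATCC",
        "AAACCTGAGAAACCGC-1": "TTGACCAGTACGATCGATCGGCTA",
    }
    for barcode, hashes in sketch_all(cells).items():
        print(barcode, hashes)
```

Everything passed to and returned from the worker has to be picklable, which is why the sketch sends plain tuples rather than reader objects.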
Hello! I'm trying to figure out whether it's the iterating over the alignment file or creating the hashes that's taking the most time, but I can't seem to get the profiler working due to the relative imports in
from . import DEFAULT_SEED, MinHash, load_sbt_index, create_sbt_index
Here's the full stack trace below:
Do you know what may be happening? |
omg nevermind .. somehow I moved the
-_-;;;;; 🤦♀️ |
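One way to compare the two costs, as a hedged sketch: run the work under cProfile from inside Python and read the per-function times. `iterate_reads`, `hash_read`, `process`, and `profile_report` are hypothetical stand-ins for the BAM iteration and hashing steps, not code from this PR.

```python
import cProfile
import io
import pstats
import zlib


def iterate_reads(n):
    """Hypothetical stand-in for iterating over an alignment file."""
    for i in range(n):
        yield ("read%d" % i, "ACGT" * 25)


def hash_read(sequence):
    """Hypothetical stand-in for the hashing step."""
    return zlib.crc32(sequence.encode())


def process(n=2000):
    """Iterate and hash, so both steps show up in the profile."""
    return [hash_read(seq) for _, seq in iterate_reads(n)]


def profile_report(func, limit=10):
    """Run `func` under cProfile and return the stats table as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(process))
```

Because the profiler is started from code that already lives inside the package, this route does not depend on running a module as a standalone script.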
Codecov Report
@@            Coverage Diff            @@
##           master    #539     +/-   ##
=========================================
+ Coverage   90.76%   90.8%   +0.04%
=========================================
  Files          33      34      +1
  Lines        5011    5047     +36
  Branches       36      36
=========================================
+ Hits         4548    4583     +35
- Misses        463     464      +1
|
BTW, bamnostic seems to not be able to deal with 10x bam files, which have 3-field tags like this:
|
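For reference, in the text (SAM) representation an optional field is written TAG:TYPE:VALUE, and 10x BAM files carry the cell barcode in a CB tag of type Z (string). The example from this comment did not survive, so the tag values below are made up for illustration, and `parse_tag` is a hypothetical helper, not part of bamnostic or pysam.

```python
def parse_tag(field):
    """Split a SAM optional field 'TAG:TYPE:VALUE' into a (tag, value) pair."""
    # maxsplit=2 keeps any ':' inside the value intact.
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        return tag, int(value)
    if type_code == "f":
        return tag, float(value)
    # A, Z, H and B values are left as strings in this sketch.
    return tag, value


if __name__ == "__main__":
    # Made-up values, for illustration only.
    for field in ["CB:Z:AAACCTGAGAAACCAT-1", "UB:Z:ACGTTGCAAT", "NH:i:1"]:
        print(parse_tag(field))
```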
I see, thanks for creating an issue in their repo! Another profiling approach you can take is https://github.com/benfred/py-spy (saw it yesterday, tested and it works pretty well!) |
Update from me: I'm migrating to bamnostic because pysam alignment objects cannot be pickled. But there's still issues with reading the 10x bam files: betteridiot/bamnostic#16 |
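An alternative to switching libraries, sketched under assumptions: copy only the fields the hashing step needs into plain Python values before sending work to other processes. `FakeAlignment` is a hypothetical class that merely imitates a record refusing to pickle, and `to_plain` is a hypothetical helper; neither is pysam or bamnostic code.

```python
import pickle


class FakeAlignment:
    """Hypothetical record that refuses to pickle, like the objects described above."""

    def __init__(self, name, sequence, barcode):
        self.name = name
        self.sequence = sequence
        self.barcode = barcode

    def __reduce__(self):
        raise TypeError("cannot pickle FakeAlignment")


def to_plain(alignment):
    """Copy the needed fields into a plain tuple, which pickles without trouble."""
    return (alignment.name, alignment.sequence, alignment.barcode)


if __name__ == "__main__":
    # Made-up read, for illustration only.
    aln = FakeAlignment("read1", "ACGTACGT", "AAACCTGAGAAACCAT-1")
    restored = pickle.loads(pickle.dumps(to_plain(aln)))
    print(restored)
```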
Alright, the |
ahh I spoke too soon... fixing now |
Now this seems to be failing due to a deprecation error that happens for every read. Hopefully fixed by: betteridiot/bamnostic#18 |
Ugh Python2.7 gets in the way of everything!! @betteridiot is working on fixing a |
Codecov Report
@@            Coverage Diff             @@
##           master    #539      +/-   ##
==========================================
- Coverage   90.78%   90.62%   -0.17%
==========================================
  Files          33       34      +1
  Lines        5016     5057     +41
  Branches       37       37
==========================================
+ Hits         4554     4583     +29
- Misses        462      474     +12
|
The build is passing now! Though it looks like I didn't cover everything with the tests.. let me know if you want me to add more. |
Hi @olgabot! This is looking great =] Two things:
|
make test: Did it pass the tests?
make coverage: Is the new code covered?
[...] without a major version increment. Changing file formats also requires a major version number increment.
[...] changes were made?