Cannot read 10x bam file #15

olgabot · 2018-09-06T20:42:54Z

Describe the bug

Cannot read 10x bam file

To Reproduce
Steps to reproduce the behavior:

Using the possorted_genome_bam.bam file from here, the following code fails:

import os

import bamnostic as bs

# `folder` is the path to the 10x output directory that contains the BAM file
bam_file = bs.AlignmentFile(
    os.path.join(folder, 'possorted_genome_bam.bam'), mode='rb')

Expected behavior
Should be able to read the file and iterate through alignments

Screenshots

(sourmash) 
 ⚙  Thu  6 Sep - 13:33  ~/code/sourmash   origin ☊ olgabot/10x-singlecell-bam ↑1 8☀ 3● 3‒ 1✚ ⚑ 
  sourmash compute -k 31 --input-is-10x  --force tests/test-data/lung_ptprc
computing signatures for files: tests/test-data/lung_ptprc
Computing signature for ksizes: [31]
Computing only DNA (and not protein) signatures.
Computing a total of 1 signature(s).
Traceback (most recent call last):
  File "/anaconda3/envs/sourmash/bin/sourmash", line 11, in <module>
    load_entry_point('sourmash', 'console_scripts', 'sourmash')()
  File "/Users/olgabot/code/sourmash/sourmash/__main__.py", line 77, in main
    cmd(sys.argv[2:])
  File "/Users/olgabot/code/sourmash/sourmash/commands.py", line 268, in compute
    barcodes, bam_file = read_10x_folder(filename)
  File "/Users/olgabot/code/sourmash/sourmash/tenx.py", line 17, in read_10x_folder
    os.path.join(folder, 'possorted_genome_bam.bam'), mode='rb')
  File "/Users/olgabot/anaconda/envs/sourmash/lib/python3.6/site-packages/bamnostic/core.py", line 153, in __init__
    bgzf.BgzfReader.__init__(self, **kwargs)
  File "/Users/olgabot/anaconda/envs/sourmash/lib/python3.6/site-packages/bamnostic/bgzf.py", line 471, in __init__
    self._load_header(check_sq)
  File "/Users/olgabot/anaconda/envs/sourmash/lib/python3.6/site-packages/bamnostic/bgzf.py", line 652, in _load_header
    self._header = BAMheader(self)
  File "/Users/olgabot/anaconda/envs/sourmash/lib/python3.6/site-packages/bamnostic/bgzf.py", line 311, in __init__
    tag, value = field.split(':')
ValueError: too many values to unpack (expected 2)

Desktop (please complete the following information):

OS: macOS
Python Version: 3.6.5
bamnostic Version: 0.8.11

Additional context
None


olgabot · 2018-09-06T20:43:47Z

Bamnostic was suggested (sourmash-bio/sourmash#539) as a replacement to pysam but I'm not able to use it for 10x-generated single cell bam files.

betteridiot · 2018-09-06T21:20:35Z

I saw your issue. From the `tag, value` line in the traceback, I first thought it was an unpacking problem in the binary part of the data. The problem was actually in how bamnostic parses the SAM header. Fields in the SAM header aren't expected to have three colon-separated parts, because the field type is only really needed in the BAM format to figure out how to unpack the tag data.

I've fixed the parser to handle 3-field tags in the SAM header, though. Thank you for the heads-up.
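For readers who hit the same traceback, here is a minimal sketch of why `field.split(':')` fails on a field with more than one colon, and one way to split it safely. This is not bamnostic's actual fix; the function name `parse_header_field` and the sample field `XX:Z:example` are hypothetical and only there for illustration.

```python
def parse_header_field(field):
    """Split a SAM header field into (tag, value) at the first colon only.

    Hypothetical helper, not part of bamnostic. Any further colons stay
    inside the value instead of breaking the two-name unpacking.
    """
    tag, _, value = field.partition(':')
    return tag, value


# A made-up field with three colon-separated parts.
three_part_field = 'XX:Z:example'

# The naive approach from the traceback raises ValueError,
# because split(':') returns three items for two names.
try:
    tag, value = three_part_field.split(':')
except ValueError as exc:
    print('naive split failed:', exc)

# Splitting at the first colon works for both field shapes.
print(parse_header_field('SN:chr1'))
print(parse_header_field(three_part_field))
```

Whether the leftover type prefix should be stripped from the value is a separate design choice that depends on how the header is used downstream.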

I pushed an update to PyPI and conda. The conda package might take a while to be finalized. If you want the update now, feel free to install it directly from GitHub.

betteridiot · 2018-09-06T21:23:27Z

Oops, forgot to quote something. Should be fixed with f7cfb0b

Sorry.

olgabot · 2018-09-06T23:32:01Z

Wow, thank you for the quick fix!

The program was originally looking for the EOF marker, but I forgot that when the EOF marker BGZF block is decompressed, it evaluates to an empty bytestring `b''`. I put in some extra checks so that if an empty block is found, the reader verifies that it is at the end of the file and that this is a legitimate end of file. If the file is truncated, it will `raise` a slightly different `StopIteration` error stating that the *potential* end of file was reached. I ran this on the file you mentioned in #15 (btw, awesome Issue suggestion. Thank you for following the template). My debugging now runs through the whole file properly.
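The empty-block behaviour described above can be checked with the standard library alone. The sketch below uses the 28-byte EOF marker block defined in the SAM/BAM specification; the helper `ends_with_eof_marker` is a hypothetical name for illustration and is not how bamnostic implements its check.

```python
import gzip
import io

# The 28-byte empty BGZF block that the SAM/BAM specification
# defines as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600"
    "4243 0200 1b00"
    "0300 00000000 00000000"
)


def ends_with_eof_marker(stream):
    """Return True if a seekable binary stream ends with the BGZF EOF block.

    Hypothetical helper: a stream without the marker may be truncated.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if size < len(BGZF_EOF):
        return False
    stream.seek(size - len(BGZF_EOF))
    return stream.read() == BGZF_EOF


# A BGZF block is a valid gzip member, so gzip can decompress it.
# The marker carries no payload, which is why a reader sees b''.
payload = gzip.decompress(BGZF_EOF)
print(repr(payload))

# In-memory stand-ins for a complete and a truncated file.
complete = io.BytesIO(b"earlier blocks" + BGZF_EOF)
truncated = io.BytesIO(b"earlier blocks")
print(ends_with_eof_marker(complete), ends_with_eof_marker(truncated))
```

An empty decompressed block on its own is therefore ambiguous, which is why the position in the file also has to be checked before treating it as the real end.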

betteridiot closed this as completed in 91630a0 Sep 6, 2018

olgabot mentioned this issue Sep 13, 2018

Still cannot read 10x bam file #16

Closed
