Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pysam doesn't support long cigars #613

Closed
nspies opened this issue Jan 31, 2018 · 9 comments
Closed

pysam doesn't support long cigars #613

nspies opened this issue Jan 31, 2018 · 9 comments

Comments

@nspies
Copy link
Contributor

nspies commented Jan 31, 2018

pysam doesn't properly read cigars from a sam or cram file if they're longer than 2^16.

@nspies nspies changed the title cram files don't support long cigars pysam doesn't support long cigars Jan 31, 2018
@nspies
Copy link
Contributor Author

nspies commented Jan 31, 2018

See also samtools/hts-specs#40

@dpryan79
Copy link
Contributor

I imagine that this will end up being related to #608, since that is what allows for all three file types to hold more than 2^16 operations.

@nspies
Copy link
Contributor Author

nspies commented Jan 31, 2018

This seems to be a pysam issue, rather than an htslib issue, as samtools and htslib have supported longer cigars in sam/cram for some time -- it's just bam that now has the support using the tag workaround.

I can confirm that the pysam htslib-1.7 branch does not fix this issue.

@AndreasHeger
Copy link
Contributor

Thanks for reporting, I will look into this.

@AndreasHeger
Copy link
Contributor

Hi,
I think I have tracked the issue towards using a uint16_t instead of uint32_t. I have added a simple test case and according to this it should work.

@nspies
Copy link
Contributor Author

nspies commented Feb 3, 2018

Thanks for the quick work.

I'm a little confused about this, but I don't think it's quite working yet. It seems to save the cigar to file properly if you create a long read and long cigar in pysam. However, neither cigartuples nor cigarstring report the correct lengths in pysam, either when read from the file or when created in memory. Hence, I think your tests seem to work but the lengths still aren't correct.

I would add an explicit check for len(read.cigartuples) == (self.read_length*2 + 1) or:

    def build_read(self):
        ...
        a.mapping_quality = 20
        cigarstring = "M1D1" * l + "M1"
        a.cigarstring = cigarstring
        assert cigarstring == a.cigarstring
        a.next_reference_id = 0
        ...

@nspies
Copy link
Contributor Author

nspies commented Feb 3, 2018

Ah, well, here's the fix:

update pysam_get_n_cigar to use uint32_t in htslib_util.h and libcalignedsegment.pxd

@AndreasHeger
Copy link
Contributor

Thanks, will do.

@AndreasHeger
Copy link
Contributor

Thanks, it should be fixed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants