Large contigs missing from AlignmentHeader.lengths #741

Closed
mvdbeek opened this issue Nov 11, 2018 · 5 comments

@mvdbeek
Contributor

mvdbeek commented Nov 11, 2018

This happens, for instance, with the following file:

@HD	VN:1.3	SO:coordinate
@SQ	SN:GTATTCTTACTCCATAAACACATAGGCTTGGTCCTAGCCTTTTTATTAGT	LN:2147483648
@SQ	SN:CTAAATCACGTCTCTACGATTAAAAGGAGCAGGTATCAAGCACACTAGAA	LN:50
HWI-EAS91_1_30788AAXX:1:1:1491:637	20	*	10864	25	36M	*	0	0	TGTAGAAGCCCCAATTGCCGGATCCATNNTGCTAGC	DBAIIIIIIIIIIIFIIIIIIIIIIII""IIIIIII	NM:i:1	X1:i:1	MD:Z:7N0N27
HWI-EAS91_1_30788AAXX:1:1:1630:59	20	*	12387	25	36M	*	0	0	TCATACTCGACCCCAACCTTACCAACCNNCCGCTCC	FIIHII;IIIIIIIIIIIIIIIIIIII""IIIIIII	NM:i:1	X1:i:1	MD:Z:7N0N27
HWI-EAS91_1_30788AAXX:1:1:1218:141	20	*	14062	25	36M	*	0	0	ACAAAACTAACAACAAAAATAACACTCNNAATAAAC	I+IIII1IIIIIIIIIIIIIIIIIIII""IIIIIII	NM:i:1	X1:i:1	MD:Z:7N0N27

If I load this in pysam, f.header.lengths contains only the second, short contig, while f.header.to_dict() preserves the large contig.
I suspect this might be an htslib issue, as target_len is defined as uint32_t. I tried changing all those instances to uint64_t, but I don't know enough to figure out whether, or what, I missed (fea8c81).
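
To make the discrepancy easier to spot, here is a minimal sketch that compares the two views of the header. The helper name `contigs_missing_from_lengths` is hypothetical and not part of pysam; it only assumes an object with a `lengths` attribute and a `to_dict()` method returning an `"SQ"` list with `"SN"` and `"LN"` keys, as `f.header` provides. Matching is done by length value, so it is a heuristic rather than an exact comparison.

```python
def contigs_missing_from_lengths(header):
    """Return (name, length) pairs found in to_dict()['SQ'] but not in header.lengths.

    `header` is assumed to behave like pysam's AlignmentHeader:
    it exposes `.lengths` (a sequence of ints) and `.to_dict()`.
    """
    declared = [(sq["SN"], int(sq["LN"])) for sq in header.to_dict().get("SQ", [])]
    reported = list(header.lengths)
    missing = []
    for name, length in declared:
        if length in reported:
            # Consume one match so repeated lengths are counted correctly.
            reported.remove(length)
        else:
            missing.append((name, length))
    return missing


# Possible usage with pysam (requires pysam and a real file):
#   import pysam
#   with pysam.AlignmentFile("large_file.bam") as af:
#       print(contigs_missing_from_lengths(af.header))
```

With the file above, this would be expected to report the 2147483648 bp contig, since only the 50 bp one shows up in `lengths`.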

@SergejN

SergejN commented Nov 15, 2018

Could it be related to this bug? #732

@mvdbeek
Contributor Author

mvdbeek commented Jan 18, 2019

Sorry, a1b5cef didn't fix it. Could you please re-open the issue?

In [1]: import pysam

In [2]: af = pysam.AlignmentFile('large_file.bam')

In [3]: af.header.lengths
Out[3]: (50,)

In [4]: pysam.__version__
Out[4]: '0.15.2'

@mvdbeek
Contributor Author

mvdbeek commented Jan 18, 2019

Ah, well, doing this with 2147483647 is fine, so that should realistically be large enough. Thanks!
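
For reference, 2147483647 is 2**31 - 1, the largest value a signed 32-bit integer can hold, and the failing length 2147483648 is exactly one more. A small sketch using only the standard library shows the boundary; `fits_int32` is a hypothetical helper for illustration and says nothing about how pysam or htslib actually performs the check.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647


def fits_int32(value):
    """Return True if `value` can be packed as a little-endian signed 32-bit int."""
    try:
        struct.pack("<i", value)
    except struct.error:
        return False
    return True


print(fits_int32(2147483647))  # the length that works
print(fits_int32(2147483648))  # the length from the original report
```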

@SergejN

SergejN commented Jan 18, 2019 via email

@mp15

mp15 commented Jan 18, 2019

There is no simple fix, because the BAM binary format defines pos as an int32_t (https://github.com/samtools/hts-specs/blob/master/SAMv1.pdf). We did consider a BAMv2, but it would take some time for the ecosystem to adopt, so people were resistant (samtools/hts-specs#240). HTSlib currently uses the BAM format as an interface and is therefore subject to the same limit. Whilst I know the HTSlib team have been discussing this, I'm not sure we've actually reached a solution.
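
Until that changes, one workaround is to check a SAM header up front for contigs that would not fit. The following is a rough sketch, assuming the limit is the int32_t maximum mentioned above; `oversized_contigs` is a hypothetical helper, not an HTSlib or pysam function, and it works on plain header text only.

```python
INT32_MAX = 2**31 - 1  # assumed limit, based on pos being an int32_t


def oversized_contigs(sam_header_text, limit=INT32_MAX):
    """Return (SN, LN) pairs from @SQ lines whose LN exceeds `limit`."""
    oversized = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            length = int(fields["LN"])
            if length > limit:
                oversized.append((fields["SN"], length))
    return oversized


header = (
    "@HD\tVN:1.3\tSO:coordinate\n"
    "@SQ\tSN:big_contig\tLN:2147483648\n"
    "@SQ\tSN:small_contig\tLN:50\n"
)
print(oversized_contigs(header))
```

The contig names in the usage example are placeholders standing in for the long sequence names in the original file.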

3 participants