Gathering contig information from CSI files #611

Closed
mliezun opened this issue Dec 2, 2021 · 3 comments

Comments

@mliezun

mliezun commented Dec 2, 2021

We have been trying to implement a tool that reads portions of a VCF file using a CSI index. We want to support region queries like -r chr:beg-end|chr, but according to the spec at https://samtools.github.io/hts-specs/CSIv1.pdf there is no contig information in the CSI file.

Could you help me figure this out? Or point me in the right direction?

@jmarshall
Member

It is a long-standing problem (see e.g. #70) that the index formats are under-documented.

In particular, the contents of l_aux/aux (“Auxiliary data”) are not described by CSIv1.pdf.

In practice, CSI — at least, as used by htslib/bcftools to index a BGZF-compressed VCF file — does store contig name information within this auxiliary data. You can see it if you view the index file with gunzip -c foo.vcf.gz.csi | od -c | head. See also vcf_idx_init() and hts_idx_tbi_name() (the latter's comments may be misleading) in the HTSlib source code.

When CSI is used by htslib/bcftools to index a BGZF-compressed VCF file, this aux block contains the format through names fields (format, col_seq, col_beg, col_end, meta, skip, l_nm, names) described in tabix.pdf.
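
The layout described above can be sketched in Python. This is a minimal illustration, not HTSlib's implementation: it assumes the aux block starts with the seven little-endian int32 tabix fields (format through l_nm, 28 bytes) followed by l_nm bytes of NUL-terminated names, as tabix.pdf describes. The function names parse_csi_aux and read_csi_contigs are hypothetical. An index whose aux block is shorter than 28 bytes simply yields no names here.

```python
import gzip
import struct

CSI_MAGIC = b"CSI\x01"
TBX_FIXED = 28  # assumed: seven int32 fields, format .. l_nm


def parse_csi_aux(data: bytes) -> dict:
    """Parse the start of an uncompressed CSI index, up to the aux block."""
    if data[:4] != CSI_MAGIC:
        raise ValueError("not a CSI index (bad magic)")
    # Fixed CSI header: min_shift, depth, l_aux (little-endian int32 each).
    min_shift, depth, l_aux = struct.unpack_from("<3i", data, 4)
    if l_aux < 0:
        raise ValueError("negative l_aux")
    aux = data[16:16 + l_aux]
    if len(aux) != l_aux:
        raise ValueError("truncated aux block")

    names = []
    if l_aux >= TBX_FIXED:
        # Assumed tabix-style layout inside aux.
        fmt, col_seq, col_beg, col_end, meta, skip, l_nm = struct.unpack_from(
            "<7i", aux, 0
        )
        raw = aux[TBX_FIXED:TBX_FIXED + l_nm]
        # Names are concatenated and NUL-terminated; drop the empty tail.
        names = [n.decode("utf-8") for n in raw.split(b"\x00") if n]
    return {"min_shift": min_shift, "depth": depth, "names": names}


def read_csi_contigs(path: str) -> list:
    """Read a .csi file (BGZF is gzip-compatible) and return contig names."""
    with gzip.open(path, "rb") as fh:
        return parse_csi_aux(fh.read())["names"]
```

Calling read_csi_contigs("foo.vcf.gz.csi") on an index written by bcftools or tabix should return the contig names in index order, which is the order of the per-reference bin tables that follow n_ref in the file.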

@mliezun
Author

mliezun commented Dec 2, 2021

Thank you John, I will take a closer look at the tabix spec and the source code to see if we can solve this.

@jkbonfield
Contributor

Closing as duplicate of #70.
