Bcftools index with BAI. #812

Closed
jkbonfield opened this issue Jun 14, 2018 · 7 comments

Comments

@jkbonfield
Contributor

jkbonfield commented Jun 14, 2018

The bcftools index help states that CSI is the default and that -c enables CSI, which makes me wonder A) why the option exists and B) how you can get anything other than CSI. The code for -c just sets tbi to 0.

Is there no support for BAI in bcftools? Given (I think) htsjdk doesn't support CSI, how does this work with sharing data between htsjdk and Bcftools? Is it simply that everyone uses TBI indices and VCF? (I'm getting lost at this stage in a twisty maze of index formats, all alike.)

My purpose in the question is I'm adding code to htslib to generate BAI and CSI (so far) indices on the fly while writing BAM and BCF, but I cannot validate my bcf.bai output because I cannot figure out how to generate a legit one.
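As a rough aid for telling the index formats apart when validating output, here is a minimal sketch that sniffs the magic bytes at the start of an index file. It assumes the magic strings given in the format specifications (`BAI\1`, `CSI\1`, `TBI\1`), that BAI files are stored uncompressed, and that CSI and TBI files are BGZF-compressed and therefore readable with a gzip reader. The function name `sniff_index_format` is hypothetical and not part of htslib or bcftools. It checks only the magic, not the rest of the index.

```python
import gzip

# Magic strings as given in the SAM, CSI and tabix format specifications.
MAGIC = {b"BAI\x01": "BAI", b"CSI\x01": "CSI", b"TBI\x01": "TBI"}


def sniff_index_format(path):
    """Return 'BAI', 'CSI' or 'TBI', or None if the magic is not recognised.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head[:2] == b"\x1f\x8b":
        # gzip/BGZF header: CSI and TBI are BGZF-compressed, so decompress
        # the start of the file to reach the real magic.
        with gzip.open(path, "rb") as fh:
            head = fh.read(4)
    return MAGIC.get(head)
```

Called on an index written by `bcftools index`, this would be expected to report `CSI` by default, or `TBI` when tabix indexing is requested.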

@mcshane
Contributor

mcshane commented Jun 14, 2018

I have never seen anywhere a vcf.gz.bai or bcf.bai file before.

@jkbonfield
Contributor Author

Ah ok, so I'll ditch my new logic that writes them.

I assume CSI predates BCF then (but not BAM, which is why we don't use it there by default)?

@mcshane
Contributor

mcshane commented Jun 14, 2018

> I assume CSI predates BCF then (but not BAM, which is why we don't use it there by default)?

I think that is right.

@lh3
Member

lh3 commented Jun 14, 2018

CSI was designed for BCFv2. They were implemented around the same time in 2012. BAI was described in the very first SAM spec released in 12/2008.

@lh3
Member

lh3 commented Jun 14, 2018

> Given (I think) htsjdk doesn't support CSI, how does this work with sharing data between htsjdk and Bcftools?

htsjdk doesn't work with BGZF'd BCFv2, I believe. Probably it doesn't support new features in BCFv2.2, either.

@jmarshall
Member

Relevant HTSJDK issues are samtools/htsjdk#447 (and PR samtools/htsjdk#1040), samtools/htsjdk#946, and samtools/htsjdk#628.

@jkbonfield
Contributor Author

If htsjdk supports BCFv1, then what does it use as an index, given that it also doesn't yet have CSI support? Do they have yet another index format? (We already have 4 for bgzf, I hope there's not a 5th!)

Anyway, I've made my code compatible with the rest of bcftools and gone CSI only. I'm closing this as it's now just a discussion (and it was really just a question - the only "issue" could be documentation perhaps).
