Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make --write-index work for concat with vcf #1839

Closed
wants to merge 2 commits into from

Conversation

whitwham
Copy link
Contributor

@whitwham whitwham commented Sep 18, 2024

Add a new function to work like vcf_write_line but with index handling code.

To make vcf_write_line work to produce an index I would of had to change its arguments as more information was needed. Instead I added a new function.

In conjunction with the BCFtools change (samtools/bcftools#2283) this should fix samtools/bcftools#2246.

@jkbonfield
Copy link
Contributor

I understand this is a derivation of vcf_write_line, but there is code in vcf_write that would be useful to lift over too. Specifically when writing and indexing it does:

        if (bgzf_flush_try(fp->fp.bgzf, fp->line.l) < 0)
            return -1;
        if (fp->idx && !fp->fp.bgzf->mt)
            hts_idx_amend_last(fp->idx, bgzf_tell(fp->fp.bgzf));
        ret = bgzf_write(fp->fp.bgzf, fp->line.s, fp->line.l);

The bgzf_flush_try statement is there to ensure that if the current line does not entirely fit within the current bgzf block then we flush and create a new one, which can help certain types of random access (not that we exploit this at all at present). I think this should have been there in the old vcf_write_line function too, and is likely something I missed in d5a00db. I'm not sure if we have a debugging tool able to find locations where bgzf entries split blocks, but it would be useful to validate it.

The hts_idx_amend_last bit is there so that the on-the-fly index matches the index created by a bcftools index command where the end offset of the last record is set to byte zero into the new block rather than byte in the old block.

Comment on lines +528 to +529
@param beg Beginning position
@param end End position
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameters do not match the function declaration. It has pos & len while this has beg & end. Also we should probably list them in the same order, so move "line" down after the header.

@whitwham
Copy link
Contributor Author

I'm going to go with a different option so this PR is no longer needed.

@whitwham whitwham closed this Sep 30, 2024
@whitwham whitwham deleted the concat_index branch September 30, 2024 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

write-index does not appear to work from concat command
2 participants