Pack more records in memory when sorting
As bcf1_t is quite a big structure, it adds a lot of overhead when the records being sorted are small (e.g. single-sample gVCF). This overhead can be reduced by storing the data in a more compact form. Variable-length encoding is used for numbers that aren't directly needed for sorting, as their values are usually much smaller than the maximum possible.

On a test file with approx. 61 characters per VCF line, up to four times as many records could be stored before having to spill them.

This change only affects the blocks of data sorted in memory and then written out by buf_flush(). As the merge_blocks() function writes BCF and needs far fewer records in memory at any time, partially merged files are still written in that format.
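To illustrate why variable-length encoding saves space for small values, here is a minimal sketch of one common scheme: 7 data bits per byte, with the high bit marking continuation. This is not necessarily the encoding used in this commit; the function names varint_put() and varint_get() are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `value` into `buf`, 7 bits per byte,
 * least-significant group first. A set high bit means another byte follows.
 * `buf` must have room for up to 10 bytes. Returns the bytes written. */
static size_t varint_put(uint8_t *buf, uint64_t value)
{
    size_t n = 0;
    assert(buf != NULL);
    while (value >= 0x80) {
        buf[n++] = (uint8_t)((value & 0x7f) | 0x80);
        value >>= 7;
    }
    buf[n++] = (uint8_t)value;   /* final byte, high bit clear */
    return n;
}

/* Hypothetical helper: decode one value from `buf` (at most `len` bytes).
 * Returns the bytes consumed, or 0 if the input is truncated or too long. */
static size_t varint_get(const uint8_t *buf, size_t len, uint64_t *value)
{
    uint64_t v = 0;
    size_t n = 0;
    unsigned shift = 0;
    assert(buf != NULL && value != NULL);
    while (n < len && shift < 64) {
        uint8_t b = buf[n++];
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) {       /* last byte of this value */
            *value = v;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

With a scheme like this, any value below 128 occupies a single byte rather than a fixed 4 or 8, which is where the saving comes from when most fields are small. Fields used as sort keys would presumably stay in a fixed-width form so they can be compared without decoding.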