Pack more records in memory when sorting
As bcf1_t is a fairly large structure, it adds considerable
overhead when the records being sorted are small (e.g. single-sample
gVCF).  This overhead can be reduced by storing the data in a
more compact form.  Variable-length encoding is used for numbers
that aren't directly needed for sorting as values are usually
much smaller than the maximum possible.  On a test file with
approx. 61 characters per VCF line, up to four times as many
records could be stored before having to spill them.
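
The variable-length encoding mentioned above can be illustrated with a short sketch. It assumes a LEB128-style scheme (7 payload bits per byte, with the high bit marking continuation); the commit message does not say which encoding vcfsort.c actually uses, and the names varint_put() and varint_get() are hypothetical, not functions from bcftools or htslib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from vcfsort.c.
 * Writes val as a LEB128-style varint: 7 payload bits per byte, with the
 * high bit set on every byte except the last.  out must have room for up
 * to 10 bytes.  Returns the number of bytes written (1..10). */
size_t varint_put(uint8_t *out, uint64_t val)
{
    size_t n = 0;
    assert(out != NULL);
    while (val >= 0x80) {
        out[n++] = (uint8_t)((val & 0x7f) | 0x80);
        val >>= 7;
    }
    out[n++] = (uint8_t)val;
    return n;
}

/* Hypothetical helper, not taken from vcfsort.c.
 * Reads one varint from in[0..len-1] into *val.  Returns the number of
 * bytes consumed, or 0 if the input is truncated or longer than 10 bytes. */
size_t varint_get(const uint8_t *in, size_t len, uint64_t *val)
{
    uint64_t v = 0;
    size_t n = 0;
    unsigned shift = 0;
    assert(in != NULL && val != NULL);
    while (n < len && shift < 64) {
        uint8_t b = in[n++];
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) {      /* last byte of this value */
            *val = v;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

With this scheme a value below 128 takes one byte instead of the four or eight bytes of a fixed-width integer field, which is the kind of saving the message describes for numbers that are usually much smaller than the maximum possible.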

This change only affects the blocks of data sorted in memory
and then written out by buf_flush().  As the merge_blocks()
function writes BCF and needs far fewer records in memory at
any time, partially merged files are still written in BCF
format.
daviesrob authored and pd3 committed Aug 15, 2024
1 parent a62defa commit f6ac1c2
Showing 2 changed files with 313 additions and 101 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -259,7 +259,7 @@ vcfroh.o: vcfroh.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_kstrin
vcfcnv.o: vcfcnv.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_kstring_h) $(htslib_kfunc_h) $(htslib_khash_str2int_h) $(bcftools_h) HMM.h rbuf.h
vcfhead.o: vcfhead.c $(htslib_kstring_h) $(htslib_vcf_h) $(bcftools_h)
vcfsom.o: vcfsom.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_hts_os_h) $(bcftools_h)
-vcfsort.o: vcfsort.c $(htslib_vcf_h) $(htslib_kstring_h) $(htslib_hts_os_h) kheap.h $(bcftools_h)
+vcfsort.o: vcfsort.c $(htslib_vcf_h) $(htslib_kstring_h) $(htslib_hts_os_h) $(htslib_bgzf_h) kheap.h $(bcftools_h)
vcfstats.o: vcfstats.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_faidx_h) $(bcftools_h) $(filter_h) bin.h dist.h
vcfview.o: vcfview.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(bcftools_h) $(filter_h) $(htslib_khash_str2int_h) $(htslib_kbitset_h)
reheader.o: reheader.c $(htslib_vcf_h) $(htslib_bgzf_h) $(htslib_tbx_h) $(htslib_kseq_h) $(htslib_thread_pool_h) $(htslib_faidx_h) $(htslib_khash_str2int_h) $(bcftools_h) $(khash_str2str_h)