Releases: samtools/htsjdk

2.21.0

06 Nov 20:20
03e9a32

This is a minor release which includes new features and bug fixes:

Compatibility Notes:

This release makes several breaking changes. They are in lightly used APIs and shouldn't cause problems for most users, and we believe affected code will be easy to adapt.

The changes were necessary to remove outdated and unnecessary reflection code that was causing problems in Java 11.

The following public methods and fields were removed, and their functionality has been replaced by the listed methods:

  • IndexFactory.IndexType.getIndexCreator() and IndexFactory.IndexType.getIndexType() are replaced with IndexFactory.IndexType.createIndex(BufferedInputStream)
  • ParsingUtils.registerHelperClass(Class helperClass) and ParsingUtils.urlHelperClass are replaced with ParsingUtils.setURLHelperFactory(URLHelperFactory)
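
The move from registering a helper class to supplying a factory can be pictured with the sketch below. It uses only the JDK; Helper, HelperFactory, and Registry are hypothetical names invented for illustration and are not htsjdk's URLHelper or ParsingUtils API. The point is only that a factory is an ordinary object that is called directly, so no reflective constructor lookup is needed.

```java
import java.util.Objects;

/** Hypothetical stand-in for a helper that knows how to handle one resource. */
interface Helper {
    String describe();
}

/** Hypothetical factory: builds a Helper for a resource without reflection. */
@FunctionalInterface
interface HelperFactory {
    Helper create(String resource);
}

/** Hypothetical registry holding the currently installed factory. */
final class Registry {
    // Default factory; callers replace it via setFactory instead of registering a Class.
    private static volatile HelperFactory factory = resource -> () -> "default:" + resource;

    private Registry() {}

    static void setFactory(HelperFactory newFactory) {
        factory = Objects.requireNonNull(newFactory, "factory must not be null");
    }

    static Helper helperFor(String resource) {
        // A plain method call; no Class.getConstructor(...).newInstance(...) is involved.
        return factory.create(resource);
    }
}
```

A caller that previously registered a class would instead install a lambda or method reference, for example Registry.setFactory(r -> () -> "custom:" + r) in this sketch.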

New Features:

Merging Sharded Indexes:

There is new support for directly merging indexes of sharded bam/cram/vcf files. This allows tools which process parts of a large file in parallel to write their output in shards, and include an index for each shard. These indexes can then be cheaply merged when combining the shards of the file into a single large file in order to create a global index without re-indexing the entire file.

This was developed as part of disq for use with Spark but is likely to be useful for any pipeline that processes BAM, CRAM, or VCF files in parallel and then needs to combine the results. The given implementation is designed to work with the sharding/naming patterns that disq uses, but can likely be extended to work with other sharding/naming schemes. Htsjdk does not currently provide a simple mechanism for writing files in the sharded format, but we would like to support this in future releases. See BAMMergerTest for an example of the sharding scheme that is supported by the provided IndexMergers.
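
To give a rough idea of why merging per-shard indexes is cheap, here is a deliberately simplified sketch. It assumes each shard's index stores byte offsets relative to the start of that shard, so a global index can be built by shifting every offset by the combined length of the preceding shards. ShardIndex and ShardIndexMerger are hypothetical names; real BAM, CRAM, and tabix indexes are more involved, and this is not the algorithm used by htsjdk's IndexMergers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-shard index: entry offsets are relative to the start of the shard. */
final class ShardIndex {
    final long shardLengthBytes;
    final List<Long> entryOffsets;

    ShardIndex(long shardLengthBytes, List<Long> entryOffsets) {
        this.shardLengthBytes = shardLengthBytes;
        this.entryOffsets = entryOffsets;
    }
}

final class ShardIndexMerger {
    private ShardIndexMerger() {}

    /** Shifts each shard's offsets by the total length of the shards that precede it. */
    static List<Long> merge(List<ShardIndex> shardsInFileOrder) {
        List<Long> merged = new ArrayList<>();
        long shardStart = 0L; // byte position of the current shard within the combined file
        for (ShardIndex shard : shardsInFileOrder) {
            for (long offset : shard.entryOffsets) {
                merged.add(shardStart + offset);
            }
            shardStart += shard.shardLengthBytes;
        }
        return merged;
    }

    public static void main(String[] args) {
        List<ShardIndex> shards = Arrays.asList(
                new ShardIndex(100L, Arrays.asList(0L, 40L)),
                new ShardIndex(50L, Arrays.asList(0L, 10L)));
        System.out.println(merge(shards)); // prints [0, 40, 100, 110]
    }
}
```

The work is proportional to the number of index entries rather than to the size of the data files, which is the property the release note describes.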

IndexFactory Path Support:

IndexFactory now supports Path inputs in addition to File inputs.

Complete Change List:

03e9a32 Add back a method which was accidentally replaced (#1433)
a83f2b3 Path-enable IndexFactory. (#1430)
c21055b Fixes indexing for codecs that implement getPathToDataFile (#1429)
f70befd Replace IndexFactory reflection. (#1431)
3f8ba35 Support for Index merging (#1263)
c33b695 Add a URLHelper factory API to ParsingUtils (#1421)
4d73aff Make VCFHeader not throw exception if contig header lines lack length field (#1418)
e357c42 Add a method to conditionally apply a Consumer on a nullable input (#1419)
e1bbd34 Fixing bad behavior in IOUtil.deletePaths (#1416)

2.20.3

26 Aug 21:19

This is a small release which includes minor improvements and fixes.

Compatibility Note

This version has a minor binary incompatibility with 2.20.2: in order to take advantage of the larger insertion size allowed by #1410, downstream software may need to be recompiled against this new version of htsjdk. This has to do with a quirk of how static constants are built into the jar.
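
The quirk is general Java behavior: javac copies the value of a compile-time constant (a static final primitive or String initialized with a constant expression) into every class that references it, so a caller compiled against the old jar keeps the old value until it is rebuilt. The sketch below contrasts such a constant with a field whose initializer is not a constant expression and is therefore read at run time. Limits, INLINED_MAX, and RUNTIME_MAX are hypothetical names, not htsjdk fields.

```java
/** Hypothetical library class illustrating the two kinds of static final field. */
final class Limits {
    private Limits() {}

    // Compile-time constant: javac copies the value into each calling class,
    // so callers must be recompiled to observe a changed value.
    static final int INLINED_MAX = Integer.MAX_VALUE;

    // Not a constant expression (the initializer is a method call), so callers
    // read the field at run time and see a new value without recompiling.
    static final int RUNTIME_MAX = Integer.valueOf(Integer.MAX_VALUE);
}

final class LimitsDemo {
    public static void main(String[] args) {
        // Within a single build both fields naturally agree.
        System.out.println(Limits.INLINED_MAX == Limits.RUNTIME_MAX); // prints true
    }
}
```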

46b1a00 Add LiftOver constructor that takes an input stream (#1412)
9628c1e Added alignment type to RecordAndOffset class (#1411)
2a6e2c2 Updated the max insertion size to be Integer.MAX_VALUE. (#1410)
a798e4e Suppress all exceptions during DeleteOnExitPathHook (#1409)

2.20.2

14 Aug 20:46

This is a bug fix release which fixes a regression in 2.20.0 that caused problems when using Kryo serialization to serialize AbstractFastaSequenceFile.

Changelist:

9401637 Make AbstractFastaSequenceFile serializable by Kryo for Spark. (#1408)
cea307e Added option to log with suppliers (#1406)

2.20.1

05 Aug 21:27

This is a bug fix release which fixes #1404, a minor regression introduced in 2.20.0.

833a0e2 fix bug in VariantContextBuilder (#1403)

2.20.0

26 Jul 21:25
1d4a316

This release includes significant new features and numerous bug fixes. We recommend that users upgrade to this version.

Compatibility Notes:

There are a few minor incompatibilities between this and 2.19.0. Most users should have no problems upgrading.
Incompatible changes:

  • There continue to be major changes to the unstable htsjdk.samtools.cram package.
  • BCFVersion became final, invalidating any subclasses.
  • There were several changes to methods in the class CRAMBAIIndexer.

New features:

Support Reading VCF4.3

Htsjdk can now read version 4.3 vcfs produced by other software. It can still only produce version 4.2. (#1381, #1359)

Write Compressed References:

You can now write indexed fasta.gz files with FastaReferenceWriterBuilder. (#1340)

IntervalList improvements:

  • There's a new IntervalListCodec which lets you parse .interval_list files using tribble (#1327)
  • Improved performance of some interval operations when operating on large IntervalLists (#1356, #1384)

Developer documentation improvements:

  • We added a Code of Conduct to explicitly state our goals of being a friendly, safe, and professional project where we hope anyone will feel comfortable contributing. (#1390)
  • Added style guides for IntelliJ and Eclipse. Running automated style checking of newly submitted or edited code is encouraged. (#1386, #1391)
  • Added a description of how to interpret the version number and what it means for compatibility (#1392)
  • Updating issue and PR templates. (#1393)
  • We now run Spotbugs as part of our build to identify and reject certain classes of bugs automatically. This is still new and we may continue to experiment with which rules to apply. (#1330, #1331)

Other Improvements:

Complete change list:

1d4a316 Make VariantContextBuilder safer (#1344)
45cfc08 Handle PEDIGREE header lines differently for vcf4.2 vs vcf4.3. (#1401)
c5ed6b7 Insertions accumulating in AbstractLocusIterator bugfix (#903)
edb8c60 Optimize AbstractBAMFileIndex.query() when querying sequentially (#1397)
9aa81ed Add documentation for VariantContext.getStart() regarding telomeric events (#1369)
83da7e7 Make Validation error emit the validation type that was violated (#1395)
3a35b89 CRAM queryAlignmentStart/queryMate fix. (#1164)
765728e Moved file extensions constants to their own class (#1382)
16e9cca Output missing vcf fields as a single . (#1389)
387de12 Adding a code of conduct document for htsjdk. (#1390)
ea1db70 adding README section about style guides (#1391)
a58f432 Describing what our version number means (#1392)
faf3d11 Updating issue and PR templates (#1393)
37ea940 Tolerate mixed case NaNs and Infinities in VCF (#1364)
f68108d Add style guide for IDE's (#1386)
b0decef Substitution matrix refactoring and tests (#1374)
a61e233 Add a function to calculate the value of the OA tag (#1354)
f13d075 Make VariantContextWriterBuilder warn when indexing-on-the-fly is enabled for streams (#1328)
b804a47 VCF codec should handle multiple missing GL fields (#1372)
58199e6 Prevent integer overflow in Interval countBases method. (#1384)
adea7d1 Remove requirement that zero length reads need color space or flow (#1360)
6df21e5 Update readme with info about reading vcf 4.3 (#1381)
d5ac863 Support VCF v4.3 (read only). (#1359)
1e3f1fa Adding a few more semi-colons in shell in travis config. (#1378)
2b83a9f Fix to ensure we only publish snapshots compiled with openjdk-8. (#1376)
1a275e1 Added fasta.gz write support to FastaReferenceWriterBuilder (#1340)
4ae8508 Moved loading of sequence dictionary into an overideable method (#1362)
64e98d6 Removed redundant readability test in unrollPaths method (#1355)
e2c0fdd Support writing a CRAI index from CRAMContainerStreamWriter (#1351)
4747d08 Change SAMTextHeaderCodec to no longer accumulate the entire text of … (#1361)
3b0dd60 SubstitutionMatrix class cleanup - part 1. (#1366)
5f0e045 Small CRAM refactor: common ExternalEncoding Abstract Base Class (#1346)
ae49710 Make interval operations scale better (#1356)
7ce3636 Added name of offending record to error message in SamPairUtil (#1358)
0b9fe0d Remove CramCompressionRecord.tagIds (#1345)
4f62add Allow BCFCodec subclasses to provide custom version compatibility. (#1352)
335f2c1 Improve exception message for unset VCF output type (#1357)
5442f78 CRAM: revert #1326 and fix tests and comments (#1341)
1ece54c Add interval list codec (#1327)
aa89809 Remove multiple versions of Slice/Container getCRAIEntries() (#1329)
9f5d86e Progress Logger prints read names when iterating in queryname order (#1302)
16fecfc Change SortingCollection log statements to DEBUG (#1334)
a82d8ba Test ContainerIO.calculateSliceOffsetsAndSizes() and fix the slice size calculation (#1326)
7925166 Fix or ignore remaining SpotBugs issues (#1331)
f9361ac Fix issues in tests identified by Spotbugs (#1330)
5dcfd73 Fix signing task in build.gradle (#1325)
2b5f3bc Reject BCF files when minor version doesn't match the current implementation. (#1324)

2.19.0

18 Mar 19:58
019465f

This release contains significant new features and many bug fixes.

Compatibility Notes:

This release is neither binary nor source backwards compatible with 2.18.2, but upgrading should be painless for the majority of users. There may be some minor source changes necessary when recompiling against 2.19.0. If you encounter difficulties, please contact us.

Binary/Source Compatibility issues:

  • The method SAMRecord.getIndexingBin() has been removed. It was a cache of a BAM-format-specific field that could often get out of sync with the correct value. Users with their own subclasses of SAMRecord may need to make changes. Uses of getIndexingBin() can be replaced by computeIndexingBin().
  • Various static fields have been made final. These fields are not intended to be updated by user code.
  • There continue to be significant changes in the unstable htsjdk.samtools.cram package.
  • Signature change in SAMSequenceDictionaryCodec:
    SAMSequenceDictionaryCodec(BufferedWriter) -> SAMSequenceDictionaryCodec(Writer)
  • Signature change: AbstractBAMFileIndex.position() now returns a long instead of an int.
  • Removed throws clause from methods:
    • CRAMContainerStreamWriter.flushContainer()
    • SequenceUtil.calculateMD5String
    • multiple overloads of CRAMIterator.CRAMIterator
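
For context on the indexing bin mentioned in the first item above: the SAM specification defines an alignment's bin purely from its coordinates, which is why the value can be recomputed on demand instead of being cached. The sketch below is a Java transcription of the specification's reg2bin function for a 0-based, half-open interval. The class name BinSketch is hypothetical, and this is not presented as htsjdk's computeIndexingBin() implementation.

```java
/** Hypothetical helper class; a transcription of reg2bin from the SAM specification. */
final class BinSketch {
    private BinSketch() {}

    /** Returns the bin for the 0-based, half-open interval [beg, end). */
    static int reg2bin(int beg, int end) {
        --end; // index of the last base covered by the interval
        if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14); // 16 kb bins
        if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17); // 128 kb bins
        if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);  // 1 Mb bins
        if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);  // 8 Mb bins
        if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);  // 64 Mb bins
        return 0; // the single top-level bin covering the whole 512 Mb range
    }

    public static void main(String[] args) {
        System.out.println(reg2bin(0, 1)); // prints 4681
    }
}
```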

Changes in behavior:

  • IntervalList.fromFiles() no longer calls unique() on the returned interval list. (#1273)

New Features:

CSI index Support:

We can now use CSI indexes generated by other tools. (#1040, #1314, #447)

Fasta Writing:

We can now write fasta files with FastaReferenceWriter (#1172, #1285)

Java 11 support:

We now build and test with Java 11. This is fairly experimental and our downstream acceptance tests haven't been run on Java 11. We still target Java 8, so no new features of Java 9/10/11 can be used in the code yet. (#1291)

IntervalList Improvements:

  • New support for writing/reading large interval lists from on disk SortingCollection. (#1288)
  • Reduced memory footprint IntervalList and OverlapDetector. (#1309)
  • IntervalLists can now be written to Paths. (#1297, #1298)
  • IntervalList.fromFiles() no longer calls unique on the returned interval list. (#1273)

CRAM Improvements:

Many other bug fixes and minor additions.

Complete Change List:

019465f Adding back a removed method (#1321)
fe27e66 Fixing zero-length interval bug in IntervalList.merge (#1318)
b321d91 Fix CSI bug when querying files with long references. (#1314)
dd313de Deprecate TestUtil.deleteRecursively (#1315)
68199fb Revert singletonList optimization in SamRecordSetBuilder (#1317)
d678af3 Fix bug in BlockCompressedInputStream.checkTermination() (#1310)
7b3c7a6 Fix bug when loading indexed bgzip fasta file. (#1311)
9f84b7b Optimizations to reduce in-memory size of IntervalList and OverlapDetector (#1309)
d25bbb4 Add ability to generate reference from SamRecordSetBuilder (#1286)
1509dcc Consolidate common code into CRAMStructureTestUtil: (#1312)
b1cb410 Fix toFile call that prevents IntervalListWriter from writing to Paths (#1298)
37b2e87 CRAM: formalize Slice Type with an enum (#1274)
b58a5a9 CRAM: Only calculate alignmentDelta as needed for records (#1304)
62388a2 A few fixes for issues found by spotbugs (#1278)
0b16296 Simplify CRAM sequence dictionary extractor to not require a fake reference. (#1308)
a6c5837 remove caching of alleles in AbstractVCFCodec (#1282)
2d2922f Load BlockCompressedIndexedFastaSequenceFile and GZIIndex from streams (#1259)
205d5f0 Add support for CRAM in SamSequenceDictionaryExtractor (#1305)
d6043f0 Changing the team name/url in the maven pom (#1300)
0cc762f Changed IntervalList fromFiles() so that it doesn't call .unique() on… (#1273)
3b3d107 adding new overloads to IOUtils with Path for some file only methods (#1296)
5e8b1fa Update java version information in the README (#1299)
061e217 update CramCompressionRecord.isPlaced() to make it APDelta-aware (#1284)
efe4abf Remove redundant and unused BAM_FLAGS from CRAM code. (#1292)
e8e0a6f Add an IntervalCodec that use useful for sorting large sets of Intervals (#1288)
d771b30 rm CramHeader.clone() (#1283)
c0642fb Build and test on Java 11 (#1291)
4c8dfbd Changes to FastaReferenceWriter (#1285)
38bfe65 moving HttpUtilsTest to externalApi test task (#1289)
62fc0b1 Use Integer.parseInt over Integer.valueOf to avoid unnecessary boxing (#1275)
77b3b8f Fix fields that should be final, as reported by SpotBugs (#1268)
52169ed Moving the SRA tests to a separate env (#1272)
3ae552f Move bad coordinates check (#911)
942e3d6 Ported GATK's FastaReferenceWriter (#1172)
e86af96 makeSAMOrBAMWriter accept only .sam/.bam (#834)
c189878 CRAM: Container and Slice states (#1266)
5ef9223 Add support for Sam Header Readgroup Barcode field (#1210)
3c48018 Fix bug in IntervalList.getUniqueIntervals that caused missing interval names (#1265)
d0b1a74 Adding some constants due to additions to hts-spec (#1241)
93250d5 Parse valid VCF 4.2 with ##INFO containing Source + Version (#1248)
5217fe4 Misc CRAM cleanup (#1253)
16a4e37 Support for reading CSI indexes for BAM files (#1040)
15ec7da Only compute BIN on BAM write and on index building (#1258)
94f0967 Immutable CRAIEntry (#1256)

2.18.2

16 Jan 22:24

This is a small release which includes bug fixes and some new methods and performance improvements.

Compatibility notes:

This is compatible with 2.18.1 with the exception of the unstable htsjdk.samtools.cram package which continues to undergo major changes.

Disallowed Contig Names:

The following characters have been disallowed from being included in contig names: \ , " ' ` ( ) < > [ ] { } (#1238)

These are being disallowed in future versions of the SAM spec because they can introduce parsing ambiguities. We've chosen to disallow them in all versions of files read or produced by htsjdk going forward, which is stricter than the spec. A scan of a large number of references found that they were used in only a tiny fraction of references. We believe this shouldn't cause users problems since these characters do not appear in common references. If we're wrong and this change causes you problems, please get in touch and let us know.
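
If you want to scan your own reference names before upgrading, a check along the following lines may help. It is a minimal sketch that reads the list above as backslash, comma, double quote, single quote, backtick, parentheses, angle brackets, square brackets, and curly braces. ContigNameCheck is a hypothetical class, not htsjdk's validation code, and it does not apply any other naming rules from the SAM spec.

```java
/** Hypothetical pre-upgrade check for the characters listed in this release note. */
final class ContigNameCheck {
    // Backslash, comma, double quote, single quote, backtick, and the four bracket pairs.
    private static final String DISALLOWED = "\\,\"'`()<>[]{}";

    private ContigNameCheck() {}

    /** Returns true if the name contains none of the listed characters. */
    static boolean isAllowed(String contigName) {
        for (int i = 0; i < contigName.length(); i++) {
            if (DISALLOWED.indexOf(contigName.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("chr1"));     // prints true
        System.out.println(isAllowed("chr1,alt")); // prints false
    }
}
```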

Deprecations:

The classes in the htsjdk.samtools.apps package are deprecated and will be removed in the future.

Complete Change list

68d9fdb Fix to ensure that SeekableStream#available() never returns a negative value. (#1255)
5a0ab68 Fix nitpick in IntervalKeepPairFilterTest (#1254)
2c97cce IntervalKeepPairFilter now filters out single ended reads (#1252)
2737292 Deprecated htsjdk.samtools.apps package. (#1250)
0bf5ff6 Add getters to VariantContextBuilder (#1247)
a4b7da8 resolving some nitpicks from pr #1245 (#1246)
2473407 Added ability to get a VariantContexts from an InputStream (#1245)
4a7cb03 Adding some sanity checking to IntervalList reading. (#1230)
28dde96 Add support for reading and writing splitting BAM index files. (#1138)
37f0789 HttpUtils: set Method to HEAD for HttpUtils.getHeaders (#1191)
fb2f42c Add finals for issues raised in review of PR 1091. (#1240)
8cc1e37 Disallowing bad characters in SamRecord names (#1238)
6ac7a60 Change CRAM validation error reporting granularity from container to record (#1091)
1126e5c CRAM: Refactor Block hierarchy (#1231)
d2360ff Replaced several for-each loops in VariantContext.Make() based on HaplotypeCaller profiling (#1234)

2.18.1

04 Dec 18:24

A small release which includes a fix for a CRAM bug (#1233) introduced in 2.17.0 by (#1199).

This is compatible with 2.18.0 with the exception of the htsjdk.samtools.cram package, which has had major changes.

41c4634 Add isRefOnlyBlock function... (#1215)
4ff5190 Relax Beta Encoding requirement to allow 0 bits (#1233)
c596e6b CRAM: Refactor Encodings and Codecs (#1224)

2.18.0

16 Nov 22:23
698a4c3

This is a smallish release with some new features. The biggest change is that BAMs created by htsjdk are now version 1.6. This should really have happened when we started producing BAMs with long CIGAR support (CG tag), since that's the marquee 1.6 feature.

Compatibility:

This release is not backwards compatible with 2.17.0 but upgrading should be easy for most users.

Incompatible changes

  • Delete long deprecated FixBAMFile and SAMTools classes (#1213, #955, #972)
  • There continue to be major changes to the htsjdk.samtools.cram package.

New Deprecations:

  • Deprecating SAMTagUtil; use SAMTag instead.
  • Deprecating tags in SAMTag that are deprecated in the SAM spec. These are deprecated to discourage use, but will not be removed.

New features and Changes

Sam 1.6 support:

  • Bam output now is designated version 1.6. (#1211)
  • Update the list of tags in SAMTag to include all reserved tags:
  • Deprecating tags that are deprecated in the SAM spec, and deprecating SAMTagUtil; its methods have been merged into SAMTag (#1208, #1227)
  • Added first class Description field to SAMSequenceRecord (#1209)
  • Deprecate SQTagUtil (#1214)

Known issue:

  • Non-ASCII Unicode characters are not supported even though these are now allowed in certain fields. (#1202)

Addition of method to FeatureCodec

  • new method FeatureCodec.getPathToDataFile(String path)
    This allows a new class of codecs that identify a different file as readable from the one that actually contains the data. This has a default implementation and should require no changes from downstream users unless they have custom implementations of FeatureReader that do not go through AbstractFeatureReader (#1223)
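
The reason existing codecs need no changes is the default-method pattern, sketched below with JDK-only code. SketchCodec and PointerCodec are hypothetical types made up for this illustration, and the assumption that the default simply returns the path it was given is ours; consult the FeatureCodec source for the actual behavior.

```java
/** Hypothetical codec interface; a stand-in for illustration, not htsjdk's FeatureCodec. */
interface SketchCodec {
    boolean canDecode(String path);

    /** Assumed default: the data lives in the same file that was identified as readable. */
    default String getPathToDataFile(String path) {
        return path;
    }
}

/** Hypothetical codec that recognises a small pointer file but reads data from elsewhere. */
final class PointerCodec implements SketchCodec {
    private static final String SUFFIX = ".pointer";

    @Override
    public boolean canDecode(String path) {
        return path.endsWith(SUFFIX);
    }

    @Override
    public String getPathToDataFile(String path) {
        // Illustrative convention only: strip the suffix to find the real data file.
        return path.endsWith(SUFFIX) ? path.substring(0, path.length() - SUFFIX.length()) : path;
    }
}
```

Implementations that do not override the method inherit the default, so only codecs that redirect to another file need new code.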

More CRAM work:

Work on refactoring and improving the cram code continues.

  • Add hashCode() to classes with equals() and clean up equals() (#1222)
  • DataSeries/DataReader/DataWriter refactor (#1219, #453)
  • Be explicit about spec IDs instead of using ordinal() (#1221)
  • More encoding tests + updates (#1203)
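
As a small illustration of the ordinal() item above, the sketch below stores a specification ID explicitly on each enum constant, so reordering or inserting constants cannot silently change the values that are written to a file. BlockKind and its ID values are hypothetical and are not taken from the CRAM specification or from htsjdk.

```java
/** Hypothetical enum; names and ID values are illustrative only. */
enum BlockKind {
    HEADER(0),
    DATA(4),
    EXTERNAL(5);

    private final int specId;

    BlockKind(int specId) {
        this.specId = specId;
    }

    /** The ID to serialize; independent of declaration order, unlike ordinal(). */
    int getSpecId() {
        return specId;
    }

    /** Looks up a constant by its serialized ID. */
    static BlockKind fromSpecId(int id) {
        for (BlockKind kind : values()) {
            if (kind.specId == id) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unknown spec ID: " + id);
    }
}
```

Here EXTERNAL.ordinal() is 2 while its serialized ID is 5, which shows how relying on ordinal() would break as soon as IDs are not consecutive.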

Other Updates

Removing cruft leftover from JAXB (#1207)
Update the README with working test examples (#1225, #984)
Getter for generic fields in VCFSimpleHeaderLine (#1212)

2.17.0

31 Oct 17:17

This is a small release that includes a number of bug fixes and minor enhancements.

Compatibility Notes:

This release is not backwards compatible with 2.16.1 due to changes in the cram code. We believe most users should be able to upgrade without issue.

htsjdk.samtools.cram package instability:

This release includes the beginning of work towards a rewrite of much of the CRAM code. This code is not structured like the rest of htsjdk and needs large compatibility-breaking changes in order to be brought in line with the rest of the codebase. We will be treating the code in the htsjdk.samtools.cram package as unstable and will not be providing deprecation warnings in releases before altering it, as we try to do with most of the codebase.

We believe that there are few downstream users who directly call into that code and most users only interact with cram files through the CramFileWriter and CramFileReader which will not be changing. Please get in contact if you make extensive use of classes in the htsjdk.samtools.cram package and are concerned about these changes.

The end result will be better cram code that should run faster, have fewer bugs, and include more features that are not yet implemented in htsjdk.

JAXB removed

In order to improve htsjdk compatibility with Java 9+ we have removed all uses of the javax.xml.bind package. This breaks the ability to marshal SAMFileHeader to XML, but since this was the only class that could be converted in this way, we believe that no users will be affected by this change.

Complete change list

c484241 MergeSamFiles accept SO:UNKNOWN (#1069)
f00a754 Remove JAXB (#1206)
44baddf Adding support for 0-length B arrays in SAM files to conform to 1.6 spec (#1194)
334800e Add the PS FORMAT VCF standard header field (#1200)
bbc674f BetaIntegerCodecTest and bugfixes (#1199)
37069a3 Unit Tests and fixes for a few classes in samtools.cram.io (#1198)
d504256 fix CRAMIterator when next() is called without hasNext() (#1193)
1971f4e Adding Allele constants for simple SV types(#1192)
23f3223 Add 6 CRAM compliance tests from htslib (#1185)
ff3db93 Bug fix: BinaryCodec should not fail when both reading and no bytes are requested. (#1188)
a7214ca Add a reset() method to ProgressLoggerInterface. (#1184)
49c70e5 fixing bug in SeekableHttpStream.read (#1182)
a762262 cleanup bam order checking code (#770)