
Load BlockCompressedIndexedFastaSequenceFile and GZIIndex from streams #1259

Merged
merged 2 commits into from
Feb 26, 2019


tomwhite
Contributor

Description

This enables fasta.gz files to be loaded from streams (not just Path objects), which is needed for disq-bio/disq#75.

See original bug report in broadinstitute/gatk#5547
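For context on why a seekable stream plus a GZI index is enough: the .fai gives uncompressed offsets into the FASTA, and the .gzi block table turns those into BGZF virtual offsets (compressed block start shifted left 16 bits, OR'd with the offset inside the block) that a seekable stream can jump to. The sketch below is hypothetical (class and method names are made up, the offsets are invented, and it is not htsjdk's GZIIndex); it assumes, as bgzip does, that the implicit first block at (0, 0) is not stored in the .gzi.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not htsjdk's GZIIndex): translate an uncompressed
 * offset into a BGZF virtual offset using the block table from a .gzi file.
 */
class GziLookupSketch {
    // Block start offsets, sorted ascending. The .gzi file omits the first
    // block (compressed 0, uncompressed 0), so it is prepended here.
    private final long[] compressed;
    private final long[] uncompressed;

    GziLookupSketch(long[] compressedOffsets, long[] uncompressedOffsets) {
        if (compressedOffsets.length != uncompressedOffsets.length) {
            throw new IllegalArgumentException("offset arrays differ in length");
        }
        int n = compressedOffsets.length;
        compressed = new long[n + 1];
        uncompressed = new long[n + 1];
        System.arraycopy(compressedOffsets, 0, compressed, 1, n);
        System.arraycopy(uncompressedOffsets, 0, uncompressed, 1, n);
    }

    /**
     * Returns (compressedBlockStart << 16) | offsetWithinBlock. Assumes the
     * offset lies inside the file, so offsetWithinBlock is below 65536.
     */
    long virtualOffsetFor(long uncompressedOffset) {
        if (uncompressedOffset < 0) {
            throw new IllegalArgumentException("negative offset: " + uncompressedOffset);
        }
        int i = Arrays.binarySearch(uncompressed, uncompressedOffset);
        // On a miss binarySearch returns -(insertionPoint) - 1; the containing
        // block is the one just before the insertion point.
        int block = i >= 0 ? i : -i - 2;
        long withinBlock = uncompressedOffset - uncompressed[block];
        return (compressed[block] << 16) | withinBlock;
    }

    public static void main(String[] args) {
        // Made-up table: blocks start at compressed offsets 0, 20000 and 41000,
        // each holding 65280 uncompressed bytes.
        GziLookupSketch gzi = new GziLookupSketch(
                new long[] {20000L, 41000L}, new long[] {65280L, 130560L});
        long v = gzi.virtualOffsetFor(70000L);
        System.out.println((v >>> 16) + " " + (v & 0xFFFF)); // prints "20000 4720"
    }
}
```

A binary search keeps each lookup at O(log n) in the number of blocks, which matters for large references with many 64 KiB blocks.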

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

…ted from streams.

This enables fasta.gz files to be loaded from streams.
@tomwhite
Contributor Author

cc @lbergelson

@codecov-io

codecov-io commented Jan 15, 2019

Codecov Report

Merging #1259 into master will decrease coverage by 1.86%.
The diff coverage is 51.22%.

@@              Coverage Diff               @@
##              master     #1259      +/-   ##
==============================================
- Coverage     69.356%   67.496%   -1.86%     
+ Complexity      8164      8153      -11     
==============================================
  Files            548       558      +10     
  Lines          32675     33374     +699     
  Branches        5520      5609      +89     
==============================================
- Hits           22662     22526     -136     
- Misses          7795      8658     +863     
+ Partials        2218      2190      -28
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/util/GZIIndex.java 62.069% <48.571%> (+0.462%) 22 <1> (+2) ⬆️
...rence/BlockCompressedIndexedFastaSequenceFile.java 70.27% <66.667%> (-0.697%) 8 <1> (+1)
.../main/java/htsjdk/samtools/sra/ReferenceCache.java 0% <0%> (-81.818%) 0% <0%> (-4%)
src/main/java/htsjdk/samtools/sra/SRAUtils.java 0% <0%> (-81.818%) 0% <0%> (-3%)
src/main/java/htsjdk/samtools/SRAIndex.java 0% <0%> (-77.273%) 0% <0%> (-18%)
src/main/java/htsjdk/samtools/SRAIterator.java 0% <0%> (-72.857%) 0% <0%> (-18%)
...java/htsjdk/samtools/sra/SRAAlignmentIterator.java 0% <0%> (-72%) 0% <0%> (-15%)
...c/main/java/htsjdk/samtools/sra/SRALazyRecord.java 0% <0%> (-71.591%) 0% <0%> (-130%)
src/main/java/htsjdk/samtools/BinList.java 0% <0%> (-70.588%) 0% <0%> (-2%)
...va/htsjdk/samtools/sra/SRAUnalignmentIterator.java 0% <0%> (-65.574%) 0% <0%> (-14%)
... and 108 more
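As a sanity check on the report: on both sides Hits + Misses + Partials equals the Lines row, and the headline figures are consistent with coverage = hits / (hits + misses + partials), i.e. partially covered lines count as uncovered. A small check (the class name is hypothetical):

```java
import java.util.Locale;

/** Hypothetical check: re-derives the Codecov percentages from the reported counts. */
class CoverageCheck {
    // Lines = hits + misses + partials; partially covered lines count as uncovered.
    static double coveragePercent(long hits, long misses, long partials) {
        return 100.0 * hits / (hits + misses + partials);
    }

    public static void main(String[] args) {
        double master = coveragePercent(22662, 7795, 2218); // 32675 lines
        double pr = coveragePercent(22526, 8658, 2190);     // 33374 lines
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "master %.3f%%, #1259 %.3f%%, delta %.2f%n",
                master, pr, pr - master);
        // prints "master 69.356%, #1259 67.496%, delta -1.86"
    }
}
```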

Member

@lbergelson lbergelson left a comment


@tomwhite This looks good. It needs some additional javadoc, and then it's good to merge.

@@ -75,6 +78,15 @@ public BlockCompressedIndexedFastaSequenceFile(final Path path, final FastaSeque
}
}

public BlockCompressedIndexedFastaSequenceFile(final String source, final SeekableStream in, final FastaSequenceIndex index, final SAMSequenceDictionary dictionary, final GZIIndex gziIndex) {
Member


Javadoc: mention that the stream shouldn't already be decompressed.
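One way to back that javadoc note with a runtime check is to peek at the stream and confirm it still begins with a BGZF block header (gzip magic, the FEXTRA flag and the "BC" extra subfield). This helper is hypothetical, not something this PR adds; it assumes the caller passes a stream that supports mark/reset, and that the "BC" subfield is the only extra field, as standard BGZF writers emit.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical guard (not part of htsjdk): peeks at the first 18 bytes and
 * checks for a BGZF block header, then rewinds so the caller sees the full stream.
 */
class BgzfHeaderCheck {
    private static final int HEADER_LEN = 18;

    static boolean startsWithBgzfHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] h = new byte[HEADER_LEN];
        int n = 0;
        in.mark(HEADER_LEN);
        try {
            while (n < HEADER_LEN) {
                int r = in.read(h, n, HEADER_LEN - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        } finally {
            in.reset(); // rewind to where the stream was when we were called
        }
        // Assumes XLEN == 6, i.e. the "BC" subfield is the only extra field.
        return n == HEADER_LEN
                && (h[0] & 0xFF) == 0x1f && (h[1] & 0xFF) == 0x8b // gzip magic
                && h[2] == 8                                      // CM = deflate
                && (h[3] & 0x04) != 0                             // FLG.FEXTRA set
                && h[10] == 6 && h[11] == 0                       // XLEN = 6, little-endian
                && h[12] == 'B' && h[13] == 'C'                   // BGZF subfield id
                && h[14] == 2 && h[15] == 0;                      // SLEN = 2
    }

    public static void main(String[] args) throws IOException {
        byte[] plainFasta = ">chr1\nACGT\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(plainFasta));
        System.out.println(startsWithBgzfHeader(in)); // prints "false"
    }
}
```

A stream that has already been wrapped in a decompressor starts with plain FASTA text (">"), so the check fails fast instead of producing confusing seek errors later.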

Member


Huh, it looks like there might be a bug in the previous constructor, where it calls canCreateBlockCompressedIndexedFastaSequence(). That checks that there's a gzi file in the appropriate location relative to the Path, even though the GZI index is passed in... I opened #1290 to track that.
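A minimal sketch of the behaviour the comment points towards: only require the conventional sibling file (the FASTA name with ".gzi" appended) when no index object was supplied. GziPresenceSketch, canOpen and the plain Object standing in for a loaded GZIIndex are all hypothetical; the actual issue is tracked in #1290.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; not the htsjdk constructor or its validation helper. */
class GziPresenceSketch {
    /** Conventional bgzip index location: "ref.fa.gz" -> "ref.fa.gz.gzi" in the same directory. */
    static Path siblingGzi(Path fasta) {
        return fasta.resolveSibling(fasta.getFileName().toString() + ".gzi");
    }

    /**
     * The on-disk .gzi is only needed when no index was passed in.
     * suppliedIndex stands in for an already-loaded GZI index (hypothetical).
     */
    static boolean canOpen(Path fasta, Object suppliedIndex) {
        return suppliedIndex != null || Files.exists(siblingGzi(fasta));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gzi-sketch");
        Path fasta = Files.createFile(dir.resolve("ref.fa.gz"));
        System.out.println(canOpen(fasta, null));         // false: no ref.fa.gz.gzi on disk
        System.out.println(canOpen(fasta, new Object())); // true: index supplied directly
    }
}
```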

Contributor Author


Done

*
* @throws IOException if an I/O error occurs.
*/
public static final GZIIndex loadIndex(final String source, final InputStream indexIn) throws IOException {
Member


Mention in the doc that source is used for error messages and can be null.

Member


This is silly, but can you put the InputStream as the first parameter, since it's the important one?

Contributor Author


Fixed the javadoc.

However, I left source as the first parameter for consistency with BlockCompressedIndexedFastaSequenceFile, ReferenceSequenceFileFactory, VariantContext, tribble's IndexFactory, etc.

Member


Ah, good point.
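For illustration, a sketch of reading a .gzi from a plain InputStream with the (source, stream) parameter order kept above, where source only labels error messages and may be null. It assumes the layout bgzip writes (a little-endian uint64 entry count, then that many pairs of uint64 compressed and uncompressed offsets); the class and method names are hypothetical and this is not htsjdk's GZIIndex.loadIndex.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical .gzi reader. The source argument is only used to make error
 * messages readable and may be null.
 */
class GziStreamReader {
    static List<long[]> readEntries(String source, InputStream in) throws IOException {
        String where = source == null ? "<unnamed stream>" : source;
        long count = readLongLE(in, where);
        if (count < 0) {
            throw new IOException("Negative GZI entry count " + count + " in " + where);
        }
        // Grow the list as entries arrive instead of pre-allocating from count,
        // so a corrupt header hits EOFException rather than a huge allocation.
        List<long[]> entries = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            long compressed = readLongLE(in, where);   // compressed block offset
            long uncompressed = readLongLE(in, where); // matching uncompressed offset
            entries.add(new long[] {compressed, uncompressed});
        }
        return entries;
    }

    private static long readLongLE(InputStream in, String where) throws IOException {
        byte[] buf = new byte[8];
        int n = 0;
        while (n < 8) {
            int r = in.read(buf, n, 8 - n);
            if (r < 0) {
                throw new EOFException("Truncated GZI index: " + where);
            }
            n += r;
        }
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer b = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        b.putLong(1L).putLong(20000L).putLong(65280L);
        List<long[]> entries = readEntries(null, new ByteArrayInputStream(b.array()));
        System.out.println(entries.size() + " " + entries.get(0)[0] + " " + entries.get(0)[1]);
        // prints "1 20000 65280"
    }
}
```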

}
}

public static final GZIIndex loadIndex(final String source, final ReadableByteChannel channel) throws IOException {
Member


Channel first and javadoc.

Contributor Author


Added javadoc
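A sketch of how a ReadableByteChannel overload and an InputStream overload could share one code path: the stream version wraps its argument with Channels.newChannel and delegates. Names are hypothetical and this does not claim to mirror the merged code. Note the read loop: a single ReadableByteChannel.read call may fill only part of the buffer, and the sketch assumes a blocking channel.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.Arrays;

/** Hypothetical channel-based .gzi reader with a delegating InputStream overload. */
class GziChannelReader {
    /** Returns offsets interleaved as [c0, u0, c1, u1, ...]; source may be null. */
    static long[] readOffsets(String source, ReadableByteChannel channel) throws IOException {
        String where = source == null ? "<unnamed channel>" : source;
        long count = readFully(channel, 8, where).getLong();
        if (count < 0 || count > Integer.MAX_VALUE / 16) {
            throw new IOException("Implausible GZI entry count " + count + " in " + where);
        }
        long[] offsets = new long[(int) count * 2];
        // The LongBuffer view inherits the little-endian order of the byte buffer.
        readFully(channel, (int) count * 16, where).asLongBuffer().get(offsets);
        return offsets;
    }

    /** Stream overload: wrap and delegate so the parsing lives in one place. */
    static long[] readOffsets(String source, InputStream in) throws IOException {
        return readOffsets(source, Channels.newChannel(in));
    }

    // Loop until the buffer is full; a non-blocking channel could return 0 forever.
    private static ByteBuffer readFully(ReadableByteChannel ch, int len, String where)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) {
                throw new EOFException("Truncated GZI index: " + where);
            }
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer b = ByteBuffer.allocate(40).order(ByteOrder.LITTLE_ENDIAN);
        b.putLong(2L).putLong(20000L).putLong(65280L).putLong(41000L).putLong(130560L);
        long[] offsets = readOffsets("ref.fa.gz.gzi", new ByteArrayInputStream(b.array()));
        System.out.println(Arrays.toString(offsets)); // prints "[20000, 65280, 41000, 130560]"
    }
}
```

Delegating from the stream to the channel (rather than the other way round) keeps the bulk ByteBuffer read, which suits callers such as Hadoop or NIO filesystems that already hand out channels.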

Member

@lbergelson lbergelson left a comment


Looks good, thank you @tomwhite

3 participants