-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load BlockCompressedIndexedFastaSequenceFile and GZIIndex from streams #1259
Conversation
…ted from streams. This enables fasta.gz files to be loaded from streams.
cc @lbergelson |
Codecov Report
@@ Coverage Diff @@
## master #1259 +/- ##
==============================================
- Coverage 69.356% 67.496% -1.86%
+ Complexity 8164 8153 -11
==============================================
Files 548 558 +10
Lines 32675 33374 +699
Branches 5520 5609 +89
==============================================
- Hits 22662 22526 -136
- Misses 7795 8658 +863
+ Partials 2218 2190 -28
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomwhite This looks good. It needs some additional javadoc and then is good to merge.
@@ -75,6 +78,15 @@ public BlockCompressedIndexedFastaSequenceFile(final Path path, final FastaSeque | |||
} | |||
} | |||
|
|||
public BlockCompressedIndexedFastaSequenceFile(final String source, final SeekableStream in, final FastaSequenceIndex index, final SAMSequenceDictionary dictionary, final GZIIndex gziIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc, mention that the stream shouldn't be decompressed already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, it looks like there might be a bug in the previous constructor where it calls canCreateBZlockCompressedIndexedFastaSequence()
. That checks that there's a gzi file in the appropriate location relative to Path even though it has the GZI index passed in... I opened a #1290 to track that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* | ||
* @throws IOException if an I/O error occurs. | ||
*/ | ||
public static final GZIIndex loadIndex(final String source, final InputStream indexIn) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention that source is used for error messages in the doc and can be null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is silly, but can you put the InputStream as the first parameter since it's the important one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the javadoc.
However, I left source as the first parameter for consistency with BlockCompressedIndexedFastaSequenceFile
, ReferenceSequenceFileFactory
, VariantContext
, tribble's IndexFactory
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point.
} | ||
} | ||
|
||
public static final GZIIndex loadIndex(final String source, final ReadableByteChannel channel) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Channel first and javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you @tomwhite
Description
This enables fasta.gz files to be loaded from streams (not just path objects), which is needed for disq-bio/disq#75.
See original bug report in broadinstitute/gatk#5547
Checklist