Support for partial VCF writing #904
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -53,4 +53,10 @@ public void writeHeader(final VCFHeader header) { | |
public boolean checkError() { | ||
return false; | ||
} | ||
|
||
@Override | ||
public void setVcfHeader(VCFHeader header) { | ||
this.underlyingWriter.setVcfHeader(header); | ||
Review comment: untested according to code cov |
Reply: Added the test case |
||
} | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -221,6 +221,11 @@ public void close() { | |
super.close(); | ||
} | ||
|
||
@Override | ||
public void setVcfHeader(VCFHeader header) { | ||
Review comment: the param should be marked final |
Reply: Done. |
||
//no-op | ||
Review comment: Why is this a "no-op"? This doesn't seem right. The [...] |
Reply: Implemented. |
||
} | ||
|
||
Review comment: We really need to ensure that this PR doesn't introduce any subtle changes that are not essential to the core change. There are still several places here where the stale header that was passed in is being used, which makes it different from the previous implementation (which updated the header and used the updated version after that). It's not clear to me whether the differences are significant, but that's all the more reason to make sure we don't change it unless it's necessary. |
||
// -------------------------------------------------------------------------------- | ||
// | ||
// implicit block | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -132,6 +132,11 @@ public synchronized void add(VariantContext vc) { | |
emitSafeRecords(); | ||
} | ||
|
||
@Override | ||
public void setVcfHeader(VCFHeader header) { | ||
innerWriter.setVcfHeader(header); | ||
Review comment: also untested |
Reply: but don't worry about this one since SortingVariantContextWriterBase is basically unused. |
||
} | ||
|
||
/** | ||
* Gets a string representation of this object. | ||
* @return a string representation of this object | ||
|
@@ -199,4 +204,4 @@ public VCFRecord(VariantContext vc) { | |
this.vc = vc; | ||
} | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -228,4 +228,10 @@ public void add(final VariantContext context) { | |
throw new RuntimeIOException("Unable to write the VCF object to " + getStreamName(), e); | ||
} | ||
} | ||
|
||
@Override | ||
public void setVcfHeader(VCFHeader header) { | ||
this.mHeader = doNotWriteGenotypes ? new VCFHeader(header.getMetaDataInSortedOrder()) : header; | ||
this.vcfEncoder = new VCFEncoder(this.mHeader, this.allowMissingFieldsInHeader, this.writeFullFormatField); | ||
Review comment: Since this duplicates code in the existing [...] |
Review comment: Also, I think we should guard against the problematic case where both setHeader and writeHeader are called, in any order, using different headers. The simplest way would be to just reject either call if a header has already been established. That would ensure consistency. @amilamanoj would that work for your use case? If so, we should include that contract in the javadoc here at the interface level, and include tests that verify that for each implementation. If not, we may need to resort to something more complicated. |
Reply: @droazen I've changed it so that [...] |
Reply: @amilamanoj I originally thought the right thing would be for both methods to reject if the header is already set, but changing writeHeader to work that way could break existing code. So perhaps we should just make setVcfHeader enforce that, and leave writeHeader's current promiscuous behavior as is. |
Reply: @amilamanoj Actually, on giving this more thought, any code that currently calls writeHeader more than once is already on a bad path, so I would vote for rejecting in either case if a header exists. |
Reply: @cmnbroad Thanks for the feedback. I've also discussed this with @cyenyxe. We also think changing the header should be prevented if the header (or just part of the body, in our use case) is already written to the output stream, since there's only one output stream per writer instance. But setting the header many times should not matter as long as nothing is written to the output stream yet. I've added some checks along with tests; please take a look and see if it works. |
||
} | ||
} |
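One way to read the contract discussed in the thread above: a header may be set any number of times until something has been written to the output, after which both `setHeader` and `writeHeader` are rejected. The sketch below is a minimal illustration under stated assumptions. `SketchVcfWriter` and `HeaderContractDemo` are hypothetical stand-ins, a `String` takes the place of `VCFHeader`, and the choice of `IllegalStateException` is an assumption; none of this is the htsjdk implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for a VCF writer; NOT the htsjdk implementation.
final class SketchVcfWriter {
    private final Writer out;
    private String header;          // stand-in for VCFHeader
    private boolean outputStarted;  // true once anything has been written to `out`

    SketchVcfWriter(final Writer out) {
        this.out = out;
    }

    /** Sets the header without writing it; allowed repeatedly until output starts. */
    void setHeader(final String header) {
        rejectIfOutputStarted();
        this.header = header;
    }

    /** Sets the header and writes it; rejected once output has started. */
    void writeHeader(final String header) {
        rejectIfOutputStarted();
        this.header = header;
        write(header + "\n");
    }

    /** Writes one body record; a header must have been established first. */
    void add(final String record) {
        if (header == null) {
            throw new IllegalStateException("No header set; call setHeader or writeHeader first");
        }
        write(record + "\n");
    }

    private void rejectIfOutputStarted() {
        if (outputStarted) {
            throw new IllegalStateException("Header cannot be changed after output has started");
        }
    }

    private void write(final String text) {
        try {
            out.write(text);
            outputStarted = true;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class HeaderContractDemo {
    public static void main(final String[] args) {
        final StringWriter sink = new StringWriter();
        final SketchVcfWriter writer = new SketchVcfWriter(sink);
        writer.setHeader("##fileformat=VCFv4.2");  // allowed: nothing written yet
        writer.add("1\t100\t.\tA\tT\t.\t.\t.");     // header-less body output starts here
        try {
            writer.setHeader("##other");            // rejected: output has started
        } catch (final IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Tracking "output started" rather than "header already set" keeps repeated `setHeader` calls harmless before the first write, which matches the header-less use case described in the last reply.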
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,4 +49,6 @@ public interface VariantContextWriter extends Closeable { | |
public boolean checkError(); | ||
|
||
public void add(VariantContext vc); | ||
|
||
void setVcfHeader(VCFHeader header); | ||
Review comment: Add javadoc for the new interface method. |
Reply: Done. |
||
} |
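As a rough illustration of the two requests made on this interface change (javadoc on the new method, and test coverage for wrappers that only forward the call), here is a minimal sketch. All type names (`HeaderAwareWriter`, `RecordingHeaderWriter`, `DelegatingHeaderWriter`) are hypothetical, a `String` stands in for `VCFHeader`, and the javadoc wording is an assumption about the contract rather than the text that was merged.

```java
// Hypothetical stand-in for the writer interface; not htsjdk code.
interface HeaderAwareWriter {
    /**
     * Sets the header to use for subsequent records without writing it to the output.
     *
     * @param header the header to use; must not be null
     * @throws IllegalArgumentException if {@code header} is null
     */
    void setVcfHeader(String header);
}

// Test double that records the last header it was given.
final class RecordingHeaderWriter implements HeaderAwareWriter {
    String lastHeader;

    @Override
    public void setVcfHeader(final String header) {
        if (header == null) {
            throw new IllegalArgumentException("header must not be null");
        }
        this.lastHeader = header;
    }
}

// Wrapper that forwards the call, mirroring the delegating writers in the diff.
final class DelegatingHeaderWriter implements HeaderAwareWriter {
    private final HeaderAwareWriter underlyingWriter;

    DelegatingHeaderWriter(final HeaderAwareWriter underlyingWriter) {
        this.underlyingWriter = underlyingWriter;
    }

    @Override
    public void setVcfHeader(final String header) {
        underlyingWriter.setVcfHeader(header);
    }
}

final class DelegationDemo {
    public static void main(final String[] args) {
        final RecordingHeaderWriter inner = new RecordingHeaderWriter();
        new DelegatingHeaderWriter(inner).setVcfHeader("##fileformat=VCFv4.2");
        System.out.println(inner.lastHeader);
    }
}
```

A recording test double like this is one simple way to cover a forwarding method that code coverage reports as untested.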
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,10 +26,13 @@ | |
package htsjdk.variant.variantcontext.writer; | ||
Review comment: You need tests to prove that the [...] |
Review comment: Should also test interaction between [...] |
Reply: Done. |
||
|
||
import htsjdk.samtools.SAMSequenceDictionary; | ||
import htsjdk.samtools.util.BlockCompressedInputStream; | ||
import htsjdk.samtools.util.TestUtil; | ||
import htsjdk.tribble.AbstractFeatureReader; | ||
import htsjdk.tribble.FeatureReader; | ||
import htsjdk.tribble.Tribble; | ||
import htsjdk.tribble.readers.AsciiLineReader; | ||
import htsjdk.tribble.readers.AsciiLineReaderIterator; | ||
import htsjdk.tribble.util.TabixUtils; | ||
import htsjdk.variant.VariantBaseTest; | ||
import htsjdk.variant.variantcontext.Allele; | ||
|
@@ -45,6 +48,7 @@ | |
import htsjdk.variant.vcf.VCFHeaderVersion; | ||
|
||
import java.io.File; | ||
import java.io.FileInputStream; | ||
import java.io.FileNotFoundException; | ||
import java.io.IOException; | ||
import java.util.ArrayList; | ||
|
@@ -133,6 +137,49 @@ public void testBasicWriteAndRead(final String extension) throws IOException { | |
|
||
} | ||
|
||
/** test, using the writer and reader, that we can output and input a VCF body without problems */ | ||
@Test(dataProvider = "vcfExtensionsDataProvider") | ||
public void testWriteAndReadVCFBody(final String extension) throws IOException { | ||
Review comment: testWriteAndReadVCFHeaderless |
||
final File fakeVCFFile = File.createTempFile("testWriteAndReadVCFBody.", extension); | ||
fakeVCFFile.deleteOnExit(); | ||
Review comment: Use method [...] |
Reply: Done. |
||
if (".vcf.gz".equals(extension)) { | ||
new File(fakeVCFFile.getAbsolutePath() + ".tbi").deleteOnExit(); | ||
} else { | ||
Tribble.indexFile(fakeVCFFile).deleteOnExit(); | ||
} | ||
Review comment: It would be simpler to just create a temp directory (with deleteOnExit), and then just create the file in the temp dir. Then a lot of this code would be unnecessary. |
Reply: My fault again - these do need deleteOnExit, or better yet unset INDEX_ON_THE_FLY except in one test specifically for the index case. |
||
metaData = new HashSet<VCFHeaderLine>(); | ||
additionalColumns = new HashSet<String>(); | ||
final SAMSequenceDictionary sequenceDict = createArtificialSequenceDictionary(); | ||
final VCFHeader header = createFakeHeader(metaData, additionalColumns, sequenceDict); | ||
final VariantContextWriter writer = new VariantContextWriterBuilder() | ||
.setOutputFile(fakeVCFFile) | ||
.setReferenceDictionary(sequenceDict) | ||
.setOptions(EnumSet.of(Options.ALLOW_MISSING_FIELDS_IN_HEADER, Options.INDEX_ON_THE_FLY)) | ||
.build(); | ||
writer.setVcfHeader(header); | ||
writer.add(createVC(header)); | ||
writer.add(createVC(header)); | ||
writer.close(); | ||
Review comment: Use a [...] |
Reply: Done. |
||
|
||
final VCFCodec codec = new VCFCodec(); | ||
codec.setVCFHeader(header, VCFHeaderVersion.VCF4_0); | ||
Review comment: Why 4.0? The writer above will write a 4.2 vcf? |
Reply: Used 4.2. |
||
AsciiLineReaderIterator iterator; | ||
if (".vcf.gz".equals(extension)) { | ||
iterator = new AsciiLineReaderIterator(new AsciiLineReader(new BlockCompressedInputStream(fakeVCFFile))); | ||
} else { | ||
iterator = new AsciiLineReaderIterator(new AsciiLineReader(new FileInputStream(fakeVCFFile))); | ||
Review comment: All input stream creations should be wrapped in a [...] |
Review comment: Also, is it possible to use [...] |
||
} | ||
int counter = 0; | ||
while(iterator.hasNext()) { | ||
VariantContext context = codec.decode(iterator.next()); | ||
if (context != null) { | ||
counter++; | ||
} | ||
} | ||
Assert.assertEquals(counter, 2); | ||
|
||
} | ||
|
||
/** | ||
* create a fake header of known quantity | ||
* @param metaData the header lines | ||
|
Review comment: The existing code isn't consistent about this, but we generally mark arguments as final in new code. Thanks.
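Two of the suggestions made on the test above (create the file inside a temp directory, and make sure streams are always closed) can be sketched with plain JDK calls. This is a minimal sketch under stated assumptions. The class `TempDirRoundTripSketch` and its method are hypothetical, plain text lines stand in for VCF records, and try-with-resources is used here as one way to guarantee the streams are closed; no htsjdk classes are involved.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the test-hygiene suggestions; not htsjdk code.
final class TempDirRoundTripSketch {

    /**
     * Writes the given lines to a file inside a fresh temp directory, reads the
     * file back, and returns the number of non-empty lines that do not start
     * with '#' (a stand-in for "records decoded from a header-less body").
     */
    static int writeAndCountRecords(final String... lines) {
        try {
            // Register the directory first: deleteOnExit runs in reverse
            // registration order, so the file is removed before the directory.
            final Path tempDir = Files.createTempDirectory("vcfSketch");
            tempDir.toFile().deleteOnExit();
            final Path file = tempDir.resolve("body.vcf");
            file.toFile().deleteOnExit();

            // try-with-resources closes the writer even if a write fails.
            try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (final String line : lines) {
                    writer.write(line);
                    writer.newLine();
                }
            }

            int records = 0;
            // The reader is closed the same way, including on exceptions.
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.isEmpty() && !line.startsWith("#")) {
                        records++;
                    }
                }
            }
            return records;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(writeAndCountRecords("1\t100\t.\tA\tT", "1\t200\t.\tG\tC"));
    }
}
```

Any index file written next to the output would also need to be registered with `deleteOnExit`, as the follow-up comment on the test points out; otherwise the directory is not empty at exit and is left behind.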