Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for partial VCF writing #904

Merged
merged 23 commits into from
Sep 12, 2017
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
d1fcf24
Added new method setVcfHeader
amilamanoj Jun 15, 2017
e90f916
Updated setVcfHeader() method to replicate the header initialization …
amilamanoj Jun 19, 2017
f25414e
Added unit test for VcfWriter modification
Jun 20, 2017
b84c91b
Updated test case to test compressed VCF
Jun 20, 2017
30ae367
Added javadoc for the new interface method.
amilamanoj Jun 28, 2017
11b795d
Test WriteAndReadVCFBody improvements:
amilamanoj Jun 28, 2017
635374b
Added AsyncVariantContextWriterUnitTest.
amilamanoj Jun 28, 2017
3e87db8
Implemented setVcfHeader method in BCF2Writer. Added a test case.
amilamanoj Jun 28, 2017
f459d97
Improved setVcfHeader method of BCF2Writer. Added a new test for writ…
amilamanoj Jun 29, 2017
ca56b10
Fixed test case.
amilamanoj Jun 29, 2017
7bc130f
Refactored tests based on the feedback
amilamanoj Jul 4, 2017
c23c897
Modified VCFWriter and BCF2Writer so that writeHeader() calls into se…
amilamanoj Jul 4, 2017
46d1a0f
Simplified temp file creation.
Jul 4, 2017
5f2bdf2
Added checks to prevent writing with different headers.
amilamanoj Jul 9, 2017
0e994d6
Added same checks to prevent writing with different headers for BCF2W…
amilamanoj Jul 11, 2017
723ee6e
Used single state variable, made method arguments final, improved tes…
amilamanoj Jul 11, 2017
00adf5d
Fixed for using correct header variable, Improved tests.
amilamanoj Jul 13, 2017
8629d2e
Changed to avoid making a copy of the header twice.
amilamanoj Jul 17, 2017
e6a16be
Used updated header, Moved setVCFHeader out of the try block.
amilamanoj Jul 29, 2017
c75f6ab
Refactored test cases.
amilamanoj Jul 29, 2017
cfc73f9
Removed unused variants.
amilamanoj Jul 29, 2017
5f03287
Minor test improvements.
amilamanoj Jul 29, 2017
8bb5bc1
Updated Javadoc. Minor refactoring.
amilamanoj Sep 6, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -53,4 +53,10 @@ public void writeHeader(final VCFHeader header) {
public boolean checkError() {
return false;
}

@Override
public void setVcfHeader(VCFHeader header) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code isn't consistent about this, but we generally mark arguments as final in new code. Thanks.

this.underlyingWriter.setVcfHeader(header);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untested according to code cov

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the test case

}

}
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,11 @@ public void close() {
super.close();
}

@Override
public void setVcfHeader(VCFHeader header) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the param should be marked final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

//no-op
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a "no-op"? This doesn't seem right. The add() method in BCF2Writer will blow up without a header, since it calls vc = vc.fullyDecode(header, false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented.

}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to ensure that this PR doesn't introduce any subtle changes that are not essential to the core change. There are still several places here where the stale header that was passed in is being used, which makes it different from the previous implementation (which updated the header and used the updated version after that). Its not clear to me if the differences are significant or not; but thats just all the more reason to make sure we don't change it unless its necessary.

// --------------------------------------------------------------------------------
//
// implicit block
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,11 @@ public synchronized void add(VariantContext vc) {
emitSafeRecords();
}

@Override
public void setVcfHeader(VCFHeader header) {
innerWriter.setVcfHeader(header);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also untested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but don't worry about this one since SortingVariantContextWriterBase is basically ununsed.

}

/**
* Gets a string representation of this object.
* @return a string representation of this object
Expand Down Expand Up @@ -199,4 +204,4 @@ public VCFRecord(VariantContext vc) {
this.vc = vc;
}
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -228,4 +228,10 @@ public void add(final VariantContext context) {
throw new RuntimeIOException("Unable to write the VCF object to " + getStreamName(), e);
}
}

@Override
public void setVcfHeader(VCFHeader header) {
this.mHeader = doNotWriteGenotypes ? new VCFHeader(header.getMetaDataInSortedOrder()) : header;
this.vcfEncoder = new VCFEncoder(this.mHeader, this.allowMissingFieldsInHeader, this.writeFullFormatField);
Copy link
Contributor

@droazen droazen Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this duplicates code in the existing writeHeader() method, would it be possible to have writeHeader() call into setVcfHeader(), or call into a shared helper method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we should guard against the problematic case where both setHeader and writeHeader are called, in any order, using different headers. The simplest way would be to just reject either call if a header has already been established. That would ensure consistency. @amilamanoj would that work for your use case ? If so, we should include that contract in the javadoc here at the interface level, and include tests that verify that for each implementation. If not we may need to resort to something more complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droazen I've changed it so that writeHeader() calls into setVcfHeader().
@cmnbroad Yes that should work for our case, since we don't need to change the header once it's set. When rejecting, do we still allow repeated calls of writeHeader() and setVcfHeader() (individually, without calling them both) or just simply check if header has been set once before?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amilamanoj I originally thought the right thing would be for both methods to reject if the header is already set, but changing writeHeader to work that way could break existing code. So perhaps we should just make setVcfHeader enforce that, and leave writeHeader's current promiscuous behavior as is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amilamanoj Actually, on giving this more thought, any code that currently calls writeHeader more than once is already on a bad path, so I would vote for rejecting in either case if a header exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad Thanks for the feedback. I've also discussed this with @cyenyxe. We also think changing header should be prevented if the header (or just a part of body, in our use case) is already written to the output stream, since there's only one output stream per writer instance. But setting the header many times should not matter as long as nothing is written to the output stream yet. I've added some checks along with tests, please take a look and see if it works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad @droazen @yfarjoun Added the same checks for BCF2Writer as well. I think with this commit I've addressed all the comments so far. Waiting for further feedback.

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -49,4 +49,6 @@ public interface VariantContextWriter extends Closeable {
public boolean checkError();

public void add(VariantContext vc);

void setVcfHeader(VCFHeader header);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc for the new interface method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,13 @@
package htsjdk.variant.variantcontext.writer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need tests to prove that the setVcfHeader() works for all concrete implementations of VariantContextWriter

Copy link
Contributor

@droazen droazen Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also test interaction between setVcfHeader() and writeHeader() (ie., test calling one and then the other for the various implementations).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


import htsjdk.samtools.SAMSequenceDictionary;
import htsjdk.samtools.util.BlockCompressedInputStream;
import htsjdk.samtools.util.TestUtil;
import htsjdk.tribble.AbstractFeatureReader;
import htsjdk.tribble.FeatureReader;
import htsjdk.tribble.Tribble;
import htsjdk.tribble.readers.AsciiLineReader;
import htsjdk.tribble.readers.AsciiLineReaderIterator;
import htsjdk.tribble.util.TabixUtils;
import htsjdk.variant.VariantBaseTest;
import htsjdk.variant.variantcontext.Allele;
Expand All @@ -45,6 +48,7 @@
import htsjdk.variant.vcf.VCFHeaderVersion;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
Expand Down Expand Up @@ -133,6 +137,49 @@ public void testBasicWriteAndRead(final String extension) throws IOException {

}

/** test, using the writer and reader, that we can output and input a VCF body without problems */
@Test(dataProvider = "vcfExtensionsDataProvider")
public void testWriteAndReadVCFBody(final String extension) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testWriteAndReadVCFHeaderless

final File fakeVCFFile = File.createTempFile("testWriteAndReadVCFBody.", extension);
fakeVCFFile.deleteOnExit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use method VariantBaseTest.createTempFile() here, which does this for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (".vcf.gz".equals(extension)) {
new File(fakeVCFFile.getAbsolutePath() + ".tbi").deleteOnExit();
} else {
Tribble.indexFile(fakeVCFFile).deleteOnExit();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be simpler to just create a temp directory (with deleteOnExit), and then just create the file in the temp dir. Then a lot of this code would be unnecessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault again - these do need deleteOnExit, or better yet unset INDEX_ON_THE_FLY except in one test specifically for the index case.

metaData = new HashSet<VCFHeaderLine>();
additionalColumns = new HashSet<String>();
final SAMSequenceDictionary sequenceDict = createArtificialSequenceDictionary();
final VCFHeader header = createFakeHeader(metaData, additionalColumns, sequenceDict);
final VariantContextWriter writer = new VariantContextWriterBuilder()
.setOutputFile(fakeVCFFile)
.setReferenceDictionary(sequenceDict)
.setOptions(EnumSet.of(Options.ALLOW_MISSING_FIELDS_IN_HEADER, Options.INDEX_ON_THE_FLY))
.build();
writer.setVcfHeader(header);
writer.add(createVC(header));
writer.add(createVC(header));
writer.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a try-with-resources block rather than an explicit close() call, to ensure that the writer gets closed even if there's an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


final VCFCodec codec = new VCFCodec();
codec.setVCFHeader(header, VCFHeaderVersion.VCF4_0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 4.0? The writer above will write a 4.2 vcf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used 4.2.

AsciiLineReaderIterator iterator;
if (".vcf.gz".equals(extension)) {
iterator = new AsciiLineReaderIterator(new AsciiLineReader(new BlockCompressedInputStream(fakeVCFFile)));
} else {
iterator = new AsciiLineReaderIterator(new AsciiLineReader(new FileInputStream(fakeVCFFile)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All input stream creations should be wrapped in a try-with-resources to ensure the streams get closed.

Copy link
Contributor

@droazen droazen Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it possible to use AbstractFeatureReader.getFeatureReader() here to get a reader on the vcf file? It will handle the block-gzipped case for you automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractFeatureReader.getFeatureReader() cannot be used here because it includes a check for header is the vcf file. Since we are reading a data-block, calling this method throws htsjdk.tribble.TribbleException$MalformedFeatureFile.

}
int counter = 0;
while(iterator.hasNext()) {
VariantContext context = codec.decode(iterator.next());
if (context != null) {
counter++;
}
}
Assert.assertEquals(counter, 2);

}

/**
* create a fake header of known quantity
* @param metaData the header lines
Expand Down