VCFHeaderReader BCF encoded file exception thrown for unrelated VCF header errors #132

Open
heuermh opened this issue Jun 15, 2017 · 2 comments

@heuermh
Contributor

heuermh commented Jun 15, 2017

VCFHeaderReader uses a try/catch to fall back to BCF decoding when reading the header as VCF fails. If the header is actually in VCF format but is invalid for an unrelated reason, this produces a misleading error message and stack trace.

For example, in the session below the first exception (Your input file has a malformed header: Count < 0 for fixed size VCF header field BAD_PS) should have been thrown rather than logged as a warning, and the second exception (Input stream does not contain a BCF encoded file; BCF magic header info not found, at record 0 with position 0) should never have occurred.

scala> val variants = sc.loadVariants("truth_small_variants.variants.adam")
warning: while trying to read VCF header from file received exception: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: Count < 0 for fixed size VCF header field BAD_PS
htsjdk.tribble.TribbleException: Input stream does not contain a BCF encoded file; BCF magic header info not found, at record 0 with position 0:
  at htsjdk.variant.bcf2.BCF2Codec.error(BCF2Codec.java:478)
  at htsjdk.variant.bcf2.BCF2Codec.readHeader(BCF2Codec.java:149)
  at org.seqdoop.hadoop_bam.util.VCFHeaderReader.readHeaderFrom(VCFHeaderReader.java:67)
  at org.bdgenomics.adam.rdd.ADAMContext.org$bdgenomics$adam$rdd$ADAMContext$$readVcfHeader(ADAMContext.scala:228)
  at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadHeaderLines$1.apply(ADAMContext.scala:234)
  at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadHeaderLines$1.apply(ADAMContext.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.bdgenomics.adam.rdd.ADAMContext.loadHeaderLines(ADAMContext.scala:234)
  at org.bdgenomics.adam.rdd.ADAMContext.loadParquetVariants(ADAMContext.scala:1175)
  at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVariants$1.apply(ADAMContext.scala:1733)
  at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVariants$1.apply(ADAMContext.scala:1728)
  at scala.Option.fold(Option.scala:158)
  at org.apache.spark.rdd.Timer.time(Timer.scala:48)
  at org.bdgenomics.adam.rdd.ADAMContext.loadVariants(ADAMContext.scala:1726)
  ... 50 elided
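
One possible way to avoid the fallback is to look at the first bytes of the stream and choose the decoder up front, so that a VCF parsing error is never reinterpreted as "not a BCF file". The following is a minimal sketch of that idea using only the JDK. `HeaderFormatSniffer`, `Format`, and `sniff` are hypothetical names, not part of Hadoop-BAM or htsjdk, and the sketch assumes the stream has already been decompressed (no gzip or BGZF wrapper) and supports mark/reset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Hadoop-BAM or htsjdk. */
final class HeaderFormatSniffer {

    enum Format { VCF, BCF, UNKNOWN }

    // BCF2 streams begin with the ASCII bytes "BCF", followed by version bytes.
    private static final byte[] BCF_MAGIC = "BCF".getBytes(StandardCharsets.US_ASCII);

    // VCF text headers begin with a "##fileformat=" meta-information line.
    private static final byte[] VCF_MAGIC = "##fileformat=".getBytes(StandardCharsets.US_ASCII);

    private HeaderFormatSniffer() { }

    /**
     * Peeks at the start of the stream and reports which format it looks like.
     * The stream is reset afterwards, so the chosen decoder sees it from byte 0.
     * Assumption: the caller passes an uncompressed stream that supports mark/reset.
     */
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] prefix = new byte[VCF_MAGIC.length];
        int n = 0;
        in.mark(prefix.length);
        try {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < prefix.length) {
                int r = in.read(prefix, n, prefix.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        } finally {
            in.reset();
        }
        if (startsWith(prefix, n, BCF_MAGIC)) {
            return Format.BCF;
        }
        if (startsWith(prefix, n, VCF_MAGIC)) {
            return Format.VCF;
        }
        return Format.UNKNOWN;
    }

    // True if the first `length` valid bytes of `data` begin with `magic`.
    private static boolean startsWith(byte[] data, int length, byte[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a check like this, a caller could hand a stream identified as VCF only to the VCF decoder and let its exception propagate unchanged. A header that is VCF but malformed, such as the BAD_PS case above, would then surface its own error instead of the BCF magic header message.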
@heuermh
Contributor Author

heuermh commented Aug 30, 2017

@fnothaft Ping for feedback on this issue.

@cmnbroad
Collaborator

Once samtools/htsjdk#837 is merged into htsjdk, we should be able to use it to fix this issue and eliminate the try/catch fallback.
