-
Notifications
You must be signed in to change notification settings - Fork 14
Attempt to make the GATK tasks a) auto-detect GATK version and b) swi… #365
Conversation
@clintval You might be interested in this given your recent set of tweets. This isn't as elegant as it could be, but is an attempt to handle GATK 3.x vs. 4.x with a single set of task classes. |
I support this PR ❤️ |
I spent some time thinking about how I would solve this problem in a typed workflow DSL like Dagr. I originally found inspiration in a little library that helps with data class migrations. I dabbled with this problem in Snakemake but I never arrived on a working solution. I'm certain there is an elegant solution out there that would empower the user! I'm thinking a first-class feature upon Backward-in-time SupportDagr maintains a compatibility interface with older versions of the executable (similar to the aim of this PR). Forward-in-time SupportClient written compatibility that provides a way to migrate an existing My free 2¢ idea Thanks for listening! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I really like this pattern. I think there's going to be a lot of if
/else if
/...` logic if we follow this pattern more generally for tasks. Let's discuss this offline.
val jar = gatkJar | ||
if (jar == GatkTask.GatkJarPath) GatkTask.GatkMajorVersion | ||
else GatkTask.majorVersion(jar) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about this being something to bake into JarTask
. We could also have a trait too, since not everything is a JAR.
protected def addWalkerArgs(buffer: ListBuffer[Any]): Unit | ||
|
||
/** Helper function to select an argument name for pre vs. post V4 naming. */ | ||
protected def either(preV4Name: String, postV4Name: String): String = if (gatkMajorVersion < 4) preV4Name else postV4Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems brittle if ever the GATK API changes. And also brittle if this becomes a pattern we adopt for other tools who change their command line API more frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed ... I think I might just go back to having two separate argument generation sections for 3 vs 4 for now.
@@ -50,9 +50,13 @@ class GenotypeGvcfs private (ref: PathToFasta, | |||
val dbSnpVcf: Option[PathToVcf] = None) | |||
extends GatkTask("GenotypeGVCFs", ref, intervals=intervals) { | |||
|
|||
if (gatkMajorVersion >= 4 && gvcfs.length > 1) throw new IllegalStateException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require
?
f567c0a
to
1bd30ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor quibbles, fix/resolve and approved.
case n if n >= 4 => | ||
// Args for version 4 | ||
buffer.append("-I", bam.toAbsolutePath) | ||
buffer.append("-O", vcf.toAbsolutePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In GenotypeGvcfs
you have a section before this block that adds common arguments:
// Args that are common to all versions
I'd recommend moving common args similarly, or not have common args in GenotypeGvcfs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost did it, but in fact there's only 1 common arg, and it felt not worth it. Note that output is -o
vs.-O
. But I can move the -I
argument out.
case n if n < 4 => | ||
buffer.append("--out", vcf.toAbsolutePath.toString) | ||
case n if n >= 4 => | ||
buffer.append("--output", vcf.toAbsolutePath.toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need .toString
here and elsewhere since we call it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea why there were .toString
s there in the first place. Removed.
@@ -51,6 +51,8 @@ class Mutect1(val tumorBam: PathToBam, | |||
) extends GatkTask(walker="MuTect", ref=ref, intervals=Some(intervals)) { | |||
|
|||
override protected def addWalkerArgs(buffer: ListBuffer[Any]): Unit = { | |||
require(gatkMajorVersion < 4, s"Mutect v1 is not supported in GATK v${gatkMajorVersion}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This require
is in the addWalkerArgs
method, but in IndelRealigner
and RealignerTargetCreator
it's in the constructor. The former will check when we run the task, while the latter when we build the task. Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I'll move this one up because failing earlier seems better.
Codecov Report
@@ Coverage Diff @@
## master #365 +/- ##
=======================================
Coverage 91.95% 91.95%
=======================================
Files 31 31
Lines 1156 1156
Branches 65 65
=======================================
Hits 1063 1063
Misses 93 93 Continue to review full report at Codecov.
|
…tch command lines accordingly.
@nh13 This isn't quite done yet - firstly I haven't tested any of this and secondly I've only gone through HC and GenotypeGVCF tasks so far to adapt for 4+. That said, I'd like feedback on the approach I've taken before I go much further.