Update GermlineCNVCaller to gatk 4.1.8.1 #58

TedBrookings · 2020-09-23T18:20:45Z

Updated gcnv to latest version for cohort and case mode
Address small risk of NaN errors by try-catch in task (bash) and
reproducibly scrambling sample order to get different random samples
Decrease wall-clock time of MakeBincovMatrix and call only once
Fix crashes and improve output of RawVcfQc
Update VM scaling to prevent crashes with larger cohorts

mwalker174

Thanks @TedBrookings this is looking really good. I have some questions/comments below.

wdl/MakeBincovMatrix.wdl

mwalker174 · 2020-09-24T13:44:47Z

wdl/Module00c.wdl

@@ -140,11 +142,13 @@ workflow Module00c {

    # Runtime parameters
    String sv_base_mini_docker
+    String sv_base_docker


This input is breaking the single-sample wdl. Can you run scripts/test/validate.sh? It'll tell you which dependent WDLs you need to update. I think you should also run the single-sample pipeline json test/single-sample/GATKSVPipelineSingleSampleTest.test_na19240.json to make sure everything works okay.

Just remembered to remind you to update the gCNV model for the single-sample run

I ran the single-sample pipeline. There were some problems with the bincov matrix which I corrected. It ran through until Module05_06 where it crashed in CleanVcf1a due to my not having rebuilt the sv-pipeline docker after Chris and I made incompatible changes to it (the WDL is looking for the output "includelist.txt" but my old docker is providing whitelist.txt). I'm rebuilding it now because it sounds like there's immediate interest in running this.

mwalker174 · 2020-09-24T13:45:33Z

wdl/BAFTestChromosome.wdl

@@ -90,10 +91,14 @@ task BAFTest {
    }
  }

+  Int disk_gb = disk_gb_baseline + ceil(


How many / what size batches did you use for the scaling regressions?

I didn't do any scaling regressions. I had (just a few) shards crash from out-of-disk errors, so I just added up the size of the input files. I think a few shards must wind up taking down a significant fraction of the baf_metrics file, because when I forgot to double it (for the 2x tabix) the shards still crashed. Incidentally, this is why I was removing those second tabix files in so many tasks. You didn't ask me to undo that for this task: should I assume it needs to be removed here too?

Sorry, I spoke too soon. PETest and SRTest were having rare out-of-disk crashes, and I did have crashes in BAFTest as well. So I just implemented the same disk scaling in every test (PE, SR, RD, BAF). I just downloaded my monitoring logs and at least in my panel, BAFTest never used much of its disk. Given that it's 27 GiB on my 500 sample panel, I think it's reasonable leaving this in for safety sake. That said, it could probably could be removed for BAFTest (and RDTest as well).
It turns out BAFTest was crashing out of memory on one shard. Since I didn't have much to go on for scaling the memory usage of BAFTest, I just passed a runtime_attr override to my input json.

We can revisit this once #51 goes in

Also yes delete the rm statement below

mwalker174 · 2020-09-24T13:46:13Z

wdl/CleanVcf.wdl

@@ -98,7 +98,7 @@ workflow CleanVcf {
    input:
      whole_file=CleanVcf1a.include_list[0],
      lines_per_shard=samples_per_step2_shard,
-      shard_prefix="whiteblack.",
+      shard_prefix="includeexclude.",


Thanks for catching these

mwalker174 · 2020-09-24T13:58:04Z

wdl/GermlineCNVCase.wdl

+            | awk '{print "--input "$0}' \
+            > read_count_files.args
+
+        function get_seeded_random() {


Can you open a quick ticket to remind us this needs to be replaced with the changes in Andrey's branch?

I added Issue #59 . I tried to assign Andrey but for whatever reason Github wouldn't select him. Maybe he's not set up to edit our repo? For now I assigned myself.

Just added Andrey to the dev list

wdl/PETestChromosome.wdl

wdl/SRTestChromosome.wdl

TedBrookings · 2020-09-24T14:54:18Z

I'm about to get started on the single-sample branch. I can't believe I forgot to update that! Thanks also for pointing out the validation script which I was unaware of.

Quick question: should I remove my added line deleting of the second tabix file on RDTestChromosome.wdl::RDTest and BAFTestChromosome.wdl::BAFTest?

mwalker174 · 2020-09-25T14:49:57Z

Quick question: should I remove my added line deleting of the second tabix file on RDTestChromosome.wdl::RDTest and BAFTestChromosome.wdl::BAFTest?

Yes, I missed those

mwalker174

Everything is looking good aside from the whitelist naming issue. I think it's safe to merge when the batch and single-sample tests complete.

* Updated gcnv to latest version for cohort and case mode * Addressed small risk of NaN errors by try-catch in task (bash) and reproducibly scrambling sample order to get different random samples * Decreased wall-clock time of MakeBincovMatrix and call only once * Fixed crashes and improved output of RawVcfQc * Updated VM scaling to prevent crashes with larger cohorts * Updated dockers and input jsons to work with new gcnv inputs and script changes * Added allowNestedInputs=true to single sample pipeline

TedBrookings requested a review from mwalker174 September 23, 2020 18:20

TedBrookings linked an issue Sep 23, 2020 that may be closed by this pull request

Upgrade to GATK v1.4.8.0 #8

Closed

mwalker174 reviewed Sep 24, 2020

View reviewed changes

TedBrookings force-pushed the tb_update_gcnv_version branch 2 times, most recently from 7c1ce0e to 77bfceb Compare October 5, 2020 13:29

mwalker174 approved these changes Oct 6, 2020

View reviewed changes

TedBrookings force-pushed the tb_update_gcnv_version branch from 9a769a6 to 9e723bd Compare October 8, 2020 13:07

TedBrookings force-pushed the tb_update_gcnv_version branch from 9e723bd to 2f522d7 Compare October 8, 2020 13:13

TedBrookings merged commit 5eb567e into master Oct 13, 2020

TedBrookings deleted the tb_update_gcnv_version branch October 13, 2020 15:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update GermlineCNVCaller to gatk 4.1.8.1 #58

Update GermlineCNVCaller to gatk 4.1.8.1 #58

TedBrookings commented Sep 23, 2020

mwalker174 left a comment

mwalker174 Sep 24, 2020

mwalker174 Sep 24, 2020

TedBrookings Oct 5, 2020

mwalker174 Sep 24, 2020

TedBrookings Sep 24, 2020

TedBrookings Sep 24, 2020

mwalker174 Oct 6, 2020

mwalker174 Oct 6, 2020

mwalker174 Sep 24, 2020

mwalker174 Sep 24, 2020

TedBrookings Sep 24, 2020

mwalker174 Oct 6, 2020

TedBrookings commented Sep 24, 2020

mwalker174 commented Sep 25, 2020

mwalker174 left a comment

Update GermlineCNVCaller to gatk 4.1.8.1 #58

Update GermlineCNVCaller to gatk 4.1.8.1 #58

Conversation

TedBrookings commented Sep 23, 2020

mwalker174 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TedBrookings commented Sep 24, 2020

mwalker174 commented Sep 25, 2020

mwalker174 left a comment

Choose a reason for hiding this comment