The current test data contain sensitive information that cannot be released publicly. We should migrate to a 2-batch test dataset built from the 1000 Genomes Project (1KGP) high-coverage data that we are using for the cohort mode Terra workspace (not to be confused with the single-sample mode 1KGP reference panel, which is a single batch composed of a different set of 1KGP samples).
Ideally, all our testing should be self-contained, meaning that prerequisite cohort-dependent inputs for all modules (e.g. vcfs, metrics files, etc.) can be generated from the tests of earlier modules. Therefore, we will need separate tests for batch1 and batch2 starting at GenerateSampleMetricsBatch through FilterBatch and GenotypeBatch. Other downstream modules are run on the whole cohort (batch1 and batch2 together).
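As a rough illustration of the layout described above, the sketch below enumerates one test per (module, scope) pair: per-batch modules are expanded over batch1 and batch2, and cohort-level modules run once on the whole cohort. The module lists are deliberately incomplete (only names that appear in this issue are used, and treating MergeCohortVcfs as the cohort-level example is an assumption), and `build_test_plan` is a hypothetical helper, not something in the repository.

```python
from typing import List, Tuple

# Only module names mentioned in this issue; the modules that run between
# them in the real pipeline are intentionally left out of this sketch.
PER_BATCH_MODULES = ["GenerateSampleMetricsBatch", "FilterBatch", "GenotypeBatch"]
# Assumption: MergeCohortVcfs stands in for the cohort-level modules.
COHORT_MODULES = ["MergeCohortVcfs"]
BATCHES = ["batch1", "batch2"]
COHORT = "1kgp_test"


def build_test_plan(
    per_batch_modules: List[str],
    cohort_modules: List[str],
    batches: List[str],
    cohort: str,
) -> List[Tuple[str, str]]:
    """Return (module, scope) pairs in the order the tests would run."""
    plan = []
    # Per-batch modules get one test per batch, so that each batch's outputs
    # can feed the next per-batch module.
    for module in per_batch_modules:
        for batch in batches:
            plan.append((module, batch))
    # Cohort-level modules run once on batch1 and batch2 together.
    for module in cohort_modules:
        plan.append((module, cohort))
    return plan


plan = build_test_plan(PER_BATCH_MODULES, COHORT_MODULES, BATCHES, COHORT)
for module, scope in plan:
    print(f"{module}\t{scope}")
```

Keeping the plan as plain data makes it easy to check that every per-batch module has a test for both batches before any cohort-level test depends on its outputs.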
We will replace the small/large test set designations. In the future, we can consider running on a subset of chromosomes to speed up testing. The one exception would be GenerateSampleMetrics: currently we test the batch version of this (GenerateSampleMetricsBatch). We should add another template for GenerateSampleMetrics itself that runs on one sample, since this workflow is quite expensive.
A few technical notes:
New input values need to be defined for batch1 and batch2 in /input_values. For cohort-level steps (mentioned above), let's define a third inputs file for a 1kgp_test cohort (i.e. 1kgp_test.json).
Input data and configurations can be found in inputs/terra_workspaces/cohort_mode (after running scripts/inputs/build_default_inputs.sh). This includes CRAM and gVCF paths, batch membership assignments, and cohort-specific resource files (e.g. ped file).
Copy and organize workflow inputs/outputs in gs://gatk-sv-resources-public/test, including metrics generated by enabling run_module_metrics.
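To make the three-file arrangement from the notes above concrete, here is a minimal sketch that writes placeholder input-value files for batch1, batch2, and the 1kgp_test cohort. Only the name 1kgp_test.json comes from this issue; the per-batch file names, the `name` and `batches` keys, and the `write_input_value_stubs` helper are hypothetical, and the real files would carry the CRAM and gVCF paths, batch membership, and resource files instead.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_input_value_stubs(out_dir: str, batches: List[str], cohort: str) -> List[Path]:
    """Write one placeholder JSON per batch plus one for the cohort."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for batch in batches:
        path = out / f"{batch}.json"
        # Hypothetical key: real files would hold batch-specific inputs.
        path.write_text(json.dumps({"name": batch}, indent=2) + "\n")
        written.append(path)
    cohort_path = out / f"{cohort}.json"
    # Hypothetical keys: the cohort file records which batches it combines.
    cohort_path.write_text(
        json.dumps({"name": cohort, "batches": list(batches)}, indent=2) + "\n"
    )
    written.append(cohort_path)
    return written


# Usage: write into a temporary directory rather than the repository.
with tempfile.TemporaryDirectory() as tmp:
    stub_names = [p.name for p in write_input_value_stubs(tmp, ["batch1", "batch2"], "1kgp_test")]
    cohort_stub = json.loads((Path(tmp) / "1kgp_test.json").read_text())
```

Separating per-batch files from the cohort file mirrors the split between per-batch tests and the downstream modules that run on both batches together.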
As part of the migration to 1KGP testing, we should test with multiple batches for increased robustness and, specifically, to test MergeCohortVcfs.wdl. This will also be beneficial for testing a future batch-combine workflow needed for Terra.