Handling large-scale individual WGS VCFs #107

snehaleela · 2024-07-09T07:09:36Z

Hi all, could you please help me understand the following? I have a large set of individual WGS VCF files and want to run the NF-GWAS pipeline on the full dataset. Nextflow and the infrastructure are ready to handle data at this scale (~50 nodes, 64 CPUs, 256 GB RAM).

Does the pipeline assume that the individual VCFs have already been merged, with one file per chromosome?
Also, which preprocessing steps are recommended before passing the input to the pipeline?
At this scale, do we need to use .bgen files only? Has regenie been tested on VCF input at this scale, and does it perform well with it?
If a merged VCF is needed, can you confirm whether this is the best method:
each VCF > normalize (bcftools) > pvar/pgen/psam for each VCF > merge into one pvar/pgen/psam (PLINK) > convert to bgen
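
To make the proposed steps concrete, here is a rough sketch of the commands I have in mind, written as a small Python helper that only builds the command lines and does not run anything. The function name `build_commands`, the file names, and the `work` directory are hypothetical. The bcftools and plink2 flags are the ones I believe apply (`bcftools norm -m -any -f`, `plink2 --vcf ... --make-pgen`, `plink2 --pmerge-list`, `plink2 --export bgen-1.2 bits=8`), but they are assumptions on my side and should be checked against the installed versions.

```python
from pathlib import Path


def build_commands(vcf_paths, ref_fasta, out_dir):
    """Build (but do not run) the commands for the workflow described above.

    Returns (commands, prefixes): a list of argv lists, plus the per-sample
    pgen prefixes that would be written, one per line, into the merge list.
    """
    out = Path(out_dir)
    commands = []
    prefixes = []
    for vcf in vcf_paths:
        name = Path(vcf).name
        # Strip the ".vcf.gz" suffix to get a per-file label (hypothetical naming).
        sample = name[: -len(".vcf.gz")] if name.endswith(".vcf.gz") else Path(name).stem
        norm_vcf = out / f"{sample}.norm.vcf.gz"
        prefix = out / sample
        # Step 1: split multiallelic records and left-align indels.
        commands.append([
            "bcftools", "norm", "-m", "-any", "-f", str(ref_fasta),
            "-Oz", "-o", str(norm_vcf), str(vcf),
        ])
        # Step 2: convert the normalized VCF to pgen/pvar/psam.
        commands.append([
            "plink2", "--vcf", str(norm_vcf), "--make-pgen", "--out", str(prefix),
        ])
        prefixes.append(str(prefix))

    merged = out / "merged"
    merge_list = out / "merge_list.txt"  # assumed to contain one prefix per line
    # Step 3: merge all per-file pgen filesets into one.
    commands.append(["plink2", "--pmerge-list", str(merge_list), "--out", str(merged)])
    # Step 4: export the merged fileset to BGEN v1.2 with 8-bit probabilities.
    commands.append([
        "plink2", "--pfile", str(merged), "--export", "bgen-1.2", "bits=8",
        "--out", str(merged),
    ])
    return commands, prefixes


if __name__ == "__main__":
    cmds, _ = build_commands(["s1.vcf.gz", "s2.vcf.gz"], "ref.fa", "work")
    for cmd in cmds:
        print(" ".join(cmd))
```

Each returned command could be passed to `subprocess.run(cmd, check=True)` once the flags are verified, or turned into a Nextflow process per step.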
