WGS data #104

Open
jjfarrell opened this issue Apr 7, 2024 · 4 comments

Comments

@jjfarrell

The pipeline looks like it is optimized for processing imputed VCF data from the UMICH or TOPMed imputation servers, which generate a DS field. Is it possible to run the pipeline on GATK WGS sequencing data without the DS field? Or does the dosage need to be calculated from the PL field and written out in PLINK format before running the pipeline?

@aaleksandrov95

aaleksandrov95 commented Apr 29, 2024

Are there any updates on this issue? I believe I am having a similar problem in our implementation of the pipeline for GATK WGS.

@seppinho
Member

Hi,
I just double-checked the regenie repo, and regenie uses either DS or GT. So if you want to use our pipeline, you have to convert the data first (e.g. with plink2). If you have a working command, I'm happy to integrate it as a step into the pipeline. I think that's useful for many!

See here: rgcgithub/regenie#114 (comment)
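Since the answer depends on which FORMAT fields the VCF actually carries, a quick header check can help before deciding how to convert. This is only a minimal sketch: the function names are made up for illustration, and it assumes the FORMAT fields are declared in the VCF header, which GATK output normally does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the pipeline or of regenie).

# Print the IDs of all FORMAT fields declared in a VCF header.
list_format_fields() {
    local vcf="$1"
    # Decompress .gz/.bgz input, pass plain-text VCFs through unchanged.
    case "$vcf" in
        *.gz|*.bgz) gzip -cd "$vcf" ;;
        *)          cat "$vcf" ;;
    esac | awk '
        /^##FORMAT=<ID=/ {
            sub(/^##FORMAT=<ID=/, "")   # drop the prefix
            sub(/[,>].*/, "")           # drop everything after the ID
            print
        }
        /^#CHROM/ { exit }              # header is over, stop reading
    '
}

# Succeed only if the header declares a DS (dosage) field.
has_dosage_field() {
    list_format_fields "$1" | grep -qx 'DS'
}
```

For example, `has_dosage_field input.vcf.gz && echo "DS present" || echo "no DS, only: $(list_format_fields input.vcf.gz | tr '\n' ' ')"` (with `input.vcf.gz` standing in for your own file) shows whether the file has dosages or only fields such as GT and PL.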

@aaleksandrov95

aaleksandrov95 commented Jun 6, 2024

I got it to work by converting the VCF to BED using plink2. I also saw in several other issues, such as rgcgithub/regenie#209, that an Oxford .sample file may help with the missing values error, which kept occurring for me, so I generated one as well, again using plink2.

The only tricky part was to keep the IID and FID consistent with the internal workings of the pipeline, but now it seems to run fine.
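One way to sanity-check the FID/IID consistency is to compare the first two columns of the generated .fam file with those of the phenotype file. The sketch below is only an illustration: the function name is hypothetical, and it assumes a whitespace-delimited phenotype file with a header line whose first two columns are FID and IID. Note that with `--double-id`, plink2 sets both FID and IID to the VCF sample ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline).
# Print FID/IID pairs that occur in the phenotype file but not in the .fam file.
missing_from_fam() {
    local fam="$1" pheno="$2"
    awk '
        # First file (.fam): remember every FID/IID pair.
        NR == FNR { seen[$1 FS $2] = 1; next }
        # Second file (phenotypes): skip the header, report unknown pairs.
        FNR > 1 && !(($1 FS $2) in seen) { print $1, $2 }
    ' "$fam" "$pheno"
}
```

Calling `missing_from_fam out.fam phenotypes.txt` (both file names are placeholders) prints nothing when every phenotype row has a matching sample in the genotype data.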

EDIT: Here are the PLINK2 commands for reference.

VCF-to-BED:

plink2 --vcf ${input_vcf_file} \
        --fam ${path}/samples-sex.nf_gwas.psam \
        --double-id \
        --split-par 'hg38' \
        --output-chr chrM \
        --set-all-var-ids @:#:ref\$r-alt\$a --new-id-max-allele-len 527 \
        --make-bed \
        --out ${output_path}

Making the Oxford .sample file:

plink2 --vcf ${input_vcf_file} \
        --fam ${path}/samples-sex.nf_gwas.psam \
        --split-par 'hg38' \
        --output-chr chrM \
        --set-all-var-ids @:#:ref\$r-alt\$a --new-id-max-allele-len 527 \
        --recode oxford \
        --out ${output_path}

As mentioned, I added the Oxford .sample file because of several missing-value and invalid-sample-name errors, as described in the issue linked above.

@seppinho
Member

seppinho commented Jun 6, 2024

Great to hear. Can you also share the commands, in case someone else is running into the same issue?
Best.
Sebastian


3 participants