using phased-vcf file as input for MSMC2 #52

Open
Niloofar-Alaei opened this issue Jun 15, 2023 · 2 comments

Comments

@Niloofar-Alaei

Hi,
I want to run MSMC2 on my dataset, which consists of phased VCF files (multi-sample VCFs with 26 samples), one per chromosome (e.g. Chr10.vcf.gz).

To use my VCF files as input for MSMC2, I proceeded as follows:

First, I used bcftools to produce a separate VCF file for each sample (e.g. sample1.Chr10.vcf.gz).
Second, I used vcfAllSiteParser.py to produce the .bed mask files.
Third, I ran generate_multihetsep.py to merge the VCF and mask files. I did not run the phasing step, because I assumed my dataset was already phased.

However, I received an error in the last step, when I ran msmc2 to estimate the effective population size. I also noticed that the resulting multihetsep files (e.g. Chr10.multihetsep.txt) are very large.

My question is: should I run the phasing step as well?
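One way to check whether a VCF is actually phased, before rerunning any phasing, is to count the genotype separators per sample. Below is a minimal Python sketch, assuming a standard VCF where GT is the first FORMAT field (as the VCF specification requires). `open_vcf` and `phasing_summary` are hypothetical helpers written for illustration; they are not part of the MSMC tools.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a .vcf or .vcf.gz file for text reading (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def phasing_summary(lines):
    """Count phased ('|') vs unphased ('/') GT calls per sample.

    Homozygous calls are counted too; only unphased heterozygous calls
    actually create ambiguity, but any '/' means the file is not marked phased.
    """
    samples, counts = [], {}
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            counts = {s: Counter() for s in samples}
            continue
        for sample, data in zip(samples, fields[9:]):
            gt = data.split(":")[0]  # GT is the first FORMAT field
            if "|" in gt:
                counts[sample]["phased"] += 1
            elif "/" in gt:
                counts[sample]["unphased"] += 1
            else:
                counts[sample]["other"] += 1  # haploid or missing ('.') calls
    return counts
```

For example, pass the handle from `open_vcf("Chr10.vcf.gz")` to `phasing_summary` and look for samples with a non-zero `unphased` count; those genotypes are not recognised as phased.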

I would really appreciate your help in identifying the problem.

Best regards,
Niloo

@stschiff
Owner

As discussed via email, I think the issue is that your VCF is not recognised as phased. Phased genotypes require the notation 0|1 or 1|0. If you have 0/1 instead, the genotypes are treated as unphased, which leads to combinatorially many possible allele combinations and blows up the size of the resulting file.
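To make "combinatorially many" concrete, here is a minimal Python sketch. It assumes, as an illustration rather than a description of generate_multihetsep.py's actual code, that each unphased heterozygous genotype contributes both allele orderings while phased and homozygous genotypes contribute one, and that alleles are single-digit (biallelic) indices. The function names are hypothetical.

```python
from itertools import product


def allele_orderings(gt):
    """Possible haplotype orderings for one diploid GT string.

    '0|1' (phased) -> ['01']; '0/1' (unphased het) -> ['01', '10'];
    homozygous calls such as '1/1' -> ['11'].
    """
    if "|" in gt:
        a, b = gt.split("|")
        return [a + b]
    a, b = gt.split("/")
    return [a + b] if a == b else [a + b, b + a]


def site_combinations(genotypes):
    """All haplotype strings consistent with the genotypes at one site."""
    return ["".join(c) for c in product(*(allele_orderings(g) for g in genotypes))]


def n_combinations(genotypes):
    """Count without enumerating: 2 ** (number of unphased heterozygotes)."""
    return 2 ** sum(len(allele_orderings(g)) == 2 for g in genotypes)


print(site_combinations(["0|1", "1|0"]))  # ['0110']
print(site_combinations(["0/1", "0/1"]))  # ['0101', '0110', '1001', '1010']
print(n_combinations(["0/1"] * 26))       # 67108864
```

Under this assumption, a single site where all 26 samples are unphased heterozygotes already has 2^26 (about 67 million) orderings, whereas the same site written as 0|1 has exactly one.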

@Niloofar-Alaei
Author

Niloofar-Alaei commented Jun 22, 2023 via email
