using phased-vcf file as input for MSMC2 #52

Niloofar-Alaei · 2023-06-15T14:47:24Z

Hi
I want to run MSMC2 on my dataset, which consists of phased VCF files: one multi-sample VCF file with 26 samples for each chromosome (e.g. Chr10.vcf.gz).

To use my VCF files as input for MSMC2, I did the following:

First, I used bcftools to produce a separate VCF file for each sample (e.g. sample1.Chr10.vcf.gz).
Second, I used vcfAllSiteParser.py to produce the .bed files.
Then I ran generate_multihetsep.py to merge the VCF and mask files together. *I didn't do the phasing step, because I assumed my dataset was already phased.
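The per-sample split in the first step can be sketched in Python as below. This is a minimal sketch, assuming the hypothetical file names from the question (Chr10.vcf.gz, sample1.Chr10.vcf.gz) and hypothetical sample names; it only builds `bcftools view` command lines (`-s` selects one sample, `-Oz` writes bgzip-compressed VCF, `-o` sets the output path) rather than running them. Subsetting samples this way keeps each GT value as written, so it neither adds nor removes phase information.

```python
import shlex


def split_commands(vcf_path, samples, chrom):
    """Build one `bcftools view` command per sample.

    Assumption: outputs follow the hypothetical pattern <sample>.<chrom>.vcf.gz.
    """
    commands = []
    for sample in samples:
        out_path = f"{sample}.{chrom}.vcf.gz"
        # -s keeps only the named sample; -Oz writes bgzip-compressed VCF.
        commands.append(
            ["bcftools", "view", "-s", sample, "-Oz", "-o", out_path, vcf_path]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical sample names; real ones can be listed with
    # `bcftools query -l Chr10.vcf.gz`.
    for cmd in split_commands("Chr10.vcf.gz", ["sample1", "sample2"], "Chr10"):
        print(shlex.join(cmd))
```

Building argument lists instead of shell strings keeps sample names with unusual characters safe if the commands are later passed to `subprocess.run`.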

But in the last step, when I ran msmc2 to estimate the effective population size, I received an error. I also noticed that the resulting multihetsep.txt files (e.g. Chr10.multihetsep.txt) are very large.

My question is: should I run the phasing step too?

I would really appreciate your help in identifying the problem.

Best regards,
Niloo

stschiff · 2023-06-22T11:36:14Z

As discussed via email, I think the issue is that your phased VCF is not recognised as being phased. Phased genotypes require a notation like 0|1 or 1|0. If you have 0/1 instead, the genotype is treated as unphased, which leads to a combinatorial number of possible phasings and blows up the size of the resulting file.
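The size blow-up can be illustrated with a small Python sketch. This is an illustration only, not the code of generate_multihetsep.py: it assumes each unphased heterozygote (e.g. 0/1) can be read in either allele order while a phased genotype (e.g. 0|1) fixes the order, and it assumes diploid GT strings. Under that assumption, a site with h unphased heterozygotes has 2^h possible orderings; with 26 samples all unphased heterozygous at one site, that would be 2^26 = 67,108,864 alternatives.

```python
from itertools import product


def possible_phasings(genotypes):
    """Return every haplotype string consistent with diploid GT strings.

    Illustrative sketch: "|" fixes allele order, "/" allows both orders.
    """
    per_sample = []
    for gt in genotypes:
        if "|" in gt:
            a, b = gt.split("|")
            per_sample.append([a + b])
        else:
            a, b = gt.split("/")
            # An unphased homozygote (e.g. 0/0) still yields one ordering.
            per_sample.append(sorted({a + b, b + a}))
    # Every combination of per-sample orderings is a candidate phasing.
    return ["".join(combo) for combo in product(*per_sample)]


print(possible_phasings(["0|1", "1|0", "0|0"]))  # ['011000']
print(possible_phasings(["0/1", "1/0", "0/0"]))  # ['010100', '011000', '100100', '101000']
print(len(possible_phasings(["0/1"] * 10)))  # 1024
```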

Niloofar-Alaei · 2023-06-22T11:40:06Z

Yes, we discussed it, and I also checked my phased VCF files. The problem comes from their format, and I am working on fixing it. Many thanks for your help, Niloo
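A quick way to check the format of a phased VCF is to count how many GT calls use "|" versus "/". The Python sketch below assumes GT is the first key of the FORMAT column (column 9), as the VCF specification requires when GT is present. The function names and the example records are hypothetical. Unphased homozygous calls such as 0/0 are counted too, although only unphased heterozygotes create multiple phasings.

```python
import gzip


def count_gt_separators(lines):
    """Count phased ("|") and unphased ("/") GT calls in VCF text lines."""
    phased = unphased = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        # GT, when present, must be the first key in the FORMAT column.
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue
        for sample in fields[9:]:
            gt = sample.split(":", 1)[0]
            if "|" in gt:
                phased += 1
            elif "/" in gt:
                unphased += 1
    return phased, unphased


def count_gt_separators_in_file(vcf_path):
    # bgzip output is multi-member gzip, which Python's gzip module can read.
    with gzip.open(vcf_path, "rt") as handle:
        return count_gt_separators(handle)


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2",
    "Chr10\t101\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "Chr10\t205\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/1:12\t0/0:9",
]
print(count_gt_separators(example))  # (2, 2)
```

A non-zero unphased count in a file that is supposed to be phased points to the notation problem described above.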


stschiff mentioned this issue on Jun 22, 2023 in stschiff/msmc#56 (using the phased-vcf files as input for MSMC2; closed).
