Hi
I want to run MSMC2 on my dataset, which consists of phased VCF files (a multi-sample VCF with 26 samples), one per chromosome (e.g. Chr10.vcf.gz).
To use my VCF files as input for MSMC2, I proceeded as follows:
First, I used bcftools to produce a separate VCF file for each sample (e.g. sample1.Chr10.vcf.gz).
Second, I used vcfAllSiteParser.py to produce the .bed files.
Then I ran generate_multihetsep.py to merge the VCF and mask files. I skipped the phasing step because I assumed it was unnecessary for a dataset that is already phased.
However, I received an error in the last step, when I ran msmc2 to estimate the effective population size. I also noticed that the resulting multihetsep.txt files (e.g. Chr10.multihetsep.txt) are very large.
My question is: should I run the phasing step as well?
I would really appreciate your help in identifying the problem.
Best wishes,
Niloo
As discussed via email, I think the issue is that your phased VCF is not recognised as being phased. Phased genotypes require a notation like 0|1 or 1|0. If you have 0/1 instead, the genotype is treated as unphased, which leads to combinatorially many allele combinations and inflates the size of the resulting file.
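A quick way to confirm this before running generate_multihetsep.py is to count how many heterozygous genotypes in the VCF use `|` versus `/`. The following is only a minimal sketch, not part of msmc-tools: the helper names `open_vcf` and `count_het_phasing` are made up for illustration, and it assumes a standard tab-separated VCF with a `GT` entry in the FORMAT column.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_het_phasing(lines):
    """Count heterozygous genotypes written as phased (|) or unphased (/).

    `lines` is any iterable of VCF lines. Homozygous and missing
    genotypes are skipped, since phase only matters for heterozygotes.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this line
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_idx >= len(parts):
                continue
            gt = parts[gt_idx]
            if "|" in gt:
                sep = "|"
            elif "/" in gt:
                sep = "/"
            else:
                continue  # haploid or malformed genotype
            alleles = gt.split(sep)
            if "." in alleles or len(set(alleles)) < 2:
                continue  # missing or homozygous
            counts["phased" if sep == "|" else "unphased"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python check_phasing.py sample1.Chr10.vcf.gz
    if len(sys.argv) > 1:
        with open_vcf(sys.argv[1]) as handle:
            print(dict(count_het_phasing(handle)))
```

If the `unphased` count is large for a file that is supposed to be phased, the genotypes are probably being written with `/`, which would match the symptom described above.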
Yes, we discussed this, and I also checked my phased VCF files. The problem lies in their format, and I am working on fixing it.
Many thanks for your help
Niloo
________________________________
From: Stephan Schiffels ***@***.***>
Sent: 22 June 2023 13:36:26
To: stschiff/msmc2
Cc: Niloofar Alaei Kakhki; Author
Subject: Re: [stschiff/msmc2] using phased-vcf file as input for MSMC2 (Issue #52)
As discussed via email, I think the issue is that your phased VCF is not recognised as being phased. Phased genotypes require a notation like 0|1 or 1|0. If you have 0/1 instead, it is being treated as unphased, leading to combinatorially many combinations and breaking your resulting file in terms of size.