HiC mode #76
Comments
You may also wanna check out pstools for haplotype-aware scaffolding: shilpagarg/DipAsm#16. Unfortunately, at this stage we don't provide a Hi-C plot feature, but we will make one available in the next release. Hope this helps!
Thanks a lot. I will check it out. What is the difference between using pstools and running hifiasm?
pstools can produce a chromosome-scale phased assembly.
OK, thanks. And what does the version of hifiasm with Hi-C integration do?
Hifiasm partitions contigs into two groups with Hi-C. You can run SALSA on the partitioned contigs. PS: the major benefit of Hi-C is that you can get much longer phased blocks. Also, pstools is a separate project that uses hifiasm output and a different algorithm. You can try both and see which works better for you.
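For anyone feeding the partitioned contigs to a scaffolder: hifiasm writes its assemblies as GFA, while scaffolders such as SALSA generally expect FASTA. Below is a minimal Python sketch of that conversion, assuming only that each contig is a GFA segment line (`S`, name, sequence, tab-separated). The function name `gfa_to_fasta` is made up for illustration and is not part of hifiasm or SALSA.

```python
def gfa_to_fasta(gfa_lines):
    """Yield FASTA records for every segment (S) line in a GFA stream.

    Assumes tab-separated GFA where an S line is: S <name> <sequence> [tags...].
    All other record types (headers, links, read alignments) are skipped.
    """
    for line in gfa_lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            name, seq = fields[1], fields[2]
            yield f">{name}\n{seq}\n"


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python gfa2fa.py < hap1.gfa > hap1.fa
    sys.stdout.writelines(gfa_to_fasta(sys.stdin))
```

Run it once per haplotype GFA, then scaffold each resulting FASTA separately.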
As Heng said, hifiasm generates phased contigs for now. I believe the current hifiasm can probably generate well-phased contigs. That is, within each contig, the switch error rate and Hamming error rate are the lowest in my experiments. However, I'm not sure if it can correctly assign contigs to each haplotype. We're still improving this part. But I guess if we already have phased contigs, it is probably not too hard to use other information to assign contigs, or even to adjust manually by eye, since there are not too many contigs and the contigs are long enough. As for scaffolding, also because the contigs are long enough, getting a reasonably good scaffolded assembly is not hard. However, it may still have problems in difficult regions. I believe with HiFi in hand, we will also have a nearly perfect scaffolded assembly with totally new algorithms. But it takes time to do that. PS: pstools is not a part of hifiasm. It should use different information and strategies compared with hifiasm. As Heng said, you can try both to see which works better for your project.
Thank you guys for the nice tips. I will definitely try them out. I just finished a first test run of the Hi-C mode and ended up with hap1 much smaller than hap2 (303 Mbp vs. 475 Mbp), whereas the actual genome size is around 390 Mbp. Any idea why?
Well, that is the assignment problem (i.e., how to assign contigs to each haplotype). We are working on that and hopefully can fix it in a few days. This problem is similar to purge_dups. By the way, do you have any way to roughly evaluate the contig Hamming and switch error rates? It would be helpful to get those two numbers.
Thanks for the update. I wonder if I could run purge_dups on the bigger hap assembly and then merge the rest with the other, smaller hap... I am not an expert on that, but I found this approach reported in https://www.nature.com/articles/s41587-020-0719-5 under "Phasing accuracy estimates". Hope it helps.
Another way, from a colleague: we have a trio {parents: pA, pB; child: C} for evaluating phasing using k-mers. For the child we have the phased assembly, while for the parents we need Illumina short reads. For example, in the apricot work (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02235-5): "After polishing the assemblies respectively with the “Currot”-genotype and “Orange Red”-genotype PacBio reads using apollo [52], we built up two sets of haplotype-specific k-mers from the assemblies, rC and rO. Correspondingly, a set of “Currot”-specific k-mers (with coverage from 10 to 60x), pC, was selected from the parental Illumina WGS that did not exist in “Orange Red” short reads (coverage over 1x) but in “Rojo Pasión” pollen short reads (coverage from 10 to 300x); similarly, a set of “Orange Red”-specific k-mers, pO, was also collected. Then, we intersected rC and rO with pC and pO respectively, leading to four subsets rC ∩ pC, rC ∩ pO, rO ∩ pC, and rO ∩ pO, which were used to calculate average haplotyping accuracy. All k-mer processing (counting, intersecting and difference finding) were performed with KMC [53]."
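The four-way intersection described in that passage can be sketched with plain Python sets. This is only an illustration: the paper used KMC on real k-mer databases, and the passage does not spell out the accuracy formula, so the per-haplotype ratio below (concordant k-mers over all parent-specific k-mers found in that haplotype) is an assumption, as is the function name `haplotyping_accuracy`.

```python
def kmers(seq, k):
    """Return the set of all k-mers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def haplotyping_accuracy(rC, rO, pC, pO):
    """Average haplotyping accuracy from four k-mer sets.

    rC, rO: haplotype-specific k-mers taken from the two assemblies.
    pC, pO: parent-specific k-mers taken from the parental short reads.
    ASSUMPTION: accuracy of a haplotype = concordant / (concordant + discordant).
    """
    cc, co = len(rC & pC), len(rC & pO)   # rC ∩ pC (right), rC ∩ pO (wrong)
    oo, oc = len(rO & pO), len(rO & pC)   # rO ∩ pO (right), rO ∩ pC (wrong)
    acc_c = cc / (cc + co) if cc + co else float("nan")
    acc_o = oo / (oo + oc) if oo + oc else float("nan")
    return (acc_c + acc_o) / 2


# Toy k-mer sets, invented purely to show the call.
rC = {"AAA", "AAC", "ACG"}
rO = {"GGG", "GGT"}
pC = {"AAA", "AAC"}
pO = {"ACG", "GGG"}
print(haplotyping_accuracy(rC, rO, pC, pO))
```

For real genomes the sets will not fit comfortably in memory this way; the same intersections would be done with a k-mer counter, as in the paper.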
Thank you so much. It is easy to evaluate Hamming/switch error rates with trio data. For purge_dups, I do think current tools are problematic, especially for segdups.
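To make the two numbers concrete, here is a rough Python sketch of how they could be computed for one contig once each parent-specific marker along it has been labeled paternal ("P") or maternal ("M"). The definitions used are assumptions chosen for illustration (switch error = fraction of adjacent marker pairs whose parent changes; Hamming error = fraction of markers from the minority parent), and `phasing_errors` is a hypothetical helper, not the output of any particular tool.

```python
def phasing_errors(labels):
    """Return (switch_error_rate, hamming_error_rate) for one contig.

    labels: parent-of-origin calls ("P" or "M") for the parent-specific
    markers on the contig, in order along the contig.
    ASSUMED definitions:
      switch error  = adjacent marker pairs with different parents / (n - 1)
      Hamming error = markers from the minority parent / n
    """
    n = len(labels)
    if n == 0:
        return 0.0, 0.0
    switches = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    paternal = sum(1 for x in labels if x == "P")
    minority = min(paternal, n - paternal)
    switch_rate = switches / (n - 1) if n > 1 else 0.0
    return switch_rate, minority / n


# One stray maternal marker: two switches, low Hamming error.
print(phasing_errors(list("PPPPMPPPPP")))
# One long switch in the middle: a single switch, but high Hamming error.
print(phasing_errors(list("PPPPPMMMMM")))
```

The two toy contigs show why both numbers are worth reporting: a single long switch barely moves the switch error but puts half the contig on the wrong haplotype.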
Dear Hifiasm team,
Thank you very much for introducing the Hi-C mode in hifiasm. Previously, I was running SALSA after assembly.
I would like to know the easiest way to make a Hi-C plot after a phased diploid assembly with this Hi-C mode.
Best,
André