HiC mode #76

dabitz · 2021-02-17T09:58:41Z

Dear Hifiasm team,

Thank you very much for introducing the Hi-C mode in hifiasm. Previously, I was running SALSA after assembly.
What would be the easiest way to make a Hi-C plot after a phased diploid assembly with this Hi-C mode?

Best,
André

shilpagarg · 2021-02-17T11:48:00Z

You may also want to check out pstools for haplotype-aware scaffolding: shilpagarg/DipAsm#16. Unfortunately, at this stage we don't provide a Hi-C plot feature, but we will make one available in the next release. Hope this helps!

dabitz · 2021-02-17T11:52:14Z

Thanks a lot. I will check it out. What is the difference between using pstools and running the Hi-C mode in hifiasm?

shilpagarg · 2021-02-17T11:54:14Z

Pstools can produce a chromosome-scale phased assembly.

dabitz · 2021-02-17T11:56:06Z

OK, thanks. And what does the version of hifiasm with Hi-C integration do?

lh3 · 2021-02-17T14:19:19Z

Hifiasm partitions contigs into two groups with Hi-C. You can run SALSA on the partitioned contigs. PS: the major benefit of Hi-C is that you can get much longer phased blocks. Also, pstools is a separate project that uses hifiasm output and a different algorithm. You can try both and see which works better for you.

chhylp123 · 2021-02-17T14:57:34Z

As Heng said, hifiasm generates phased contigs for now. I believe the current hifiasm can probably generate well-phased contigs. That is, within each contig, the switch error rate and Hamming error rate are the lowest in my experiment. However, I'm not sure whether it can correctly assign contigs to each haplotype. We're still improving this part. But I guess that if we already have phased contigs, it is probably not too hard to use other information to assign them, or even to adjust the assignment manually by eye, since there are not too many contigs and the contigs are long enough.

As for scaffolding, because the contigs are long enough, getting a reasonably good scaffolded assembly is not hard. However, it may still have problems in difficult regions. I believe that with HiFi in hand, we will eventually have a nearly perfect scaffolded assembly with totally new algorithms. But it takes time to do that.

PS: pstools is not a part of hifiasm. It should use different information and strategies compared with hifiasm. As Heng said, you can try both to see which works better for your project.

dabitz · 2021-02-18T08:52:06Z

Thank you guys for the nice tips. I will definitely try them out. I just finished the first run testing the Hi-C mode and ended up with hap1 much smaller than hap2 (303 Mbp vs. 475 Mbp), whereas the genome size is in fact around 390 Mbp. Any idea why?
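
A quick arithmetic check on the sizes above, as a sketch only: the two haplotypes together come to 778 Mbp, very close to twice the roughly 390 Mbp genome size, which would be consistent with sequence being placed in the wrong haplotype rather than lost. The function name `assignment_balance` is hypothetical, and the interpretation assumes both haplotypes were assembled in full.

```python
def assignment_balance(hap1_mbp, hap2_mbp, haploid_mbp):
    """Compare two haplotype assembly sizes with the expected haploid size.

    All sizes are in Mbp. Ratios near 1.0 mean "as expected".
    """
    total = hap1_mbp + hap2_mbp
    return {
        "total_mbp": total,
        # Combined size relative to a full diploid genome (2 x haploid).
        "total_over_diploid": total / (2 * haploid_mbp),
        # Each haplotype relative to the expected haploid size.
        "hap1_over_haploid": hap1_mbp / haploid_mbp,
        "hap2_over_haploid": hap2_mbp / haploid_mbp,
    }


# Sizes reported in this thread: hap1 = 303, hap2 = 475, genome ~ 390 Mbp.
report = assignment_balance(303, 475, 390)
for key, value in report.items():
    print(f"{key}: {value:.3f}")
```

A combined ratio near 1.0 together with per-haplotype ratios well below and above 1.0 points toward an imbalance between the two bins rather than a shortfall in total assembled sequence.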

chhylp123 · 2021-02-18T15:53:06Z

Well, that is the assignment problem (i.e., how to assign contigs to each haplotype). We are working on that and hopefully can fix it in a few days. This problem is similar to purge_dups. By the way, do you have any way to roughly evaluate the contig Hamming and switch error rates? It would be helpful to get those two numbers.

dabitz · 2021-02-19T08:25:09Z

Thanks for the update. I wonder if I do purge_dups on the bigger hap assembly and then merge the rest with the other smaller hap...

I am not an expert on that, but I found this approach, as reported in https://www.nature.com/articles/s41587-020-0719-5

Phasing accuracy estimates
To evaluate phasing accuracy, we determined SNVs in our phased assemblies based on their alignments to GRCh38. This procedure is described in the ‘SV, indel and SNV detection’ section in the Methods. We evaluate phasing accuracy of our assemblies in comparison to trio-based phasing for HG00733 (ref. 19) and NA12878 (ref. 46). In all calculations, we compare only SNV positions that are shared between our SNV calls and those from trio-based phasing. To count the number of switch errors between our phased assemblies and trio-based phasing, we compare all neighboring pairs of SNVs along each haplotype and recode them into a string of 0s and 1s depending on whether the neighboring alleles are the same (0) or not (1). The absolute number of differences in such binary strings is counted between our haplotypes and the trio-based haplotypes (per chromosome). The switch error rate is reported as a fraction of counted differences of the total number of compared SNVs (per haplotype). Similarly, we calculate the Hamming distance as the absolute number of differences between our SNVs and trio-based phasing (per chromosome) and report it as a fraction of the total number of differences to the total number of compared SNVs (per haplotype).
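
The calculation described in the passage above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes each haplotype has already been reduced to a string of 0/1 alleles at the shared SNV positions of one chromosome, and the function names are hypothetical.

```python
def recode_neighbors(hap):
    """Recode a haplotype as 0/1 per neighboring SNV pair:
    0 if the two adjacent alleles are the same, 1 if they differ."""
    return [0 if a == b else 1 for a, b in zip(hap, hap[1:])]


def switch_error_rate(assembled, truth):
    """Differences between the recoded strings, divided by the number
    of compared SNVs."""
    if len(assembled) != len(truth):
        raise ValueError("haplotypes must cover the same SNV positions")
    if not assembled:
        return 0.0
    diffs = sum(
        x != y
        for x, y in zip(recode_neighbors(assembled), recode_neighbors(truth))
    )
    return diffs / len(assembled)


def hamming_error_rate(assembled, truth):
    """Positions where the alleles differ, divided by the number of
    compared SNVs."""
    if len(assembled) != len(truth):
        raise ValueError("haplotypes must cover the same SNV positions")
    if not assembled:
        return 0.0
    return sum(a != b for a, b in zip(assembled, truth)) / len(assembled)


# Toy example: a single switch halfway through eight SNVs.
truth = "00000000"
assembled = "00001111"
print(switch_error_rate(assembled, truth))   # 0.125
print(hamming_error_rate(assembled, truth))  # 0.5
```

The toy example shows why both numbers are useful: one switch error leaves the switch error rate low (1 of 8) while putting half of the SNVs on the wrong haplotype (4 of 8).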

hope it helps

dabitz · 2021-02-19T08:53:36Z

Another way, from a colleague:

We have a trio {parents: pA, pB; child: C} and evaluate phasing using k-mers: for the child we have the phased assembly, while for the parents we need Illumina short reads.

For example, in the apricot work (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02235-5):

After polishing the assemblies respectively with the “Currot”-genotype and “Orange Red”-genotype PacBio reads using apollo [52], we built up two sets of haplotype-specific k-mers from the assemblies, rC and rO. Correspondingly, a set of “Currot”-specific k-mers (with coverage from 10 to 60x), pC, was selected from the parental Illumina WGS that did not exist in “Orange Red” short reads (coverage over 1x) but in “Rojo Pasión” pollen short reads (coverage from 10 to 300x); similarly, a set of “Orange Red”-specific k-mers, pO, was also collected. Then, we intersected rC and rO with pC and pO respectively, leading to four subsets rC ∩ pC, rC ∩ pO, rO ∩ pC, and rO ∩ pO, which were used to calculate average haplotyping accuracy. All k-mer processing (counting, intersecting and difference finding) were performed with KMC [53].
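
The four intersections in the passage above can be sketched with plain Python sets. This is an illustration under stated assumptions: the passage does not give the accuracy formula, so the ratio used here (matching k-mers over all parent-specific k-mers found in a haplotype, averaged over the two haplotypes) is a guess at a reasonable definition; in-memory sets stand in for KMC; and the function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence (no canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def haplotyping_accuracy(rC, rO, pC, pO):
    """Average haplotyping accuracy from four k-mer sets.

    rC, rO: k-mers specific to each assembled haplotype.
    pC, pO: k-mers specific to each parent (from short reads).
    Assumed definition: for each haplotype, the fraction of its
    parent-specific k-mers that come from the expected parent.
    """
    cc, co = len(rC & pC), len(rC & pO)
    oc, oo = len(rO & pC), len(rO & pO)
    acc_c = cc / (cc + co) if cc + co else float("nan")
    acc_o = oo / (oo + oc) if oo + oc else float("nan")
    return (acc_c + acc_o) / 2


# Toy sets: one k-mer in rC ("GGT") actually belongs to the other parent.
rC = {"AAA", "AAC", "GGT"}
rO = {"TTT", "GGA"}
pC = {"AAA", "AAC"}
pO = {"GGT", "TTT"}
print(haplotyping_accuracy(rC, rO, pC, pO))
```

For real genomes the k-mer sets are far too large for this approach, which is why the quoted work does the counting and set operations with KMC.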

chhylp123 · 2021-02-23T00:23:48Z

Thank you so much. It is easy to evaluate hamming/switch error rate with trio data. For purge_dups, I do think current tools are problematic, especially for segdups.
