memory: core dumped at haplotype_scaffold #20
Thanks for pointing this out. Yes, I have seen the invalid pointer error in non-human assemblies. I am working on this and will provide an update soon.
Works, thanks! I get 242 Mb of hap1, 32 Mb of hap2, and 62 Mb in broken_nodes. hap1 is much bigger than expected, so there may be some bacterial contamination in it. I have genetic-map-based pseudochromosomes from other assemblies, so I'll go through these files to see what looks sensible. Also, I guess you plan to get to this eventually, but can you say something about what pstools does compared with the previous Docker pipeline? Is there a good reason not to use the primary hifiasm contigs (or other assemblies) rather than the raw unitigs?
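Size figures like the 242 Mb / 32 Mb / 62 Mb split above can be reproduced by summing the sequence lengths in each output FASTA. What follows is a minimal plain-Python sketch that assumes uncompressed FASTA input. It takes file paths on the command line and makes no assumption about how pstools names its output files.

```python
import sys


def fasta_total_length(lines):
    """Sum the lengths of all sequence lines in a FASTA stream, skipping headers."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def report_sizes(paths):
    """Return {path: total length in Mb} for each FASTA file (1 Mb = 10**6 bases)."""
    sizes = {}
    for path in paths:
        with open(path) as handle:
            sizes[path] = fasta_total_length(handle) / 1e6
    return sizes


if __name__ == "__main__":
    # File paths come from the command line; nothing is assumed about their names.
    for path, mb in report_sizes(sys.argv[1:]).items():
        print(f"{path}\t{mb:.1f} Mb")
```

Pass the hap1, hap2 and broken_nodes FASTA files as arguments. The script prints one line per file with its total length in Mb, which can then be compared against the expected genome size.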
Good to know. The pstools method is purely graph-based, with no haplotype collapsing, and it enables routine production of phased sequences. I will be happy to help further if you send me an email. As I mentioned, I have only tested it on human genomes, but it will be interesting to see how it performs on other genomes. Working on unitigs rather than contigs avoids random cross-chromosome or long-range chromosome connections. Hi-C information is then a powerful way to disentangle such cases in the graph.
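The scaffolding log in the original report prints stages such as "Update best buddy score" and "Get potential connections". This thread does not say how pstools implements them. One plausible reading, stated here only as an assumption, is a mutual-best-partner rule. Under that rule, a join between two unitig ends is accepted only when each end is the other's highest-scoring partner by Hi-C link count. The sketch below assumes the link counts per pair of ends have already been computed. The function name, the end labels (such as "u1+") and the input layout are all hypothetical, and this is not pstools code.

```python
def mutual_best_pairs(links):
    """Keep only joins where each unitig end is the other's highest-scoring partner.

    links: dict mapping (end_a, end_b) -> Hi-C link count, each pair listed once.
    Ties are broken in favour of the first pair encountered.
    Returns a set of frozenset({end_a, end_b}) pairs.
    """
    best = {}  # end -> (best score seen so far, partner end)
    for (a, b), score in links.items():
        for x, y in ((a, b), (b, a)):
            if x not in best or score > best[x][0]:
                best[x] = (score, y)
    pairs = set()
    for x, (_, y) in best.items():
        # Every partner y also appears as a key in best, so this lookup is safe.
        if best[y][1] == x:
            pairs.add(frozenset((x, y)))
    return pairs


if __name__ == "__main__":
    # Hypothetical link counts between unitig ends.
    example = {("u1+", "u2-"): 50, ("u1+", "u3-"): 10,
               ("u3-", "u4+"): 40, ("u2-", "u4+"): 5}
    for pair in sorted(tuple(sorted(p)) for p in mutual_best_pairs(example)):
        print(pair)
```

Requiring agreement from both sides trades sensitivity for specificity. A weak or ambiguous Hi-C signal leaves an end unjoined rather than forcing a possibly wrong long-range connection, which is the kind of error the unitig-based approach is meant to avoid.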
Yes, I agree that it depends on the characteristics of the genome. Specifically, Hi-C is helpful for genomes with complex centromeres, for example human. For small genomes with no centromeres, I understand HiFi would be good enough. Another aspect is cost-effectiveness. IMO there is no general method that is best for every genome.
Hi Shilpa,
I'm running hifiasm/pstools as in #16 on an ~100 Mb genome that is expected to be mostly haploid. I assume this shouldn't be a major issue? I don't really trust the base-level results of short-read Hi-C assembly/scaffolding on HiFi contigs, and I'm hoping DipAsm will do a better job of it.
The run gets through with some minor (I assume) complaints: a few "ERROR: key not in position table" messages during hic_mapping_haplo, and various rm errors during resolve_haplotypes. But then it dumps core during the haplotype_scaffold stage. There are 56 utgs for each of hap1 and hap2 in pred_haplotypes.fa, each ~250 Mb. Any thoughts? Here's that log:
start main
All above 5M: 13
All above 1.5M: 44
Update best buddy score.
Get potential connections 4.
Insert connections.
Save graphs and scores.
Nodes in graph: 2.
Left edges: 376.
Update best buddy score.
Get potential connections 4.
Insert connections.
Save graphs and scores.
Nodes in graph: 4.
Left edges: 184.
Update best buddy score.
Update best buddy score.
Get potential connections 4.
Insert connections.
Save graphs and scores.
Nodes in graph: 5.
Left edges: 304.
Update best buddy score.
Update best buddy score.
Finish get first scaffolds.
free(): invalid pointer
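Before digging into the crash, it can help to confirm the per-haplotype record counts and sizes quoted for pred_haplotypes.fa (56 utgs each for hap1 and hap2). The following minimal sketch assumes that each FASTA header contains a token such as "hap1" or "hap2". This thread does not show the real header format, so the HAP_TAG pattern is a hypothetical placeholder to adjust. Headers without a match are counted as "untagged".

```python
import re
import sys
from collections import Counter

# Assumption: each header carries a "hap<N>" token; adjust to the real header format.
HAP_TAG = re.compile(r"hap\d+")


def per_haplotype_stats(lines):
    """Return (record counts, total bases), both keyed by haplotype tag, for a FASTA stream."""
    counts, bases = Counter(), Counter()
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            match = HAP_TAG.search(line)
            current = match.group(0) if match else "untagged"
            counts[current] += 1
        elif line and current is not None:
            bases[current] += len(line)
    return counts, bases


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        counts, bases = per_haplotype_stats(handle)
    for hap in sorted(counts):
        print(f"{hap}\t{counts[hap]} records\t{bases[hap] / 1e6:.1f} Mb")
```

Run it with the path to pred_haplotypes.fa as the only argument. If both haplotypes come out far larger than the ~100 Mb expected genome size, that is worth reporting alongside the core dump.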