"Assertion `cnt > 0' failed" error in assembly stage 4-unitigger #873
This has, I think, been fixed, but there is no (feasible) way to use the latest code without restarting your assembly. Instead, copy the two files in the attached tar.gz to src/bogart/, recompile and restart canu. This updates only the part that is failing with the latest algorithm.
Unfortunately, I get another assertion failed error with the new AS_BAT_OptimizePositions.C as well. Please see below. Is there maybe another fix for this?
breakSingletonTigs()-- Removed 112986 singleton tigs; reads are now unplaced.
Failed with 'Aborted'; backtrace (libbacktrace):
Yet another patch, same story. Not many good options after this:
Unfortunately, I get another error (_listLen > 0 failed). Should I even try "removing the two lines that say 'optimizePositions'", or will this fail because the list will be 0 anyway? From the log:
optimizePositions()-- Updating positions.
==> MERGE ORPHANS.
computeErrorProfiles()-- Computing error profiles for 141027 tigs, with 4 threads.
findPotentialOrphans()-- working on 141027 tigs.
Failed with 'Aborted'; backtrace (libbacktrace):
What are the details of this assembly? I see a genome size of 430 Mbp. Coverage? Read type? Length? Repeat rich genome? I don't see any obvious changes in the latest code that would fix this. Options:
Option 1 is the fastest, unless it fails again. Option 2 is more likely to succeed, should result in a slightly better assembly, but requires the correction step to run again. I don't want to debug the run you currently have. It'll confuse me too much to work with the old code ("hey, didn't I already fix that?"), and moving any fix from the old code to the latest code is just lost time.
As you noted, genome size is about 430 Mbp, although we might be off by 100 Mbp (either side). It is not repeat rich, but we expect very high heterozygosity. Coverage is at the edge, with about 27x raw and 18x after trimming. Reads are from a PacBio Sequel; the length distribution seems ok (see below). We had previously run a canu assembly with "default" settings and got about 350 Mbp, but BUSCO found only very few genes, even after a polishing round. We also ran a MaSuRCA assembly (with some added Illumina data we have) and got about 550 Mbp with a rather high BUSCO gene number but a high duplicate rate. Therefore, I wanted to try this additional canu run with higher error rates (correctedErrorRate=0.1) and include as much data as possible (corOutCoverage=1000). I noted that the error correction and trimming took much more time, which makes sense I guess. And now I am stuck at the assembly step :-( Read length distribution:
In cases of high heterozygosity (I'm assuming >1%), just think of your genome size as doubled, so you really have <15x per haplotype, which is quite low. I think the goal should be to get double the genome size and remove the duplication later by post-processing. The polishing (e.g. Arrow) will not work well if you try to smash both haplotypes into a single consensus. I am not sure a higher corrected error rate will help, because you probably want to separate, not collapse, haplotypes; plus, with high heterozygosity most won't be collapsible (e.g. large SVs). Sequel data is worse quality than RSII data, so it may help compensate for that. My suggestion would be to treat this as a 15x genome, in which case you want to turn on all the low coverage options but also add the separate haplotype option: This will compute overlaps to a higher error but still try to be conservative and separate haplotypes where possible. This will likely end up with duplication as well, but you can remove the duplication using either self-alignment or gene information.
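The coverage arithmetic behind this advice can be written out as a rough sketch. It uses only the figures quoted in this thread (27x raw, 18x trimmed, 430 Mbp give or take 100 Mbp) and the rule of thumb of doubling the genome size; the function names `per_haplotype` and `rescale` are made up for illustration and are not part of canu.

```python
# Figures quoted in this thread; the factor of two is the rule of thumb above.
RAW_COVERAGE = 27.0      # x, raw reads
TRIMMED_COVERAGE = 18.0  # x, after trimming
GENOME_SIZE_MBP = 430.0  # estimate, possibly off by 100 Mbp either way


def per_haplotype(coverage, haplotypes=2):
    """Coverage per haplotype if the haplotypes assemble separately."""
    return coverage / haplotypes


def rescale(coverage, assumed_mbp, actual_mbp):
    """Coverage if the true genome size differs from the assumed one."""
    return coverage * assumed_mbp / actual_mbp


print(f"raw:     {per_haplotype(RAW_COVERAGE):.1f}x per haplotype")
print(f"trimmed: {per_haplotype(TRIMMED_COVERAGE):.1f}x per haplotype")

# Same calculation across the stated uncertainty in genome size.
for size_mbp in (330.0, 430.0, 530.0):
    trimmed = rescale(TRIMMED_COVERAGE, GENOME_SIZE_MBP, size_mbp)
    print(f"{size_mbp:.0f} Mbp: {per_haplotype(trimmed):.1f}x trimmed per haplotype")
```

Both per-haplotype figures come out below 15x, which is why the low coverage settings are the relevant ones here.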
Idle, closing, original bug should be fixed in latest code.
I've just run into what looks like the same issue but with Canu 1.7 (installed via BioConda), failed during the Resuming failed again:
From the end of the
This is with four PacBio Sequel SMRT cells of a WGA amplified sample. Expected genome size is in the region of 100 Mb, but a Canu 1.6 assembly came out at about twice that, which we think is uncollapsed haplotypes and/or population level variation.
Having seen this Testing with the same dataset against the master branch this worked fine, specifically commit 6f3c375 worked:
I am running into an "Assertion `cnt > 0' failed" error in the assembly stage 4-unitigger. This is running on a cluster, and so far this assembly ran without problems. I have restarted canu twice but the same error happens. We are using Canu 1.6 on a CentOS 7 cluster with the LSF job submission system.
This is from the unitigger.1.out:
Running job 1 based on LSB_JOBINDEX=1 and offset=0.
./unitigger.sh: line 75: 31378 Aborted (core dumped) $bin/bogart -G ../canu_assembly_subreads2.gkpStore -O ../canu_assembly_subreads2.ovlStore -o ./canu_assembly_subreads2 -gs 430000000 -eg 0.105 -eM 0.105 -mo 500 -dg 6 -db 6 -dr 3 -ca 2100 -cp 200 -threads 4 -M 20 -unassembled 2 0 1.0 0.5 5 > ./unitigger.err 2>&1
./unitigger.sh: line 87: ../canu_assembly_subreads2.ctgStore/seqDB.v001.sizes.txt: No such file or directory
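For anyone comparing runs, the bogart command line reported in unitigger.1.out can be split into flag/value groups with a few lines of Python. This is only a reading aid, assuming every token that starts with `-` followed by a letter is a flag; `parse_options` is a hypothetical helper, and no meaning is attached to any flag beyond noting that `-gs 430000000` matches the 430 Mbp genome size mentioned in this thread.

```python
import shlex

# Command line copied from unitigger.1.out (output redirection dropped).
CMD = (
    "$bin/bogart -G ../canu_assembly_subreads2.gkpStore "
    "-O ../canu_assembly_subreads2.ovlStore -o ./canu_assembly_subreads2 "
    "-gs 430000000 -eg 0.105 -eM 0.105 -mo 500 -dg 6 -db 6 -dr 3 "
    "-ca 2100 -cp 200 -threads 4 -M 20 -unassembled 2 0 1.0 0.5 5"
)


def parse_options(command):
    """Group each '-flag' with the values that follow it (assumed layout)."""
    options = {}
    flag = None
    for token in shlex.split(command)[1:]:  # skip the executable itself
        if token.startswith("-") and token[1:2].isalpha():
            flag = token
            options[flag] = []
        elif flag is not None:
            options[flag].append(token)
    return options


options = parse_options(CMD)
for flag, values in options.items():
    print(flag, " ".join(values))
```

Printing the groups side by side for two runs makes it easier to spot which settings changed between them.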
Below is the error message from unitigger.err:
computeErrorProfiles()-- Computing error profiles for 141027 tigs, with 4 threads.
computeErrorProfiles()-- Finished.
placeContains()-- placing 970137 contained and 306780 unplaced reads, with 4 threads.
placeContains()-- Placed 656894 contained reads and 11575 unplaced reads.
placeContains()-- Failed to place 313243 contained reads (too high error suspected) and 295205 unplaced reads (lack of overlaps suspected).
optimizePositions()-- Optimizing read positions for 141027 reads in 1392714 tigs, with 4 threads.
optimizePositions()-- Allocating scratch space for 1392714 reads (0 KB).
optimizePositions()-- Initializing positions with 4 threads.
bogart: bogart/AS_BAT_OptimizePositions.C:140: void Unitig::optimize_initPlace(uint32, optPos*, optPos*, bool, std::set&, bool): Assertion `cnt > 0' failed.
Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::102 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
bogart/AS_BAT_OptimizePositions.C::140 in _ZN6Unitig18optimize_initPlaceEjP6optPosS1_bRSt3setIjSt4lessIjESaIjEEb()
bogart/AS_BAT_OptimizePositions.C::391 in (null)()
bogart/AS_BAT_OptimizePositions.C::379 in _ZN9TigVector17optimizePositionsEPKcS1_()
bogart/bogart.C::456 in main()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
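Most frames in the backtrace are unresolved `(null)` entries, so it can help to filter them out and keep only the frames that carry a source file and line. The sketch below does that for a shortened copy of the backtrace above; the `file::line in symbol()` layout is inferred from this one log and is an assumption, and `resolved_frames` is a hypothetical helper name.

```python
import re

# Shortened copy of the backtrace above.
BACKTRACE = """\
AS_UTL/AS_UTL_stackTrace.C::102 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
bogart/AS_BAT_OptimizePositions.C::140 in _ZN6Unitig18optimize_initPlaceEjP6optPosS1_bRSt3setIjSt4lessIjESaIjEEb()
bogart/AS_BAT_OptimizePositions.C::391 in (null)()
bogart/bogart.C::456 in main()
(null)::0 in (null)()
"""

# Assumed frame layout: <file>::<line> in <symbol>()
FRAME = re.compile(r"^(?P<file>\S+)::(?P<line>\d+) in (?P<symbol>\S+)\(\)$")


def resolved_frames(text):
    """Return (file, line, symbol) for frames that name a source file."""
    frames = []
    for raw in text.splitlines():
        match = FRAME.match(raw.strip())
        if match and match.group("file") != "(null)":
            frames.append(
                (match.group("file"), int(match.group("line")), match.group("symbol"))
            )
    return frames


frames = resolved_frames(BACKTRACE)
for source_file, line_number, symbol in frames:
    print(f"{source_file}:{line_number}  {symbol}")
```

The first frame is the crash handler itself; the next one, AS_BAT_OptimizePositions.C line 140, matches the location named in the assertion message.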