
"Assertion `cnt > 0' failed" error in assembly stage 4-unitigger #873

Closed
gdczoller opened this issue Apr 17, 2018 · 10 comments
Closed

"Assertion `cnt > 0' failed" error in assembly stage 4-unitigger #873

gdczoller opened this issue Apr 17, 2018 · 10 comments

Comments

@gdczoller

I am running into an "Assertion `cnt > 0' failed" error in the assembly stage 4-unitigger. This is running on a cluster, and the assembly had run without problems up to this point. I have restarted Canu twice, but the same error occurs each time. We are using Canu 1.6 on a CentOS 7 cluster with the LSF job submission system.

This is from the unitigger.1.out:
Running job 1 based on LSB_JOBINDEX=1 and offset=0.
./unitigger.sh: line 75: 31378 Aborted (core dumped) $bin/bogart -G ../canu_assembly_subreads2.gkpStore -O ../canu_assembly_subreads2.ovlStore -o ./canu_assembly_subreads2 -gs 430000000 -eg 0.105 -eM 0.105 -mo 500 -dg 6 -db 6 -dr 3 -ca 2100 -cp 200 -threads 4 -M 20 -unassembled 2 0 1.0 0.5 5 > ./unitigger.err 2>&1
./unitigger.sh: line 87: ../canu_assembly_subreads2.ctgStore/seqDB.v001.sizes.txt: No such file or directory

Below is the error message from unitigger.err:

computeErrorProfiles()-- Computing error profiles for 141027 tigs, with 4 threads.
computeErrorProfiles()-- Finished.

placeContains()-- placing 970137 contained and 306780 unplaced reads, with 4 threads.
placeContains()-- Placed 656894 contained reads and 11575 unplaced reads.
placeContains()-- Failed to place 313243 contained reads (too high error suspected) and 295205 unplaced reads (lack of overlaps suspected).
optimizePositions()-- Optimizing read positions for 141027 reads in 1392714 tigs, with 4 threads.
optimizePositions()-- Allocating scratch space for 1392714 reads (0 KB).
optimizePositions()-- Initializing positions with 4 threads.
bogart: bogart/AS_BAT_OptimizePositions.C:140: void Unitig::optimize_initPlace(uint32, optPos*, optPos*, bool, std::set<unsigned int>&, bool): Assertion `cnt > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::102 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
bogart/AS_BAT_OptimizePositions.C::140 in _ZN6Unitig18optimize_initPlaceEjP6optPosS1_bRSt3setIjSt4lessIjESaIjEEb()
bogart/AS_BAT_OptimizePositions.C::391 in (null)()
bogart/AS_BAT_OptimizePositions.C::379 in _ZN9TigVector17optimizePositionsEPKcS1_()
bogart/bogart.C::456 in main()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()

@brianwalenz
Member

This has, I think, been fixed, but there is no feasible way to use the latest code without restarting your assembly. Instead, copy the two files in the attached tar.gz to src/bogart/, recompile, and restart Canu. This updates only the failing part with the latest algorithm.
fix.tar.gz
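
For anyone following along, here is a minimal sketch of applying such a source patch, assuming a from-source Canu 1.6 checkout in canu/ and that the tarball unpacks the replacement bogart sources into the current directory (the exact file names are an assumption):

    # Unpack the patched sources (file names are assumptions; use what the tarball contains).
    tar -xzf fix.tar.gz

    # Overwrite the originals in the Canu source tree and rebuild.
    cp AS_BAT_OptimizePositions.* canu/src/bogart/
    cd canu/src && make -j 4

    # Then re-run the original canu command unchanged; it resumes at the failed unitigger step.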

@gdczoller
Author

Unfortunately, I get another assertion failure with the new AS_BAT_OptimizePositions.C as well; please see below. Is there perhaps another fix for this?

breakSingletonTigs()-- Removed 112986 singleton tigs; reads are now unplaced.
optimizePositions()-- Optimizing read positions for 1392714 reads in 141027 tigs, with 4 threads.
optimizePositions()-- Allocating scratch space for 1392714 reads (87044 KB).
optimizePositions()-- Initializing positions with 4 threads.
bogart: bogart/AS_BAT_OptimizePositions.C:194: void Unitig::optimize_initPlace(uint32, optPos*, optPos*, bool, std::set<unsigned int>&, bool): Assertion `cnt > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::102 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
bogart/AS_BAT_OptimizePositions.C::194 in _ZN6Unitig18optimize_initPlaceEjP6optPosS1_bRSt3setIjSt4lessIjESaIjEEb()
bogart/AS_BAT_OptimizePositions.C::453 in (null)()
../../../gcc-4.8.2/libgomp/team.c::115 in gomp_thread_start()

@brianwalenz
Member

Yet another patch, same story.

fix2.tar.gz

Not many good options after this:

  • disabling 'optimizePositions' entirely would obviously stop the crash, but might just lead to a failure later; I've never tried this. You can do it by removing the two lines that mention 'optimizePositions' from bogart/bogart.C (see the sketch after this list).
  • upgrading to the latest GitHub tip code and restarting from trimmed reads (canu ... -d newassembly -assemble -pacbio-corrected oldassembly/*.trimmedReads.fasta.gz). If that still fails, I'd need either the trimmed reads or the two 'stores' from the assembly so I can debug.
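
A sketch of both workarounds, assuming a standard Canu source checkout; the output prefix and genome size below are taken from this thread's bogart command and would need adjusting for other runs:

    # Option 1: locate the two calls to remove before rebuilding.
    grep -n optimizePositions src/bogart/bogart.C

    # Option 2: restart from trimmed reads with the latest tip code.
    canu -p canu_assembly_subreads2 -d newassembly \
         genomeSize=430m \
         -assemble \
         -pacbio-corrected oldassembly/*.trimmedReads.fasta.gz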

@gdczoller
Author

Unfortunately, I get another error (`_listLen > 0' failed). Should I even try removing the two lines that say 'optimizePositions', or will this fail anyway because the list will be empty?

From the log:

optimizePositions()-- Updating positions.
optimizePositions()-- Finished.

==> MERGE ORPHANS.

computeErrorProfiles()-- Computing error profiles for 141027 tigs, with 4 threads.
computeErrorProfiles()-- Finished.

findPotentialOrphans()-- working on 141027 tigs.
mergeOrphans()-- Found 14136 potential orphans.
mergeOrphans()-- placed 24 unique orphan tigs
mergeOrphans()-- shattered 7 repeat orphan tigs
mergeOrphans()--
bogart: AS_UTL/intervalList.H:731: void intervalList<iNum, iVal>::computeDepth(intervalDepthRegions<iNum, iVal>*, uint32) [with iNum = int; iVal = int; uint32 = unsigned int]: Assertion `_listLen > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::102 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
AS_UTL/intervalList.H::731 in _ZN12intervalListIiiE12computeDepthEP20intervalDepthRegionsIiiEj()
AS_UTL/intervalList.H::626 in _ZN12intervalListIiiE5depthERS0_()
AS_UTL/intervalList.H::114 in intervalList()
bogart/AS_BAT_Instrumentation.C::214 in _Z13classifyRule4P6UnitigP8_IO_FILERjRmdj()
bogart/AS_BAT_Instrumentation.C::295 in _Z25classifyTigsAsUnassembledR9TigVectorjjddj()
bogart/bogart.C::488 in main()
(null)::0 in (null)()

@brianwalenz
Member

What are the details of this assembly? I see a genome size of 430 Mbp. Coverage? Read type? Length? Repeat-rich genome?

I don't see any obvious changes in the latest code that would fix this.

Options:

  1. restart from trimmed reads with the latest 'tip' code. I doubt this will fix it, but you'll at least be at a point where I can debug. I would need the two unitigger/*Store/ directories. Alternatively, I could rerun from the trimmedReads.fasta.gz and (probably) send you the results.
  2. restart from original reads with the latest 'tip' code. There have been improvements to read correction since 1.6, and countless other fixes and changes (see the sketch below).

Option 1 is the fastest, unless it fails again. Option 2 is more likely to succeed and should result in a slightly better assembly, but it requires the correction step to run again.

I don't want to debug the run you currently have. It'll confuse me too much to work with the old code ("hey, didn't I already fix that?"), and moving any fix from the old code to the latest code is just lost time.
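
A sketch of option 2, assuming the original subreads are still on disk (the read file name is illustrative); option 1 is the same -assemble command shown after the previous patch:

    # Build the latest tip code; binaries land in a platform
    # directory, e.g. canu/Linux-amd64/bin/ on Linux.
    git clone https://github.com/marbl/canu.git
    cd canu/src && make -j 8 && cd ../..

    # Restart from the original reads; correction and trimming run again.
    canu/Linux-amd64/bin/canu -p asm -d newassembly \
         genomeSize=430m \
         -pacbio-raw pacbio_subreads.fasta.gz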

@gdczoller
Author

gdczoller commented Apr 19, 2018

As you noted, the genome size is about 430 Mbp, although we might be off by 100 Mbp on either side. It is not repeat rich, but we expect very high heterozygosity. Coverage is marginal, at about 27x raw and 18x after trimming. Reads are from a PacBio Sequel; the length distribution seems OK (see below). We had previously run a Canu assembly with default settings and got about 350 Mbp, but BUSCO found only very few genes, even after a polishing round. We also ran a MaSuRCA assembly (with some added Illumina data we have) and got about 550 Mbp with a rather high BUSCO gene count but a high duplicate rate. Therefore, I wanted to try this additional Canu run with higher error rates (correctedErrorRate=0.1) and include as much data as possible (corOutCoverage=1000). I noted that error correction and trimming took much more time, which makes sense, I guess. And now I am stuck at the assembly step :-(

Read length distribution:

CORRECTION/READS
-- In gatekeeper store 'correction/canu_assembly_subreads2.gkpStore':
--   Found 1320000 reads.
--   Found 11943997974 bases (27.77 times coverage).
--   Read length histogram (one '*' equals 1615.87 reads):
--        0    999      0 
--     1000   1999 113111 **********************************************************************
--     2000   2999 103210 ***************************************************************
--     3000   3999  92264 *********************************************************
--     4000   4999  87684 ******************************************************
--     5000   5999  82685 ***************************************************
--     6000   6999  76480 ***********************************************
--     7000   7999  70211 *******************************************
--     8000   8999  65564 ****************************************
--     9000   9999  63442 ***************************************
--    10000  10999  69768 *******************************************
--    11000  11999  82394 **************************************************
--    12000  12999  86425 *****************************************************
--    13000  13999  76329 ***********************************************
--    14000  14999  61249 *************************************
--    15000  15999  47001 *****************************
--    16000  16999  36122 **********************
--    17000  17999  27041 ****************
--    18000  18999  20183 ************
--    19000  19999  15250 *********
--    20000  20999  11315 *******
--    21000  21999   8199 *****
--    22000  22999   6082 ***
--    23000  23999   4583 **
--    24000  24999   3350 **
--    25000  25999   2525 *
--    26000  26999   1794 *
--    27000  27999   1396 

@skoren
Member

skoren commented Apr 19, 2018

In cases of high heterozygosity (I'm assuming >1%), just think of your genome size as doubled, so you really have <15x per haplotype, which is quite low. I think the goal should be to get double the genome size and remove the duplication later by post-processing. Polishing (e.g. with Arrow) will not work well if you try to smash both haplotypes into a single consensus.

I am not sure a higher corrected error rate will help, because you probably want to separate, not collapse, haplotypes, and with high heterozygosity most differences won't be collapsible anyway (e.g. large SVs). Sequel data is worse quality than RS II data, though, so a higher rate may help compensate for that.

My suggestion would be to treat this as a 15x genome, in which case you want to turn on all the low-coverage options but also add the separate-haplotype options:
corMinCoverage=0 corMhapSensitivity=high correctedErrorRate=0.105 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50"

This will compute overlaps to a higher error rate but still try to be conservative and separate haplotypes where possible. This will likely end up with duplication as well, but you can remove it later using either self-alignment or gene information; a full invocation is sketched below.
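
A minimal sketch of a full command with these settings (prefix, directory, and read file names are illustrative; the genome size is taken from earlier in the thread):

    canu -p asm -d asm-het \
         genomeSize=430m \
         corMinCoverage=0 corMhapSensitivity=high \
         correctedErrorRate=0.105 \
         "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50" \
         -pacbio-raw pacbio_subreads.fasta.gz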

@skoren
Member

skoren commented May 11, 2018

Idle; closing. The original bug should be fixed in the latest code.

@skoren closed this as completed May 11, 2018
@skoren mentioned this issue May 28, 2018
@peterjc

peterjc commented May 29, 2018

I've just run into what looks like the same issue, but with Canu 1.7 (installed via BioConda); it failed during the bogart step. Should I open a new issue, or retry with the latest Canu from GitHub?

Resuming failed again:

-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Mon May 28 15:17:33 2018 with 39325.076 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Mon May 28 16:33:56 2018 (4583 seconds) with 38939.622 GB free disk space
----------------------------------------
--
-- Bogart failed, retry
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Mon May 28 16:33:56 2018 with 38939.622 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Mon May 28 18:02:11 2018 (5294 seconds) with 38553.341 GB free disk space
----------------------------------------
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

From the end of the unitigging/4-unitigger/unitigger.err output:

==> PLACE CONTAINED READS.

computeErrorProfiles()-- Computing error profiles for 23524 tigs, with 8 threads.
computeErrorProfiles()-- Finished.

placeContains()-- placing 601637 contained and 4047944 unplaced reads, with 8 threads.
placeContains()-- Placed 449472 contained reads and 2803 unplaced reads.
placeContains()-- Failed to place 152165 contained reads (too high error suspected) and 4045141 unplaced reads (lack of overlaps suspected).
optimizePositions()-- Optimizing read positions for 4722545 reads in 23524 tigs, with 8 threads.
optimizePositions()--   Allocating scratch space for 4722545 reads (295159 KB).
optimizePositions()--   Initializing positions with 8 threads.
bogart: bogart/AS_BAT_OptimizePositions.C:142: void Unitig::optimize_initPlace(uint32, optPos*, optPos*, bool, std::set<unsigned int>&, bool): Assertion `cnt > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()

Failed with 'Segmentation fault'; backtrace (libbacktrace):
::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()

This is with four PacBio Sequel SMRT cells of a WGA-amplified sample. The expected genome size is in the region of 100 Mb, but a Canu 1.6 assembly came out at about twice that, which we think reflects uncollapsed haplotypes and/or population-level variation.

@peterjc

peterjc commented Jun 5, 2018

Having seen this optimize_initPlace segmentation fault in Canu 1.7 from BioConda, I have now reproduced it with Canu 1.7 compiled locally from git.

Testing the same dataset against the master branch worked fine; specifically, commit 6f3c375 worked:

Canu snapshot v1.7 +243 changes (r8935 6f3c37525c6bf532be8f585daeaf565507c4c3b1)
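
For anyone trying to reproduce that, a sketch of building Canu at that specific commit (the thread count is arbitrary):

    git clone https://github.com/marbl/canu.git
    cd canu
    git checkout 6f3c37525c6bf532be8f585daeaf565507c4c3b1
    cd src && make -j 8
    # Binaries land in a platform directory, e.g. ../Linux-amd64/bin/.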
