Bogart findPotentialOrphan assertion failing #1872

Closed
ASLeonard opened this issue Jan 6, 2021 · 16 comments

@ASLeonard

Hi,
I'm using canu snapshot v2.2-development +82 changes (af771ef) on a Linux system, and am having issues with the bogart step of unitigging. This is with about 45x ONT reads on a 2.7 Gb mammalian genome.

An assertion is failing in findPotentialOrphans, which is similar to the (idle) unresolved issue in #1831. Unfortunately I can't share the sequencing data at this time, as asked for in that issue. Using the ovl algorithm was proving to be extremely slow, so I tried using mhap even for trimming and tigging, which is not recommended in the docs but was suggested by some colleagues at the USDA. I wasn't sure if that approximate algorithm could be the cause of the failed assertion, or if there is potentially some issue upstream.

I've included final lines in the unitigger.err file below.

==> MERGE ORPHANS.

computeErrorProfiles()-- Computing error profiles for 4946 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

findPotentialOrphans()-- working on 4946 tigs.
findPotentialOrphans()-- found 4218 potential orphans.
mergeOrphans()-- flagged     146        bubble tigs with 8494 reads
mergeOrphans()-- placed       13 unique orphan tigs with 228 reads
mergeOrphans()-- shattered     2 repeat orphan tigs with 80 reads
mergeOrphans()-- ignored      34               tigs with 902 reads; failed to place
mergeOrphans()--

==> MARK SIMPLE BUBBLES.
    using 0.010000 user-specified threshold


findPotentialOrphans()-- working on 4946 tigs.
read 845459 at 802795 743355 olap to read 1331274 hangs 61633 17804 -> coords 743355 741162
bogart: bogart/AS_BAT_MergeOrphans.C:229: void findPotentialOrphans(TigVector&, BubTargetList&, bool): Assertion `mincoord < maxcoord' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
(null)::0 in (null)()

Thanks,
Alex

PS I think there is a typo in the .trimReads.log files in the 3-overlapbasedtrimming stage, it uses NOV for the message column where the header suggests NOC for no change. It appears here.

@skoren
Member

skoren commented Jan 14, 2021

Without the data, this is going to be hard to diagnose and fix. Are you able to share the data if you remove the actual sequences, or is that not acceptable either? We'd need the ovlStore and the seqStore, with all the blobs files removed from the seqStore. The FAQ has instructions on sending data to us.

@ASLeonard
Author

I've uploaded the seqStore and ovlStore to your /incoming/sergek. There were errors raised after the put commands finished, but after retrying it said the files exist and are the same. If it didn't work, let me know and I can try again. They are named under issue_1872_*.tar.gz.

@ASLeonard
Author

@skoren, did you get the data okay? Just wanted to follow up before I removed the tar files on my end.

@skoren
Member

skoren commented Jan 21, 2021

Yep I was able to get the data.

@brianwalenz
Member

Apologies, you caught me in the midst of many holidays and much time off. I'm looking at it now. If you have a chance, can you post the bogart command? It's in unitigging/4-unitigger/unitigger.sh.

@ASLeonard
Author

if [ $exists = false ] ; then
  $bin/bogart \
    -S ../../asm.seqStore \
    -O    ../asm.ovlStore \
    -o     ./asm \
    -gs 2700000000 \
    -eg 0.144 \
    -eM 0.144 \
    -mo 500 \
    -covgapolap 500 \
    -covgaptype deadend \
    -lopsided 25  \
    -minolappercent   0.0  \
    -dg 12 \
    -db 1 \
    -dr 1 \
    -ca 2500 \
    -cp 15 \
    -threads 32 \
    -M 100 \
    -unassembled 2 0 1.0 0.5 3 \
    > ./unitigger.err 2>&1 \
  && \
  mv ./asm.ctgStore ../asm.ctgStore
fi

if [ ! -e ../asm.ctgStore ] ; then
  echo bogart appears to have failed.  No asm.ctgStore found.
  exit 1
fi

if [ ! -e ../asm.ctgStore/seqDB.v001.sizes.txt ] ; then
  $bin/tgStoreDump \
    -S ../../asm.seqStore \
    -T ../asm.ctgStore 1 \
    -sizes -s 2700000000 \
   > ../asm.ctgStore/seqDB.v001.sizes.txt
fi

@brianwalenz
Member

Thanks! My guesses were close.

There might be a corrupt overlap file. I'm getting this error almost immediately:

OverlapCache()-- Loading overlaps.
OverlapCache()--
OverlapCache()--          read from store           saved in cache
OverlapCache()--   ------------ ---------   ------------ ---------
bogart: stores/ovStoreFile.C:410: bool ovFile::readOverlap(ovOverlap*): Assertion `_bufferPos <= _bufferLen' failed.

Can you verify that this is correct? In particular, the above error indicates a truncated file, and file 0001-004 looks suspiciously small (it's also the file being read when it fails).

issue_1872.ovlStore> ls -l 0001-00? evalues index info statistics 
-rw-r----- 1 walenzbp Phillippy  1073882360 Jan 15 06:39 0001-001
-rw-r----- 1 walenzbp Phillippy  1073771580 Jan 15 06:15 0001-002
-rw-r----- 1 walenzbp Phillippy  1074063980 Jan 15 06:20 0001-003
-rw-r----- 1 walenzbp Phillippy   157745152 Jan 15 06:57 0001-004
-rw-r----- 1 walenzbp Phillippy  1075859680 Jan 15 04:29 0001-005
-rw-r----- 1 walenzbp Phillippy  1073748980 Jan 15 04:25 0001-006
-rw-r----- 1 walenzbp Phillippy  1074047140 Jan 15 05:30 0001-007
-rw-rw---- 1 walenzbp Phillippy  1074098340 Jan  4 23:14 0001-008
-rw-r----- 1 walenzbp Phillippy  1073879540 Jan 15 05:35 0001-009
-rw-rw---- 1 walenzbp Phillippy 16577801140 Jan  5 18:07 evalues
-rw-r----- 1 walenzbp Phillippy    46652040 Jan 15 06:54 index
-rw-r----- 1 walenzbp Phillippy          40 Jan 15 06:08 info
-rw-r----- 1 walenzbp Phillippy   124405452 Jan 15 05:00 statistics
issue_1872.ovlStore> md5sum 0001-00? evalues index info statistics 
7f62079a19556e93a47f7169d08136c4  0001-001
c6e6191ce7bcf00108c90812f16ef8cf  0001-002
494d0ddcd38787d58cc45bb021563242  0001-003
1970b29845c27182c452e73cbea85891  0001-004
3191eb1834f7d805db850816f84b11fd  0001-005
24a8aff8172f5deda8f4f408bb6d1133  0001-006
f4f592186db9622c6565405ad7ef5e26  0001-007
b35bf558760bb11d44b75c123709c40a  0001-008
e0dd0daf34442224b9d9c95d71ae895c  0001-009
6d733ed7a87db9f3cc9d2f3f522e5dfb  evalues
9352b62547ea90f8a56ecc1b202fcc20  index
d9c9679ee249d5d7426829c6b9cdfb89  info
a7f9081446975386651e3fc7d1b9de68  statistics
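
The size comparison above can be automated. The following is a minimal sketch, not part of canu: `flag_small_files` is a hypothetical helper, the `NNNN-NNN` file name pattern is taken from the listing, and the 50 percent default threshold is an arbitrary assumption. A small file is only a hint of truncation, not proof, since a store may legitimately contain short files.

```shell
# Print "SIZE NAME" for every overlap data file smaller than PCT percent of
# the largest one. The NNNN-NNN name pattern comes from the listing above;
# the 50 percent default is an arbitrary, hypothetical threshold.
flag_small_files() (
  dir=$1
  pct=${2:-50}
  max=0
  # First pass: find the size of the largest data file.
  for f in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]; do
    [ -f "$f" ] || continue
    size=$(( $(wc -c < "$f") ))
    if [ "$size" -gt "$max" ]; then max=$size; fi
  done
  # Second pass: report files that fall below the threshold.
  for f in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9]; do
    [ -f "$f" ] || continue
    size=$(( $(wc -c < "$f") ))
    if [ $((size * 100)) -lt $((max * pct)) ]; then
      printf '%s %s\n' "$size" "${f##*/}"
    fi
  done
)
```

Usage would be `flag_small_files issue_1872.ovlStore`. Among the nine files listed above, only 0001-004 falls under the default threshold.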

@ASLeonard
Author

Looks the same on my end. I just noticed all the intermediate overlap results get removed at some point, so I can't check if something went wrong there.

@brianwalenz
Member

Quite surprising! That file is definitely truncated. Reads 39492 through 48674 (inclusive) have lost their overlaps. You can check yourself: ovStoreDump -S issue_1872.seqStore -O issue_1872.ovlStore -overlaps 48675 will report 75 overlaps, while 48674 will fail. Read 48674 is the last one stored in file 0001-004; read 48675 is the first one stored in 0001-005.
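
To repeat that check over a range of reads, a loop along these lines could be used. This is a sketch under assumptions: `find_unreadable_reads` and the `OVSTOREDUMP` override variable are hypothetical names, and it assumes that a failed dump exits with a non-zero status. The `ovStoreDump -S ... -O ... -overlaps ID` invocation is the one quoted above.

```shell
# Print every read ID in [FIRST, LAST] whose overlaps cannot be dumped.
# OVSTOREDUMP may name a different binary; by default ovStoreDump is
# expected on PATH.
find_unreadable_reads() (
  seqstore=$1
  ovlstore=$2
  first=$3
  last=$4
  id=$first
  while [ "$id" -le "$last" ]; do
    # Assumption: a failed dump returns a non-zero exit status.
    if ! "${OVSTOREDUMP:-ovStoreDump}" -S "$seqstore" -O "$ovlstore" \
         -overlaps "$id" >/dev/null 2>&1; then
      echo "$id"
    fi
    id=$((id + 1))
  done
)
```

For example, `find_unreadable_reads issue_1872.seqStore issue_1872.ovlStore 48670 48680` would probe the reads on either side of the file boundary. Dumping one read per invocation is slow, so a narrow range is advisable.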

There is redundancy in the way we store overlaps and this isn't as fatal as it sounds. I've hacked bogart to ignore these reads when loading overlaps. It should recover the overlaps through the redundant copies - but this has never been explicitly tested before.

You might want to investigate what happened to this file between your first (crashed) run and the data you have now. I've verified that the .gz files I have are complete and unpack without error. Coincidentally, this file has the most recent time stamp of all files, hinting that a copy might have been interrupted.

> ls -ltr issue_1872.ovlStore/ | tail
-rw-r----- 1 walenzbp Phillippy  1073889040 Jan 15 06:52 0003-001
-rw-r----- 1 walenzbp Phillippy  1073877060 Jan 15 06:54 0001-021
-rw-r----- 1 walenzbp Phillippy    46652040 Jan 15 06:54 index
-rw-r----- 1 walenzbp Phillippy  1074498900 Jan 15 06:55 0005-007
-rw-r----- 1 walenzbp Phillippy  1073923500 Jan 15 06:57 0002-013
-rw-r----- 1 walenzbp Phillippy   157745152 Jan 15 06:57 0001-004
drwxrws--- 2 walenzbp Phillippy        4096 Jan 20 08:38 scripts
drwxrws--- 2 walenzbp Phillippy        4096 Jan 20 08:38 logs

(the two directories have timestamps from when I unpacked the data)

@ASLeonard
Author

I went back to the original ovlStore, and found there were different values.

-rw-rw---- 1 alleonard hest-hpc-tg  1074020260 Jan  5 05:14 0001-004
372662fc88350e1eacfc2bbc835dd9bb  0001-004

Maybe it was during the tarring and transferring that something went off. I'm making a fresh tar of the ovlStore and will send that when ready.
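
One way to catch this kind of damage earlier is to record checksums before tarring and verify them after unpacking. The sketch below assumes GNU `md5sum` (as used earlier in the thread) and file names without whitespace; `write_manifest`, `verify_manifest` and the `MD5SUMS` file name are hypothetical.

```shell
# Record an MD5 checksum for every regular file under DIR in DIR/MD5SUMS.
# Run this on the original store, before creating the tar file.
write_manifest() (
  cd "$1" || exit 1
  find . -type f ! -name MD5SUMS | sort | xargs md5sum > MD5SUMS
)

# Re-check every file listed in DIR/MD5SUMS; returns non-zero on a mismatch.
# Run this on the unpacked copy at the receiving end.
verify_manifest() (
  cd "$1" || exit 1
  md5sum -c MD5SUMS
)
```

Because the manifest lives inside the directory, it travels with the tar file, so `verify_manifest asm.ovlStore` on the unpacked copy would report a truncated file as FAILED.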

@brianwalenz
Member

For now, just send that one file. The others looked ok. Thanks for looking!

I was able to reproduce a crash with my hacked up version.

@ASLeonard
Author

Uploaded that 0001-004 for completeness.

@brianwalenz
Member

Did you use -fast or just set overlapper=mhap?

The former will run mhap and then 'overlapPair' to remove garbage overlaps. It looks like just overlapper=mhap was set, leaving a lot of invalid (especially for bogart) overlaps. I'm quite surprised canu made it as far as it did.

The trimmed reads are still good, and I suggest starting a new assembly from those: -fast -trimmed -nanopore trimmedReads.fasta.gz will create a new seqStore then jump straight to computing unitigging/0-mercounts then unitigging/1-overlapper. The overlap jobs will take longer than before due to the realignment with overlapPair.
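
Spelled out as a full command line, the restart might look like the sketch below. Only `-fast -trimmed -nanopore trimmedReads.fasta.gz` comes from the suggestion above; the `-p` prefix, the `-d` directory and the `genomeSize` value are placeholders (2.7g mirrors the `-gs 2700000000` in the posted script). The hypothetical helper `canu_restart_cmd` only prints the command and does not run canu.

```shell
# Print (do not run) a canu restart command.
# Arguments: PREFIX OUTDIR GENOMESIZE TRIMMED_READS (all placeholders).
canu_restart_cmd() {
  printf 'canu -p %s -d %s genomeSize=%s -fast -trimmed -nanopore %s\n' \
    "$1" "$2" "$3" "$4"
}

canu_restart_cmd asm asm-fast 2.7g trimmedReads.fasta.gz
```

Using a new output directory keeps the fresh seqStore and overlaps separate from the earlier run.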

@ASLeonard
Author

It was set manually with -obtOverlapper=mhap -utgOverlapper=mhap. Would the trimmed reads still be good in that case?

@brianwalenz
Member

Yes, most definitely!

@brianwalenz
Member

Fixed (finally).
