flo failed on Large genome #33

pan-genome · 2020-09-18T15:18:22Z

flo failed on a 14 Gb genome with a "corrupted double-linked list (not small)" error. It runs normally with genomes smaller than 4 Gb. The run was on an AWS m5.16xlarge EC2 instance.

rake -f /home/ubuntu/flo/Rakefile &
mkdir run
cp /home/ubuntu/s.fa run/source.fa
cp /home/ubuntu/t.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 21 run/chunk_
parallel --joblog run/joblog.faSplit -j 21 -a run/joblst.faSplit
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

123322 pieces of 123923 written
133957 pieces of 134763 written
150983 pieces of 152743 written
156478 pieces of 157558 written
98419 pieces of 99073 written
99082 pieces of 99724 written
103154 pieces of 103663 written
113555 pieces of 113991 written
118767 pieces of 119728 written
123551 pieces of 124526 written
141741 pieces of 142672 written
144495 pieces of 146237 written
130388 pieces of 131310 written
147572 pieces of 148896 written
138549 pieces of 140111 written
141907 pieces of 142961 written
149246 pieces of 150844 written
149613 pieces of 150822 written
197774 pieces of 198899 written
160747 pieces of 162550 written
167525 pieces of 170389 written
parallel --joblog run/joblog.blat -j 21 -a run/joblst.blat
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

corrupted double-linked list (not small)
free(): invalid next size (normal)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
malloc(): smallbin double linked list corrupted
free(): invalid next size (normal)
malloc(): memory corruption
free(): invalid next size (normal)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
rake aborted!
Command failed with status (21): [parallel --joblog run/joblog.blat -j 21 -a...]
/home/ubuntu/flo/Rakefile:153:in `parallel'
/home/ubuntu/flo/Rakefile:99:in `block in <top (required)>'
/home/ubuntu/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

[1]+ Exit 1 rake -f /home/ubuntu/flo/Rakefile


yeban · 2020-09-20T15:37:19Z

Not sure if the error is coming from GNU parallel or blat. The contents of run/joblog.blat can help decide. Would you mind posting it?

If it's GNU parallel, you could try using a newer version. The version that the install script installs is quite old.

If it's blat, it is possible that 256 GB is not enough memory for the task. Did you monitor the memory usage using htop?
You could try lowering the number of parallel processes that flo runs, using a memory-optimised (r5) instance for more RAM, and taking steps to minimise the memory usage of blat, such as creating and providing an ooc file.

pan-genome · 2020-09-20T16:16:13Z

Here is the blat joblog:
run$ cat joblog.blat
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
6 : 1600440618.782 60.508 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl
9 : 1600440618.787 67.600 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
5 : 1600440618.780 74.621 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_14.fa run/chunk_14.fa.psl
21 : 1600440618.807 81.061 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_20.fa run/chunk_20.fa.psl
4 : 1600440618.778 186.198 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
10 : 1600440618.788 312.954 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_10.fa run/chunk_10.fa.psl
2 : 1600440618.775 312.980 0 41 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
8 : 1600440618.785 314.005 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
13 : 1600440618.793 314.322 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
14 : 1600440618.795 314.361 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_17.fa run/chunk_17.fa.psl
20 : 1600440618.805 314.427 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
12 : 1600440618.791 319.748 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_13.fa run/chunk_13.fa.psl
7 : 1600440618.783 324.924 0 48 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_18.fa run/chunk_18.fa.psl
11 : 1600440618.790 327.304 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
15 : 1600440618.796 330.322 0 28 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
19 : 1600440618.803 331.255 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_11.fa run/chunk_11.fa.psl
17 : 1600440618.800 332.427 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_15.fa run/chunk_15.fa.psl
18 : 1600440618.802 332.598 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
16 : 1600440618.798 333.617 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
1 : 1600440618.774 341.095 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_12.fa run/chunk_12.fa.psl
3 : 1600440618.777 345.338 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_16.fa run/chunk_16.fa.psl
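
To help decide where the failure comes from, the Signal column of the joblog can be tallied. The sketch below assumes the standard tab-separated GNU parallel joblog layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command) and Linux signal numbering, where 6 is SIGABRT, 9 is SIGKILL and 11 is SIGSEGV. The function name `summarize_joblog` is made up for illustration.

```shell
#!/usr/bin/env bash
# Count jobs per terminating signal in a GNU parallel joblog.
# Assumes tab-separated columns with Signal in column 8.
summarize_joblog() {
  local joblog=$1
  # Skip the header line, then tally the Signal column.
  awk -F'\t' 'NR > 1 { n[$8]++ } END { for (s in n) print s, n[s] }' "$joblog" |
    sort -n |
    while read -r sig count; do
      if [ "$sig" -eq 0 ]; then
        name="none"
      else
        name=$(kill -l "$sig")   # e.g. 6 -> ABRT, 9 -> KILL, 11 -> SEGV
      fi
      printf '%s jobs: signal %s (%s)\n' "$count" "$sig" "$name"
    done
}

# Usage (path as used in this thread):
#   summarize_joblog run/joblog.blat
```

Tallying the joblog above by hand gives 13 jobs ended by signal 6, 5 by signal 9 and 3 by signal 11. The 13 SIGABRT exits match the 13 glibc heap-corruption messages in the first post. The SIGKILL exits may point to the kernel's out-of-memory killer, but that is an assumption that nothing in this thread confirms.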

pan-genome · 2020-09-20T16:22:53Z

I was wondering what the best way to update parallel would be. Do I install a new version, or update the one in /ext/parallel-20150722?
If I install the new one in a different folder, I then need to point every use of parallel in flo to the new location.

yeban · 2020-09-20T16:26:06Z

> If I install the new one in a different folder, I then need to point every use of parallel in flo to the new location.

Best to install it in a new folder. You can tell flo about the new folder using the :add_to_path: key in the config file.
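
Before adding the new folder to :add_to_path:, it may help to confirm in a plain shell that the intended binary is the one that resolves first. This is only a sketch: the helper name `use_tool_from` and the directory in the usage comment are hypothetical, and it checks the shell's PATH, not flo's own handling of the config key.

```shell
#!/usr/bin/env bash
# Prepend a directory to PATH and print which executable now wins.
use_tool_from() {
  local dir=$1 tool=$2
  PATH="$dir:$PATH"
  export PATH
  hash -r                  # drop bash's cached command locations
  command -v "$tool"       # prints the full path that will be run
}

# Usage (the directory is a placeholder):
#   use_tool_from "$HOME/opt/parallel/bin" parallel
#   parallel --version | head -n 1
```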

pan-genome · 2020-09-20T17:03:54Z

I changed to r5.16xlarge, used a newer parallel, and lowered the number of parallel jobs from 21 to 10, but I still get the same error. Any suggestions? Thanks!
The blat joblog looks like this:
run$ cat joblog.blat
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
5 : 1600620779.307 255.175 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
7 : 1600620779.310 255.858 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
1 : 1600620779.302 256.565 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
8 : 1600620779.311 256.630 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
10 : 1600620779.314 256.855 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
2 : 1600620779.303 257.506 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
4 : 1600620779.306 257.615 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
6 : 1600620779.308 257.718 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
9 : 1600620779.312 258.359 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
3 : 1600620779.304 258.777 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl

yeban · 2020-09-26T13:42:31Z

Sorry, I am not quite sure what is happening here. I have not encountered this error before. From the information we have in this thread, it might well be a bug in blat. It might be worth running the blat commands listed in joblst.blat one by one to check whether all the chunks fail with the above error, or only one in particular. With an isolated example it might then be worth asking on blat's mailing list.

Just to be sure, is it possible that the ooc file you constructed uses a different tileSize than the one you are using for running blat? I guess not, because you have a _12 suffix on the ooc file.

Did you compile blat yourself or did you download a pre-compiled executable (e.g., using the install script)? It is possible that the cause is a difference in glibc between your instance and the host on which blat was compiled, in which case compiling blat yourself can help. But this is the kind of issue where you would be better off getting help on blat's mailing list.

I used flo on ~400 Mb genome, split into 40 chunks, so 10 Mb per chunk. I wonder if increasing the number of processes so that each chunk is smaller helps.

Lastly, I would quickly check the FASTA and PSL files for each chunk just to make sure we are not missing something too obvious.
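
The one-by-one check suggested above could be sketched as follows, assuming joblst.blat holds one complete command per line. The function name `run_jobs_serially` and the report file are made up for illustration.

```shell
#!/usr/bin/env bash
# Run each command line from a job list one at a time and record how it ended.
run_jobs_serially() {
  local joblist=$1 report=$2 cmd status
  : > "$report"                                  # start with an empty report
  while IFS= read -r cmd; do
    [ -z "$cmd" ] && continue                    # skip blank lines
    status=0
    bash -c "$cmd" < /dev/null || status=$?      # keep going after a failure
    printf '%s\t%s\n' "$status" "$cmd" >> "$report"
  done < "$joblist"
}

# Usage (paths as used in this thread; the report name is a placeholder):
#   run_jobs_serially run/joblst.blat run/serial_report.tsv
```

A status above 128 in the report means the command was ended by a signal (128 plus the signal number), so 134 would correspond to SIGABRT and 139 to SIGSEGV. Running serially also means only one blat process holds memory at a time, which helps separate a memory shortage from a crash that happens regardless of load.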

pan-genome · 2020-09-29T20:24:08Z

Hi,
How can you split the work into more processes than the number of chromosomes/scaffolds? The information page says:
"Number of CPU cores to use (required - not auto detected). This
cannot be greater than the number of scaffolds in the target assembly."
Here I have 21 chromosomes, so 21 processes is the maximum I can get. It looks like a memory issue in blat, and each chunk is still too big for blat to handle.

pan-genome · 2020-09-30T17:45:10Z

Here is what happened when I ran blat on one chunk:
blat -noHead -fastMap -tileSize=12 -ooc=4461n_12.ooc -minScore=100 -minIdentity=98 source.fa chunk_08.fa chunk_08.fa.psl
Loaded 14547261565 letters in 22 sequences
free(): invalid next size (normal)
Aborted (core dumped)
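
The blat output above reports 14547261565 letters loaded from source.fa, and the first post notes that genomes under 4 Gb run normally. Since 2^32 is 4294967296, one hypothesis, not confirmed anywhere in this thread, is that the sequence loaded as the blat database is larger than a 32-bit offset can address. In these commands source.fa is passed whole to every job, so splitting the target into smaller chunks would not shrink it. The sketch below sums a "name, size" file such as run/source.sizes and compares the total with 2^32. The function name `check_genome_size` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sum column 2 of a "name<TAB>size" file (as written by twoBitInfo)
# and compare the total with 2^32.
check_genome_size() {
  local sizes=$1 total limit
  total=$(awk '{ sum += $2 } END { printf "%.0f\n", sum }' "$sizes")
  limit=$(( 1 << 32 ))           # 4294967296
  if [ "$total" -gt "$limit" ]; then
    echo "$total bases: above 2^32 ($limit)"
  else
    echo "$total bases: within 2^32 ($limit)"
  fi
}

# Usage (path as used in this thread):
#   check_genome_size run/source.sizes
```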

akshaya-v · 2022-03-16T15:08:52Z

Hello!
I am facing a similar issue while running flo with a large genome of ~16 Gb.
Can you please advise if there is a workaround or solution for this issue?
Thanks.
