Recommendations for running aviary assemble on Large Datasets to avoid out-of-memory #251

Closed
Anna-MarieSeelen opened this issue Feb 17, 2025 · 2 comments

Anna-MarieSeelen commented Feb 17, 2025

Hi,

I'm using Aviary 0.11.0 and really appreciate the latest improvements—thanks for your work!

I’m trying to process large sequencing datasets (about 100 GB each for the forward and reverse read files) from water treatment samples. However, during error correction with BayesHammer in SPAdes, I run into an out-of-memory error (similar to this SPAdes issue: ablab/spades#1383).

I know this is a common problem with BayesHammer, but I was wondering if you have experience running such large datasets with Aviary. Specifically:

  • How much memory and how many threads would you recommend?
  • Did you make any system configuration changes to handle large datasets?
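
For reference, this is roughly how I am invoking aviary. The long-form flag names below are written from memory rather than checked against 0.11.0, so please treat them as placeholders and confirm against aviary assemble --help; the values end up being passed to spades.py as --memory and -t, as in the queued command further down.

# Sketch only -- flag names are assumptions, verify with: aviary assemble --help
aviary assemble \
    -1 sample_R1.fastq.gz \
    -2 sample_R2.fastq.gz \
    --max-memory 800 \
    --n-cores 50 \
    -o aviary_output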

The error at the end of the short_read_assembly log is:

7:11:30.998   163G / 211G  INFO    General                 (kmer_index_builder.hpp    : 264)   Starting k-mer counting.
7:11:31.066   163G / 211G  ERROR   General                 (mmapped_reader.hpp        :  52)   mmap(2) failed. Reason: Cannot allocate memory. Error code: 12
=== Stack Trace ===
/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/spades-hammer(+0x1d587) [0x55b4feedc587]
/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/spades-hammer(+0x34a01) [0x55b4feef3a01]
/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/spades-hammer(+0x496c4) [0x55b4fef086c4]
/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/spades-hammer(+0x63e84) [0x55b4fef22e84]
/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/spades-hammer(+0x65b21) [0x55b4fef24b21]
/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/../lib/libgomp.so.1(+0x18f09) [0x14d8d5d40f09]
/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x14d8d5694b43]
/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x14d8d5726a00]

(The ERROR line and the identical stack trace appear three times in the log, interleaved, apparently once per failing OpenMP thread.)


== Error ==  system call for: "['/vol/micro-bioinfo/Anaconda3-24/envs/6230d08d3ac562697565131bbe60723b_/bin/spades-hammer', '/scratch/Aviary_0.11.0-nOS/output_Aviary_0.11.0_5669466/data/short_read_assembly/corrected/configs/config.info']" finished abnormally, OS return value: 12
None

In case you have troubles running SPAdes, you can report an issue on our GitHub repository github.com/ablab/spades
Please provide us with params.txt and spades.log files from the output directory.

SPAdes log can be found here: /scratch/Aviary_0.11.0-nOS/output_Aviary_0.11.0_5669466/data/short_read_assembly/spades.log

Thank you for using metaSPAdes! If you use it in your research, please cite:

  Nurk, S., Meleshko, D., Korobeynikov, A. and Pevzner, P.A., 2017. metaSPAdes: a new versatile metagenomic assembler. Genome research, 27(5), pp.824-834.
  doi.org/10.1101/gr.213959.116

Queueing command spades.py --memory 800 --meta -t 50 -o data/short_read_assembly --12 data/short_reads.fastq.gz -k auto --tmp-dir /scratch/Aviary_0.11.0-nOS

General error file is:

Aviary_0.11.0_5669466_stderr.txt

The full short_read_assembly.log file is as follows:

short_read_assembly.log
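
In case it is useful context, the fallback I am considering is to rerun metaSPAdes manually on the interleaved reads that aviary already produced, skipping read error correction entirely. --only-assembler is a standard spades.py option; whether aviary itself can pass it through is something I have not checked. Lowering -t may also reduce BayesHammer's peak memory if error correction is kept, although I am not sure by how much.

# Sketch only: rerun SPAdes without BayesHammer error correction.
# Paths are taken from the queued command above; adjust -t and --memory to the node.
spades.py --meta --only-assembler \
    --12 data/short_reads.fastq.gz \
    -t 24 --memory 500 -k auto \
    -o data/short_read_assembly_only_asm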

Best regards,

Anna

wwood (Collaborator) commented Feb 19, 2025

Hi @Anna-MarieSeelen thanks for the kind words.

The strange thing about your log is that you are asking for 800G of memory, but BayesHammer is failing at ~211G. This suggests some kind of system-level issue - maybe it would be worth asking your system administrator whether something on the server is restricting the amount of RAM available to your job?
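
If it helps, a few quick things to check on the compute node (these are standard Linux commands; the scheduler note below is a guess based on the job ID in your output path):

free -h                              # total and available RAM on the node
ulimit -v                            # per-process virtual memory limit; usually "unlimited"
cat /proc/sys/vm/overcommit_memory   # 2 = strict overcommit accounting, which can make large mmap() calls fail
cat /proc/sys/vm/max_map_count       # a very low value can also break mmap-heavy programs

If the run goes through a scheduler, it is also worth confirming that the job's memory request actually matches the 800G you asked aviary for; a per-job or per-process limit somewhere around 211G would line up with where BayesHammer dies.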

One way of dealing with this is to use MEGAHIT, which is a lot more memory-efficient. 100G per read file is a lot of data for SPAdes.
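
Roughly what the MEGAHIT side looks like (these are MEGAHIT's own flags; I have not spelled out the aviary option for switching assemblers here, so check aviary assemble --help for that):

# -m takes either a byte count or a fraction (0-1) of the machine's total RAM;
# the directory given to -o must not already exist.
megahit -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    -t 50 -m 0.9 \
    -o megahit_out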

ben

Anna-MarieSeelen (Author) commented

Hi @wwood thank you for the quick response. I will look into it!

Best,

Anna
