
--split-memory-limit not reducing memory requirement of mmseqs search job #338

Open
nick-youngblut opened this issue Jul 28, 2020 · 23 comments

Comments

@nick-youngblut

Expected Behavior

According to the mmseqs docs:

The --split-memory-limit parameter can give MMseqs2 an upper limit of system RAM to use for the large prefiltering data structures. MMseqs2 will still use some additional memory for its database structures etc. In total, --split-memory-limit will be about 80% of the total memory required. Order-of-magnitude suffixes can be passed to --split-memory-limit, such as 10G for ten gigabytes or 1T for one terabyte of RAM.

...so I'm using --split-memory-limit with 80% of the RAM provided for the qsub job. However, I always get the error:

search --threads 8 -e 1e-3 --max-accept 1 --max-seqs 100 -s 6 --num-iterations 2 --split 0 --split-memory-limit 160G /ebio/abt3_scratch/nyoungblut/Struo2_83927407775/mmseqs_search/seqs09_db /ebio/abt3_scratch/nyoungblut/Struo2_83927407775/mmseqs_search_db/db /ebio/abt3_scratch/nyoungblut/Struo2_83927407775/mmseqs_search/hits_seqs09_db /ebio/abt3_scratch/nyoungblut/Struo2_83927407775/mmseqs_search_TMP09

MMseqs Version:                     	11.e1a1c
E-value threshold                   	0.001
Max accept                          	1
Threads                             	8
Sensitivity                         	6
Max results per query               	100
Split database                      	0
Split memory limit                  	160G
Search iterations                   	2

Failed to mmap memory dataSize=321859477504 File=/ebio/abt3_scratch/nyoungblut/Struo2_83927407775/mmseqs_search_db/db.idx. Error 12.

Even if I reduce --split-memory-limit to 50% or just 20% of the total memory provided for the qsub job, the job still dies with the same error. Maybe I'm not understanding or using --split-memory-limit correctly?

I'm using UniRef90 as the db. If I use 336G for the job mem limit, then the mmseqs search job runs without an error.

Steps to Reproduce (for bugs)

Run mmseqs search against a UniRef90 target database, which creates a large RAM requirement for the job.

Your Environment

Ubuntu 18.04.4

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2020.6.20        py38h32f6830_0    conda-forge
fasta-splitter            0.2.6                         0    bioconda
gawk                      5.1.0                h516909a_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
ld_impl_linux-64          2.34                 h53a641e_7    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
libgomp                   9.2.0                h24d8f2e_2    conda-forge
libidn2                   2.3.0                h516909a_0    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_2    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libunistring              0.9.10               h14c3975_0    conda-forge
llvm-openmp               8.0.1                hc9558a2_0    conda-forge
mmseqs2                   11.e1a1c             h2d02072_0    bioconda
ncurses                   6.2                  he1b5a44_1    conda-forge
numpy                     1.19.0           py38h8854b6b_0    conda-forge
openmp                    8.0.1                         0    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
perl-constant             1.33                    pl526_1    bioconda
perl-exporter             5.72                    pl526_1    bioconda
perl-file-util            4.161950                pl526_3    bioconda
perl-lib                  0.63                    pl526_1    bioconda
pigz                      2.3.4                hed695b0_1    conda-forge
pip                       20.1.1                     py_1    conda-forge
prodigal                  2.6.3                h516909a_2    bioconda
python                    3.8.4           cpython_h425cb1d_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
setuptools                49.2.0           py38h32f6830_0    conda-forge
sqlite                    3.32.3               hcee41ef_1    conda-forge
tk                        8.6.10               hed695b0_0    conda-forge
vsearch                   2.15.0               h2d02072_0    bioconda
wget                      1.20.1               h22169c7_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
@milot-mirdita
Member

milot-mirdita commented Jul 28, 2020

So what's going on is that you created an index with createindex that was not aware of how large it may become. So it was created to use as much memory as possible. You would have to call createindex also with --split-memory-limit.
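
For example, a minimal sketch of rebuilding the index with an explicit limit; the database path and the 160G figure are copied from the failing search command above, and tmp_createindex is a placeholder temporary directory:

mmseqs createindex /ebio/abt3_scratch/nyoungblut/Struo2_83927407775/mmseqs_search_db/db tmp_createindex --split-memory-limit 160G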

These split indices work but have unexpected performance pitfalls: you need to have the index on a fast I/O system so you can load each split into memory fast enough. Since the index is larger than the input sequences, it can be faster to just recompute the index on the fly instead of reading in an existing one.

@nick-youngblut
Author

Thanks for the clarification! So --split-memory-limit only applies to mmseqs search if the index is computed on the fly, correct? That makes sense. I guess I was thrown off by the Split memory limit 160G listed in the parameters printed to the screen. Maybe a warning stating something like "index file found; not using --split-memory-limit" would be helpful?

@milot-mirdita
Member

Currently, I think it should crash no matter what, since there is an index present that doesn't fit into RAM. The error message for that is not very helpful.

You have to either recreate the index with a certain memory limit in mind or remove it (actually, just rename the .idx.dbtype file to something else and MMseqs2 won't be able to find and use the index anymore; you can rename it back later if you still need it).
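
A sketch of that rename trick, with db.idx as a placeholder for the actual index prefix:

# hide the precomputed index so mmseqs search computes it on the fly instead
mv db.idx.dbtype db.idx.dbtype.hidden
# ...run mmseqs search...
# restore the index later if it is still needed
mv db.idx.dbtype.hidden db.idx.dbtype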

@nick-youngblut
Author

nick-youngblut commented Dec 7, 2020

I created an index for UniRef90 using --split 4, which produced 4 splits and required ~70G of memory to generate. Now I'm trying to run mmseqs search on that target database with the associated idx that I just created. My mmseqs search cluster jobs all die (even when providing up to 200G of memory) with the following error:

Failed to mmap memory dataSize=58928025600 File=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/humann3_search/mmseqs_search_db/db.idx. Error 12

According to the error message, the database size is < 100G, so why am I getting an mmap error?

I also get this error when using --db-load-mode 1, which should use fread instead of mmap, according to the help docs, so I don't understand why I'm still getting the Failed to mmap memory error in that situation.

I'm using mmseqs2 12.113e3.

The mmseqs search jobs complete successfully when running them locally (a server with plenty of memory). It appears to be something specific to the SGE cluster. Both the local server and the cluster nodes are running Ubuntu 18.04.5.

@nick-youngblut
Author

It turns out that I just need ~300G of memory for the job in order to not get the Failed to mmap memory error, even though the mmseqs search job states: Estimated memory consumption: 122G

@nick-youngblut
Author

nick-youngblut commented Dec 8, 2020

It appears that no matter how many splits I use for mmseqs createindex, the amount of memory for running mmseqs search on the UniRef90 database + split idx is always ~300G. Shouldn't splitting the database index into portions reduce the memory required for mmap'ing?

Using mmseqs search --split N doesn't help, but I didn't expect it to, given that the splitting was already done during the mmseqs createindex step.

I'm using 8 threads. Is the Estimated memory consumption: listed during mmseqs search the amount of memory required per thread?

@milot-mirdita
Member

Could you run ls -la on all the database files? The first split .0 should be much larger than the others.
What does sysctl vm.overcommit_memory say on the system?
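
For reference, the two diagnostics being asked for (the database path is a placeholder):

ls -la /path/to/mmseqs_search_db/db.idx*   # the first split, .0, is expected to be the largest
sysctl vm.overcommit_memory                # should report 0 or 1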

@nick-youngblut
Author

Yeah, the first split is ~2x larger than the rest. sysctl is not present on the cluster nodes.

I tried using 16 splits and 8 mmseqs search threads, and that cut the memory requirement from 296G down to 232G.

@nick-youngblut
Author

When I split the database into 16 parts, the first split is 70G, while all of the rest are 16G. Is there a way to get equal-sized splits (assuming that would require less memory for running mmseqs search)?

@milot-mirdita
Member

The first split also stores other data not needed for the prefiltering. I still suspect that there is something wrong with memory overcommit on your system.

Try cat /proc/sys/vm/overcommit_memory instead of the sysctl command. It should be 0 or 1. If it's 2, you will run into many weird problems.

@nick-youngblut
Author

/proc/sys/vm/overcommit_memory is 0, so I guess that's not the issue.

Our SGE cluster is running Ubuntu 18.04.5 with NFSv4.

@milot-mirdita
Member

The mmap documentation says that mmap takes rlimit into account:

 ENOMEM (since Linux 4.7) The process's RLIMIT_DATA limit, described
         in getrlimit(2), would have been exceeded.

SGE seems to have a couple of knobs to limit that (e.g., s_vmem and h_vmem). MMseqs2 can use way more virtual memory than actual memory. Maybe that's the conflict?
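
One way to check whether such a limit is actually being imposed is to print the process rlimits from inside the submitted job script; this is a generic sketch, not anything MMseqs2-specific:

# run inside the qsub job script to see the limits SGE has set for the job
ulimit -v               # RLIMIT_AS: max virtual address space (kB, or "unlimited")
ulimit -d               # RLIMIT_DATA: max data segment size (kB, or "unlimited")
cat /proc/self/limits   # full resource-limit table for the current process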

@nick-youngblut
Author

I normally just request h_vmem, such as:

#!/bin/bash
#$ -pe parallel 8
#$ -l h_vmem=38G

That comes to ~300G of vmem in total (8 slots × 38G ≈ 304G). I haven't played around with s_vmem.

@milot-mirdita
Member

Does the issue also happen if you don't set that? What Linux kernel version are your nodes running?
I am super surprised that Linux seems to be enforcing this limit. We should be able to allocate a lot more virtual memory than is physically available.

@nick-youngblut
Author

Does the issue also happen if you don't set that?

If I don't set h_vmem then the default is used (2G), and I definitely get the mmap error.

The nodes are running Ubuntu 18.04.5.

@milot-mirdita
Member

milot-mirdita commented Dec 9, 2020

18.04.5 seems to use kernel 5.4, which is later than 4.7, so I think we found the reason. You should set h_vmem to at least the size of the largest split file, if not larger.

I am still not sure that it will be very valuable to precompute an index in your case. Transferring the large index with NFS might be slower than recomputing it on the fly.

@nick-youngblut
Author

Computing the idx with 8 threads takes ~1 hour. Transferring the large index is much faster. My previous jobs that created the idx on the fly took ~2 hours, but with the pre-computed idx, the jobs take ~30 minutes.

Is there any way to homogenize the splits so that they are all approximately the same size?

To be clear, ~29G of h_vmem per thread (using 8 threads) is needed to run the mmseqs search jobs, but the largest split idx file is 70G.

@nick-youngblut
Author

Maybe I should note that I'm splitting the query fasta into subsets, creating mmseqs dbs for each, and searching against UniRef90 (with the pre-generated idx). I know that I could use openmpi for scaling on a cluster, but splitting and running all of the queries in parallel with snakemake is more fault-tolerant.
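
Roughly the pattern described above, as a hedged bash sketch (file names, database paths, and the thread count are illustrative only; in practice each iteration would be its own snakemake/cluster job):

# illustrative only: split queries, build an MMseqs2 DB per chunk, search against the prebuilt target DB
for chunk in queries_part*.fasta; do
    mmseqs createdb "$chunk" "${chunk%.fasta}_db"
    mmseqs search "${chunk%.fasta}_db" uniref90_db "${chunk%.fasta}_hits" tmp_"${chunk%.fasta}" --threads 8
done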

Having to request ~300G per cluster job greatly limits the number of parallel jobs that will run on the cluster at the same time, so I'd prefer to reduce the memory requirements, if possible.

It seems that the first split stays fairly large in size regardless of the total number of splits. I'd try ~30 splits, but I'm guessing that I will still be stuck with a split file that's ~70G.

@milot-mirdita
Member

milot-mirdita commented Dec 9, 2020

I split the non-prefilter-index parts into separate files with this commit: 553a670
You can either compile from source or find statically compiled binaries in about an hour at https://mmseqs.com/latest (if I didn't break anything and the regression suite runs successfully :D)
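
If going the static-binary route, something like the following should work (the AVX2 archive name is an assumption; pick the variant matching your CPU):

wget https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz
tar xzf mmseqs-linux-avx2.tar.gz
export PATH="$(pwd)/mmseqs/bin:$PATH"
mmseqs version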

@nick-youngblut
Author

Great! Thanks for the quick edit! I'll give it a try later today.

@nick-youngblut
Author

nick-youngblut commented Dec 10, 2020

The good news is that the updated code splits the idx rather evenly:

-rw-r--r-- 1 nyoungblut abt3 371M Dec  9 22:32 db.idx.0
-rw-r--r-- 1 nyoungblut abt3  37G Dec  9 22:32 db.idx.1
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.2
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.3
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.4
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.5
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.6
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.7
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.8
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.9
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.10
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.11
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.12
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.13
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.14
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.15
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.16
-rw-r--r-- 1 nyoungblut abt3  12G Dec  9 22:32 db.idx.17
-rw-r--r-- 1 nyoungblut abt3 3.0K Dec  9 22:32 db.idx.index
-rw-r--r-- 1 nyoungblut abt3    4 Dec  9 22:32 db.idx.dbtype

The bad news is that mmseqs search still generates mmap errors (Failed to mmap memory dataSize=12723929088) unless I provide ~240G of memory (parallel=8, h_vmem=29G). So, it appears that splitting the idx more evenly didn't help with the memory requirements for mmseqs search. Maybe the issue is that the idx.1 split is still rather large?

Note: I still require ~240G of memory if I just use 1 thread (parallel=1, h_vmem=240G)

@milot-mirdita
Member

I am not sure what the fix is. I think the issue is now that the RLIMIT_DATA limit is not actually per allocation but per process, so the additional splits weren't really useful.

IMO h_vmem is a weird feature of SGE, and it is reasonable for MMseqs2 to allocate a lot of virtual memory. Its real memory consumption is much smaller; however, I am not sure that SGE can track that.

I would suggest talking to your SGE admin about setting up a separate queue that doesn't enforce memory limits.

Reengineering MMseqs2 to page in splits on demand would, I think, be quite a big effort. We can keep it in mind for the future.

@nick-youngblut
Author

Thanks for your help with this! Yeah, no matter how many splits I create, .idx.1 is always 37G, while the other files are much smaller, and regardless, the required h_vmem is ~240G.

Given the speed/accuracy of mmseqs, I'll probably still use it for my needs, if at all possible. Otherwise, I'll have to switch back to diamond in order to reduce the memory required for each job.
