mmseqs databases GTDB setup fails #561
Comments
Same here. I have tracked it down to this command:
mmseqs tar2db /localscratch/users/latest/gtdb.tar.gz /localscratch/users/latest/tardb --tar-include faa$
The problem comes from the regular expression used in the --tar-include option. I cannot find out why, but if you set it with the $ at the end, it doesn't work. If you remove the $, it works, but obviously it will accept more files than desired. I have tried different regular expressions, escaping the $, quoting it, etc.; none of them works. It seems that tar2db silently fails when you use any regular expression there, but works when you use a simple string.
Listing the contents of gtdb.tar.gz, it seems that removing the $ will not affect the number of files used in the tardb, because all files are .faa. So I have edited the download.sh script and I am building the database this way until a fix is released:
It also seems to fail on servers with 512 GB of RAM, apparently due to some mapping of the files into memory, so I used one with 2 TB. It seems that the program is using
Using MMseqs2 Version: 92deb92
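The check described in the comment above (that dropping the $ selects the same archive members) can be sketched roughly as follows. This is only an illustration: the member names are hypothetical, the archive is built in memory rather than read from the real gtdb.tar.gz, and Python's re module is assumed to behave like the regex engine behind --tar-include for these simple patterns.

```python
import io
import re
import tarfile


def make_archive(names):
    """Build a small in-memory .tar.gz with one tiny file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            payload = b">seq\nMK\n"  # placeholder FASTA-like content
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def count_matches(fileobj, pattern):
    """Count regular-file members whose name matches the pattern."""
    regex = re.compile(pattern)
    fileobj.seek(0)
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        return sum(1 for m in tar if m.isfile() and regex.search(m.name))


# Hypothetical member names; the real archive layout may differ.
archive = make_archive(["g/a_protein.faa", "g/b_protein.faa", "g/c_protein.faa"])
anchored = count_matches(archive, r"faa$")  # pattern from the failing command
loose = count_matches(archive, r"faa")      # workaround without the anchor
print(anchored, loose)
```

If both counts agree on the real archive, the unanchored pattern is a safe temporary workaround; if they differ, the extra members would end up in the tardb.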
What Linux distribution and version are you using? I am trying to get
I think I found the issue; it's mostly unrelated to the regex itself. When an entry is skipped, we don't correctly update the data offset for the next tar entry.
OK, anyway, we use openSUSE Leap 15.2.
We were not correctly updating the position in the tar file, when files were skipped
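To illustrate the diagnosis above, here is a minimal sketch of walking an uncompressed tar stream while filtering entries by name. It is not the MMseqs2 implementation; the function names build_tar and walk_tar are made up for this example, and it only handles plain USTAR entries with short names. The point is that the offset has to move past the header and the padded payload for every entry, whether the entry is kept or skipped.

```python
import io
import re
import tarfile

BLOCK = 512  # tar headers and payloads are aligned to 512-byte blocks


def build_tar(files):
    """Create an uncompressed USTAR archive in memory (test fixture)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def walk_tar(raw, include):
    """Return (name, data) for entries whose name matches the include regex."""
    pattern = re.compile(include)
    kept = []
    offset = 0
    while offset + BLOCK <= len(raw):
        header = raw[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # all-zero block marks the end of the archive
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        size_field = header[124:136].rstrip(b"\0 ").decode("ascii")
        size = int(size_field or "0", 8)  # size is stored as octal text
        data_start = offset + BLOCK
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        if pattern.search(name):
            kept.append((name, raw[data_start:data_start + size]))
        # Advance past the payload for kept AND skipped entries; leaving
        # the offset untouched on a skip would misread the following data.
        offset = data_start + padded
    return kept


raw = build_tar({
    "a.faa": b">a\nMK\n",
    "b.txt": b"x" * 700,  # skipped entry spanning two payload blocks
    "c.faa": b">c\nMV\n",
})
kept = walk_tar(raw, r"faa$")
print([name for name, _ in kept])
```

The skipped b.txt entry is deliberately larger than one block, so the entry after it is only found if the offset is advanced by the padded payload size.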
I have the same problem.
Expected Behavior
I want to download and use the GTDB database with
Current Behavior
The process is killed and the output remains empty.
MMseqs Output (for bugs)
MMseqs Version: 13.45111
% Total % Received % Xferd Average Speed Time Time Time Current
Time for merging to tardb: 0h 0m 0s 1ms
Converting sequences
Time for merging to gtdb_h: 0h 0m 0s 2ms
Your Environment
Please download the latest precompiled static binaries from https://mmseqs.com/latest. The GTDB download should work with these.
It works like a charm with the latest precompiled static binaries, thanks.
This should now be available in our newest release: https://github.com/soedinglab/MMseqs2/releases/tag/14-7e284
It does not work with MMseqs2 390457d.
Hi, it still does not work with MMseqs2 version 14.7e284. Additionally, the VERSION file cannot be downloaded successfully. It can be downloaded manually, which suggests the internet connection is fine.
Any suggestions? Thanks.
Hi,
This should work again in MMseqs2 release 15. Please open a new issue if it's still failing.
Expected Behavior
Completes databases workflow, creating GTDB database.
Current Behavior
Error occurs near the end of the workflow.
Files created:
gtdb gtdb.dbtype gtdb_h gtdb_h.dbtype gtdb_h.index gtdb.index gtdb.source tmp
Steps to Reproduce (for bugs)
mmseqs databases GTDB gtdb tmp
MMseqs Output (for bugs)
Context
Downloading GTDB db.
Your Environment
Using conda installation of mmseqs (MMseqs2 Version: 13.45111)
128 CPUs, 1000 GB memory, AVX2 support.