Cannot create Databases #3

Open
eggrandio opened this issue Jan 21, 2025 · 5 comments

@eggrandio

Hello,

I am trying to use the TransAnnot pipeline, but I am having trouble downloading and formatting the databases. The error may be related to MMseqs2 msa2profile.

I have tried it on my own computer and on a cluster (in case it was a memory or disk space issue), and I get the same kind of error in both places. For example, with Pfam and eggNOG:

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
25e4c1|OK  |   351KiB/s|transannotdb/tmp//9657914214923664368/viruses.tar.aria2

Status Legend:
(OK):download completed.
tar2db transannotdb/tmp//9657914214923664368/bacteria.tar transannotdb/tmp//9657914214923664368/archea.tar transannotdb/tmp//9657914214923664368/eukaryota.tar transannotdb/tmp//9657914214923664368/viruses.tar transannotdb/tmp//9657914214923664368/msa --output-dbtype 11 --tar-include \.raw_alg\.faa\.gz$ --threads 112 -v 3 

[==================================Time for merging to msa: 0h 0m 0s 101ms
Time for merging to msa.lookup: 0h 0m 0s 56ms
Time for processing: 0h 1m 32s 398ms
/home2b/eduardo.gonzalez/transannotdb/egggNOGDB exists and will be overwritten
msa2profile transannotdb/tmp//9657914214923664368/msa /home2b/eduardo.gonzalez/transannotdb/egggNOGDB --match-mode 1 --match-ratio 0.5 --threads 112 -v 3 

Finding maximum sequence length and set size.
[=================================================================] 349.75K 46m 2s 350ms
Time for merging to egggNOGDB_h: 0h 0m 0s 152ms
Error: msa2profile died
Error: download database died
Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
ac54f7|OK  |    38MiB/s|transannotdb/tmp//9475689816711731008/db.msa.gz.aria2

Status Legend:
(OK):download completed.
convertmsa transannotdb/tmp//9475689816711731008/db.msa.gz transannotdb/tmp//9475689816711731008/msa -v 3 

[==Time for merging to msa: 0h 0m 0s 3ms
Time for processing: 0h 16m 51s 187ms
/home2b/eduardo.gonzalez/transannotdb/PfamDB exists and will be overwritten
msa2profile transannotdb/tmp//9475689816711731008/msa /home2b/eduardo.gonzalez/transannotdb/PfamDB --match-mode 1 --match-ratio 0.5 --threads 112 -v 3 

Finding maximum sequence length and set size.
[=================================================================] 23.66K 1h 2m 28s 293ms
Time for merging to PfamDB_h: 0h 0m 0s 35ms
Error: msa2profile died
Error: download database died

But I am also getting errors with SwissProt:

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
717d6d|OK  |    18MiB/s|transannotdb/tmp//1634759004325013335/uniprot_sprot.fasta.gz.aria2

Status Legend:
(OK):download completed.
/home2b/eduardo.gonzalez/transannotdb/swissprotDB exists and will be overwritten
createdb transannotdb/tmp//1634759004325013335/uniprot_sprot.fasta.gz /home2b/eduardo.gonzalez/transannotdb/swissprotDB --compressed 0 -v 3 

Shuffle database cannot be combined with --createdb-mode 0
We recompute with --shuffle 0
Converting sequences
Only uncompressed fasta files can be used with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to swissprotDB_h: 0h 0m 0s 0ms
Error: createdb died
Error: download database died
@vragh
Collaborator

vragh commented Jan 22, 2025

Thank you so much for trying out TransAnnot. I'm sorry that you're experiencing problems here.

Two questions:

  1. Which version of TransAnnot is this?
  2. Can you please share the exact command invocation that leads to this error?

@eggrandio
Author

eggrandio commented Jan 22, 2025

Hello,

I am using the latest bioconda version (TransAnnot Version: 3.70b2a60).

I have managed to install Pfam and Swiss-Prot just by retrying several times. I am still having the same issue with eggNOG.

The script is this:

conda activate transannot
mkdir -p transannotdb/tmp

transannot downloaddb eggNOG transannotdb/egggNOGDB transannotdb/tmp/
transannot downloaddb Pfam-A.full transannotdb/PfamDB transannotdb/tmp/
transannot downloaddb UniProtKB/Swiss-Prot transannotdb/swissprotDB transannotdb/tmp/

Here is the full log file from a run where all three downloads failed (the error messages are not very informative).

transannot_db_1495_out.txt

Let me know if any additional information would help.

Best,

@vragh
Collaborator

vragh commented Jan 22, 2025

Thank you for the details.

3.70b2a60 is an older version; I'm not sure why it is the one being installed by default via conda/bioconda. I just installed the newest version available there (mamba create -n test_conda -c bioconda transannot=3.0.0), and it downloads SwissProt successfully (transannot downloaddb UniProtKB/Swiss-Prot SwissProtDB SwissProtDB_tmp), so in theory it should also download the other two databases. Could you try this, please?

I notice that you are reusing the temporary directory for all three downloads. Could you try supplying a fresh, separate temporary directory for each database (e.g., tmp_eggnogdb, tmp_swissprotdb, tmp_pfamdb)?

I also tested the latest version compiled from the source code on GitHub, and it is also able to download all three (default) databases successfully. So if the conda version fails again, you might (unfortunately) want to consider compiling TransAnnot locally.
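
The suggestion above could be scripted roughly as follows. This is a minimal sketch, not part of TransAnnot: the download_db helper and the DRY_RUN switch are hypothetical names, the output paths are copied from the script earlier in this thread, and the temporary directory names are the examples given above. With DRY_RUN=1 (the default here) it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: one separate temporary directory per database.
# DRY_RUN and download_db are illustrative names, not TransAnnot features.
DRY_RUN="${DRY_RUN:-1}"

download_db() {
    local name="$1" out="$2" tmp="$3"
    if [ "$DRY_RUN" = "1" ]; then
        # Only show what would be executed.
        echo "transannot downloaddb $name $out $tmp"
        return 0
    fi
    # Insist on a directory that does not exist yet, so nothing is reused.
    if [ -e "$tmp" ]; then
        echo "refusing to reuse existing temporary directory: $tmp" >&2
        return 1
    fi
    mkdir -p "$tmp" "$(dirname "$out")" && transannot downloaddb "$name" "$out" "$tmp"
}

download_db eggNOG               transannotdb/egggNOGDB   tmp_eggnogdb
download_db Pfam-A.full          transannotdb/PfamDB      tmp_pfamdb
download_db UniProtKB/Swiss-Prot transannotdb/swissprotDB tmp_swissprotdb
```

Running it with DRY_RUN=0 would execute the three downloads one after another, each in its own directory.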

And if nothing else works, we can look into sharing the pre-computed DBs from our end with you somehow.

@eggrandio
Author

Hello,

I managed to create the three databases with version 3.70b2a60 without making any changes to my script (using the same tmp directory). I ran it on the computing cluster.

I don't know what caused the issue. Is there any way to get more detailed error messages? (I assume the error comes from mmseqs msa2profile, so I may ask in their repo.)
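
One generic way to get more context, independent of TransAnnot or MMseqs2, is to keep the full output of each run and look at the exit status of the failing command. The sketch below assumes a POSIX-like shell; run_logged and describe_status are hypothetical helper names. It relies only on the shell convention that a status above 128 means the process was ended by signal number (status - 128): for example, 137 corresponds to SIGKILL, which on Linux is often the out-of-memory killer, and 139 to SIGSEGV. A wrapper may report a plain non-zero status even when the tool underneath was killed by a signal, so this is a first check rather than a diagnosis.

```shell
#!/usr/bin/env bash
# Sketch: capture a command's full output and interpret its exit status.
# describe_status and run_logged are illustrative helper names.

describe_status() {
    local status="$1"
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Shell convention: 128 + N means "terminated by signal N".
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

run_logged() {
    local log="$1"
    shift
    # Run the remaining arguments as a command; stdout and stderr go to the log.
    "$@" >"$log" 2>&1
    local status=$?
    echo "$(describe_status "$status"); full output in $log"
    return "$status"
}
```

A call might look like run_logged eggnog.log transannot downloaddb eggNOG transannotdb/egggNOGDB transannotdb/tmp/, after which eggnog.log holds everything the run printed.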

Best,

@mariia-zelenskaia
Member

Hi! Thank you for your sharp eye and for pointing out this big issue! We've updated the bioconda version. Could you try re-installing transannot and trying again? Should it not work, do not hesitate to contact us again!

Best,
Mariia
