Error in topiary-seed-to-alignment #33
Comments
Thanks for the bug report! I've never seen this one before. It looks to me like it is choking when downloading and reading the checksum file to validate the downloaded proteome. Is there a file called Thanks for your help; hopefully we can resolve this quickly. |
It does, but it is too long to be copied here. Let me know if the entire file is needed and I'll post it somewhere, here's its top and last lines: d0e8e6b5c981ff948c657166270a7c88 ./Annotation_comparison/GCF_000002035.6_GRCz11_compare_prev.gbp.gz (...) 020d65335491f47fa29beadf092e8695 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395039.1_NC_007129.7.asn |
I think we're getting somewhere. topiary assumes an md5 file has rows that have the format "hash file". It looks like this file is truncated (the last line looks like an incomplete hash). I suspect the md5 download terminated early for some reason. If this is true, you should be able to re-run and successfully complete the job. I can patch topiary to prevent this in the future by adding a check to make sure the md5 file downloads successfully, rather than cryptically crashing. Maybe try re-running the job? Thanks! |
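For anyone following along, here is a minimal sketch of the kind of sanity check described above. It assumes each row of the md5 file has the form "hash ./path"; the function name `parse_md5_lines` is hypothetical and this is not topiary's actual parser.

```python
import re

# An md5 digest is exactly 32 hexadecimal characters.
_MD5_RE = re.compile(r"^[0-9a-fA-F]{32}$")


def parse_md5_lines(text):
    """Parse 'hash ./path' rows into a dict and collect malformed rows.

    Hypothetical helper (not topiary's code). Returns a tuple of
    (md5_by_file, bad_line_numbers). A non-empty bad_line_numbers list
    suggests the file was truncated or corrupted during download.
    """
    md5_by_file = {}
    bad_line_numbers = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        cols = line.split(maxsplit=1)
        # A valid row has two columns and a full-length hex digest.
        if len(cols) != 2 or not _MD5_RE.match(cols[0]):
            bad_line_numbers.append(number)
            continue
        path = cols[1].strip()
        if path.startswith("./"):
            path = path[2:]  # drop the leading './' so keys are bare paths
        md5_by_file[path] = cols[0].lower()
    return md5_by_file, bad_line_numbers
```

A caller could treat any entry in `bad_line_numbers` as a reason to download the file again instead of crashing with an `IndexError`.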
A rerun produced a very similar output - I got a 01_initial-dataframe.csv with the same filesize, the blast result XML is almost the same size (a difference of three lines), and md5checksums.txt is again truncated: 3ce0f863975dd40c4ea48c96478d30ed ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394748.1_NC_007120.7.gff |
That's strange. I just created a bug fix that downloads the md5sum file, checks if it is sane, then attempts to download it again if it fails. Would you be up for seeing if it fixes your problem? To download the change, you can follow the instructions below:
Best, Mike |
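The download, check, and retry idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_with_retry`, `fetch`, and `is_sane` are hypothetical names, and the actual fix in topiary may be structured differently.

```python
import time


def fetch_with_retry(fetch, is_sane, num_tries=3, wait_seconds=0.0):
    """Call fetch() until is_sane(payload) is True or tries run out.

    fetch:   hypothetical zero-argument callable that returns the payload
    is_sane: hypothetical callable that returns True for a good payload
    """
    last_error = None
    for attempt in range(1, num_tries + 1):
        try:
            payload = fetch()
        except (EOFError, OSError) as err:
            # ftplib raised EOFError in the traceback reported in this thread.
            last_error = err
        else:
            if is_sane(payload):
                return payload
            last_error = ValueError(f"attempt {attempt}: payload failed sanity check")
        if attempt < num_tries:
            time.sleep(wait_seconds)
    raise RuntimeError(f"download failed after {num_tries} attempts") from last_error
```

Passing the download and the check in as callables keeps the retry logic testable without touching the network.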
Hi, is the information correct? When I tried
git pull ***@***.***:harmsm/topiary.git main
I get the message:
***@***.***: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
On Fri, Feb 24, 2023 at 20:43, Mike Harms ***@***.***>
wrote:
… That's strange. I just created a bug fix that downloads the md5sum file,
checks if it is sane, then attempts to download it again if it fails. Would
you be up for seeing if it fixes your problem? To download the change, you
can follow the instructions below:
conda activate topiary
cd the_topiary_directory_wherever_you_downloaded_it
git checkout -b harmsm-main main
git pull ***@***.***:harmsm/topiary.git main
python setup.py install
Best,
Mike
|
I am having the same error, actually. Below is my output.
Polishing alignment and re-aligning.
muscle 5.1.linux64 [] 396Gb RAM, 40 cores
Input: 2 seqs, length avg 392 max 408
00:00 17Mb 50.0% Derep 0 uniques, 0 dupes
Success. Alignment written to the
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Caught exception in function 'seed_to_alignment'. Returning to starting
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Function seed_to_alignment raised an error.
To see command line help, run topiary-seed-to-alignment --help
and the last few lines of my md5checksums.txt are:
75f783e620888f6a20c9e7030bf54de2 ./Gnomon_models/GCF_000001405.40_GRCh38.p14_gnomon_model.gff.gz |
Thanks for the report. I just merged the PR I referenced above. I still have not been able to reproduce the error on my end. Can one of you try the command again with the new version? To install the latest version, you could run the following:
Thanks! (And thanks for your patience with the delayed response to this thread). |
I followed the instructions above, and still ran into the same error. The terminal output is attached: |
@jjvanantwerp Thanks for the bug report and sorry for the slow reply. Dangerous having the prof in charge of package maintenance... I looked through your log file; it appears you're having a different bug. It's crashing when polishing the final alignment. If possible could you please post the last csv file that topiary writes out before the crash occurs? Based on when the crash occurs, I believe this should be Thanks. |
Yes, here it is. |
Hey, Mike, sorry about the delay, I just had one of those crazy weeks.
Here's the error I'm getting:
Downloading Danio rerio proteome
Downloading proteome for taxid '7955'
Process Process-11:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/ftp.py", line 36, in _ftp_thread
ftp.retrbinary(cmd="RETR " + file_name,
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 445, in retrbinary
return self.voidresp()
^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 259, in voidresp
resp = self.getresp()
^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 244, in getresp
resp = self.getmultiline()
^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 230, in getmultiline
line = self.getline()
^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 218, in getline
raise EOFError
EOFError
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 92, in ncbi_ftp_download
md5_dict[file_name]
~~~~~~~~^^^^^^^^^^^
KeyError: 'GCF_000002035.6_GRCz11_protein.faa.gz'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/proteome.py", line 217, in get_proteome
ncbi_ftp_download(genome_url,file_base="_protein.faa.gz")
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 96, in ncbi_ftp_download
raise FileNotFoundError(err)
FileNotFoundError: The file 'GCF_000002035.6_GRCz11_protein.faa.gz' is not present on the NCBI.
Full path:
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11//genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11/GCF_000002035.6_GRCz11_protein.faa.gz
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
value = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 406, in seed_to_alignment
proteome_list.append(topiary.ncbi.get_proteome(taxid=this_taxid))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/proteome.py", line 241, in get_proteome
raise RuntimeError(err)
RuntimeError:
Could not download proteome GCF_000002035.6_GRCz11_protein.faa.gz.
This can happen if an assembly is in the NCBI database but
does not have an associated _protein.tar.gz file. If you
are running this as part the seed_to_alignment pipeline,
you have a couple of options. 1) You can replace the problematic
species (taxid = 7955) in your seed dataset and start
the pipeline again. 2) You can edit the 01_initial-dataframe.csv
file, adding or editing the column 'recip_blast'. Set this to
'FALSE' for every row *except* the rows with key_species = 'TRUE'.
Set this to 'FALSE' for the problematic species. You
can then restart the pipeline with the --restart flag. Topiary
will not use this species for reciprocal BLAST, but will still
treat it as a key species in other respects.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
ret = fcn(**fcn_args.__dict__)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:
Caught exception in function 'seed_to_alignment'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 26, in <module>
main()
File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 21, in main
wrap_function(seed_to_alignment,
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
raise RuntimeError(err) from e
RuntimeError:
Function seed_to_alignment raised an error.
To see command line help, run topiary-seed-to-alignment --help
On Thu, Mar 2, 2023 at 16:47, Mike Harms ***@***.***>
wrote:
… Thanks for the report. I just merged the PR I referenced above. I still
have not been able to reproduce the error on my end. Can one of you try the
command again with the new version? To install the latest version, you
could run the following:
cd topiary
git pull origin main
conda activate topiary
python -m pip install . -vv
Thanks! (And thanks for your patience with the delayed response to this
thread).
|
@jjvanantwerp: Thanks for the file! I am able to reproduce the error and am working on this now. @lbleicher: Thanks for the detailed error message. I'll look into it. |
@jjvanantwerp Should be fixed now. I just merged a PR with the change. You should be able to run the following to install the latest and greatest version. Thanks for helping troubleshoot!
|
Yes, I was able to progress past the alignment! I think this issue can be closed. Unfortunately, I will need to open another for what appears to be the same error in the next step. I am not sure if here is the best place to discuss that or if I should open a new issue - it's that same place in the wrap function, line 189. |
Glad we made progress! The wrap function will always throw an error; it’s a way to capture internal errors and make sure the crashing function returns to the right directory, clean up, etc. Maybe paste the whole error?
Thanks!
Mike
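To illustrate what a wrapper like this does, here is a minimal sketch. It is not topiary's actual code: `return_to_start_dir` and `WrappedFunctionError` are hypothetical names chosen for this example.

```python
import functools
import os


class WrappedFunctionError(Exception):
    """Hypothetical stand-in for a wrapper-level exception."""


def return_to_start_dir(func):
    """Run func, always return to the starting directory, and re-raise
    any failure as a wrapper-level error chained to the real cause."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_dir = os.getcwd()
        try:
            return func(*args, **kwargs)
        except Exception as e:
            msg = f"Caught exception in function '{func.__name__}'."
            # 'from e' keeps the original error in the traceback chain.
            raise WrappedFunctionError(msg) from e
        finally:
            # Runs on success and on failure.
            os.chdir(start_dir)

    return wrapper
```

Because every wrapped call re-raises, the last error in the stack is always the wrapper's. The informative error is the first one in the chain, above the "direct cause" lines.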
|
Terminal Saved Output Mar 15.txt
I have attached the whole terminal session, but below is the relevant part. It says the issue is that my alignment is too small, and I'm not sure if there's a way to address this here or upstream.
(topiary_resolved) [vanant25@dev-intel16 topiary]$ topiary-alignment-to-ancestors ER_Final_Alignment.csv --out_dir ER_ASR --num_threads 1
Non-microbial dataset detected. Gene/species tree reconciliation will be performed
Checking raxml-ng
Checking generax
Checking mpirun
topiary is starting a find_best_model calculation in ./00_find-model:
Generating maximum parsimony tree.
Launching raxml-ng, 0:00:00.007415 (H:M:S)
topiary ran a find_best_model calculation in ./00_find-model:
Traceback (most recent call last):
/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng output
RAxML-NG v. 1.1 released on 29.11.2021 by The Exelixis Lab.
System: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 28 cores, 125 GB RAM
RAxML-NG was called at 15-Mar-2023 00:48:29 as follows:
/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng --start --msa alignment.phy --model LG --seed 3997117630 --threads 1 --tree pars{1}
Analysis options:
[00:00:00] Reading alignment from file: alignment.phy
ERROR: Your alignment contains less than 4 sequences!
ERROR: Alignment check failed (see details above)!
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Caught exception in function 'launch'. Returning to starting
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Caught exception in function 'find_best_model'. Returning to starting
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Caught exception in function 'alignment_to_ancestors'. Returning to starting
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Function alignment_to_ancestors raised an error.
To see command line help, run topiary-alignment-to-ancestors --help
(topiary_resolved) [vanant25@dev-intel16 topiary]$ |
Hi James,
Yep, alignment is too small. This mini-dataset only has a few, nearly identical, sequences that are trimmed out during the quality control step. Maybe try going back upstream, doing seed_to_alignment before feeding into ali_to_anc? That should BLAST and pull many more sequences down for your tree inference.
Best,
Mike
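One way to catch this before launching raxml-ng is to count the sequences in the alignment first. The sketch below assumes a PHYLIP file whose first line is "ntaxa nsites"; the function names are hypothetical, and the minimum of 4 comes from the raxml-ng error message in this thread.

```python
MIN_SEQUENCES = 4  # threshold taken from the raxml-ng error message in this thread


def count_phylip_taxa(phylip_text):
    """Return the taxon count from the first line of a PHYLIP alignment."""
    lines = phylip_text.strip().splitlines()
    if not lines:
        raise ValueError("alignment is empty")
    header = lines[0].split()
    # The PHYLIP header is expected to be two integers: ntaxa and nsites.
    if len(header) != 2 or not all(h.isdigit() for h in header):
        raise ValueError(f"unexpected PHYLIP header: {lines[0]!r}")
    return int(header[0])


def check_alignment_size(phylip_text, minimum=MIN_SEQUENCES):
    """Raise a readable error if the alignment has too few sequences."""
    num_taxa = count_phylip_taxa(phylip_text)
    if num_taxa < minimum:
        raise ValueError(
            f"alignment has {num_taxa} sequences; at least {minimum} are needed"
        )
    return num_taxa
```

Run against the contents of `alignment.phy` in the working directory, a check like this would report the 2-taxon alignment directly instead of through a nested error stack.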
… On Mar 14, 2023, at 10:00 PM, James ***@***.***> wrote:
Terminal Saved Output Mar 15.txt <https://github.com/harmslab/topiary/files/10976158/Terminal.Saved.Output.Mar.15.txt>
I have attached the whole terminal session, but below is the relevant part. It says the issue is that my alignment is too small, and I'm not sure if there's a way to address this here or upstream.
(topiary_resolved) [vanant25@dev-intel16 topiary]$ topiary-alignment-to-ancestors ER_Final_Alignment.csv --out_dir ER_ASR --num_threads 1
Non-microbial dataset detected. Gene/species tree reconciliation will be performed
Checking raxml-ng
installed: Y
binary_path: /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng
binary runs: Y
version: 1.1
minimum version: 1.1
passes: Y
Checking generax
installed: Y
binary_path: /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/generax
binary runs: Y
version: 2.0.4
minimum version: 2.0
passes: Y
Checking mpirun
installed: Y
binary_path: /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/mpirun
binary runs: Y
version: 4.1.5
minimum version: 0.0
passes: Y
topiary is starting a find_best_model calculation in ./00_find-model:
Generating maximum parsimony tree.
Launching raxml-ng, 0:00:00.007415 (H:M:S)
topiary ran a find_best_model calculation in ./00_find-model:
Crashed after 0:00:00.021205 (H:M:S)
Please check ./00_find-model/working
Traceback (most recent call last):
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
value = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 336, in launch
raise RuntimeError(err)
RuntimeError: ERROR: /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng returned 1
/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng output
RAxML-NG v. 1.1 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
System: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 28 cores, 125 GB RAM
RAxML-NG was called at 15-Mar-2023 00:48:29 as follows:
/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng --start --msa alignment.phy --model LG --seed 3997117630 --threads 1 --tree pars{1}
Analysis options:
run mode: Starting tree generation
start tree(s): parsimony (1)
random seed: 3997117630
SIMD kernels: AVX2
parallelization: coarse-grained (auto), NONE/sequential
[00:00:00] Reading alignment from file: alignment.phy
[00:00:00] Loaded alignment with 2 taxa and 410 sites
ERROR: Your alignment contains less than 4 sequences!
ERROR: Alignment check failed (see details above)!
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/_raxml.py", line 189, in run_raxml
interface.launch(cmd,
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:
Caught exception in function 'launch'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
value = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/model.py", line 260, in find_best_model
_generate_parsimony_tree(supervisor.alignment,
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/model.py", line 45, in _generate_parsimony_tree
run_raxml(run_directory=run_directory,
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/_raxml.py", line 197, in run_raxml
raise RuntimeError from e
RuntimeError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
value = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/pipeline/alignment_to_ancestors.py", line 323, in alignment_to_ancestors
topiary.find_best_model(df,
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:
Caught exception in function 'find_best_model'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
ret = fcn(**fcn_args.__dict__)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:
Caught exception in function 'alignment_to_ancestors'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-alignment-to-ancestors", line 26, in <module>
main()
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-alignment-to-ancestors", line 21, in main
wrap_function(alignment_to_ancestors,
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
raise RuntimeError(err) from e
RuntimeError:
Function alignment_to_ancestors raised an error.
To see command line help, run topiary-alignment-to-ancestors --help
(topiary_resolved) [vanant25@dev-intel16 topiary]$
|
The input file was the output of seed_to_alignment, I thought. I used 05_clean-aligned-dataframe.csv as the input for ali_to_anc, without any cleaning. |
Ah, I think I might understand. Did you include only one human sequence in there as a seed? If so, topiary is only looking for human/primate sequences because the seed dataset specifies the taxonomic scope as only human. You’ll want to add a sequence from another species that indicates the taxonomic scope to reconstruct (e.g., human-bony fishes, all mammals, etc.). We describe how to think about this here:
https://topiary-asr.readthedocs.io/en/latest/protocol.html#define-the-problem-doc
If that’s not what’s going on, we can definitely keep troubleshooting to find the bug.
Mike
|
No, that's what I did. I was hoping Topiary would 'fill in' around that sequence, but it seems like it's looking for that to be the 'edge' of sequence space instead. I will have to redesign my experiment to incorporate this behavior. |
Hopefully it works for you then. 🤞 Topiary fills in sequences within the species boundaries defined in the seed data frame. You basically need one more sequence in your seed data frame to start it going.
Mike
|
I've filled out the seed alignment and ran into an error that I suspect is caused by the format of my seed alignment. I have attached the seed alignment. Do you recognize what might cause this:
File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/opentree/util.py", line 69, in _validate_ott_or_species
Here is the full error stack:
(topiary_resolved) [vanant25@dev-intel18 topiary]$ topiary-seed-to-alignment ER_Seed.csv --out_dir ER_Align
Checking blastp
Checking makeblastdb
Checking muscle
Building initial topiary dataframe.
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Caught exception in function 'seed_to_alignment'. Returning to starting
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Function seed_to_alignment raised an error.
To see command line help, run topiary-seed-to-alignment --help |
Okay, it should work now. (Or, actually, it should fail now with a useful error). It turns out one of your species, Gulo gulo luscus, is not in the open tree of life database. Topiary was supposed to let you know this was the problem, but was choking on opentreeoflife output. I just pushed a change so it should now do so. I suspect you want to replace "Gulo gulo luscus" with "Gulo gulo" (https://tree.opentreeoflife.org/taxonomy/browse?id=752563) Best, Mike |
I changed the species name in the seed alignment, which advanced me further than I have been able to get before. Unfortunately, the alignment hit a critical error again. I have uploaded what I think is the final alignment file that was used. Terminal Saved Output_Topiary_Error.txt |
I attempted to create an alignment from a seed of five sequences from four species (this is my input csv file):
species,name,aliases,sequence,accession
Homo sapiens,TTHY_HUMAN,hTTR,GPTGTGESKCPLMVKVLDAVRGSPAINVAVHVFRKAADDTWEPFASGKTSESGELHGLTTEEEFVEGIYKVEIDTKSYWKALGISPFHEHAEVVFTANDSGPRRYTIAALLSPYSYSTTAVVTNPKE,P02766
Saccoglossus kowalevskii,D1LXG7,Acorn worm HIUase,MSGYRIDILTNHLRASQAHSNLIEAVNMAGQQSPLTTHVLDTALGRPAAELPITLYSRSPEMAWLKIAAGKTNQDGRCPGLLTQETFHNGVYKIHFDTGTYHKALDTPGFYPYVEVVFEIHDPNQHYHVPLLLSPFSYSTYRGS,D1LXG7
Danio rerio,HIUH_DANRE,Danio Rerio HIUase,MNRLQHIRGHIVSADKHINMSATLLSPLSTHVLNIAQGVPGANMTIVLHRLDPVSSAWNILTTGITNDDGRCPGLITKENFIAGVYKMRFETGKYWDALGETCFYPYVEIVFTITNTSQHYHVPLLLSRFSYSTYRGS,Q06S87
Mus musculus,HIUH_MOUSE,Mouse HIUase,MATESSPLTTHVLDTASGLPAQGLCLRLSRLEAPCQQWMELRTSYTNLDGRCPGLLTPSQIKPGTYKLFFDTERYWKERGQESFYPYVEVVFTITKETQKFHVPLLLSPWSYTTYRGS,Q9CRB3
Mus musculus,TTHY_MOUSE,Mouse Transthyretin,GPAGAGESKCPLMVKVLDAVRGSPAVDVAVKVFKKTSEGSWEPFASGKTAESGELHGLTTDEKFVEGVYRVELDTKSYWKTLGISPFHEFADVVFTANDSGHRHYTIAALLSPYSYSTTAVVSNPQN,P07309
It seems it worked until the reciprocal BLAST, then I got the following error (it did create a BLAST results XML file and an initial dataframe file with 3414 lines):
==========
Building initial topiary dataframe.
BLASTing against NCBI database nr
Performing 5 BLAST queries against the NCBI nr database
on 1 threads. Depending on the server load, this could
take awhile. This is a good time to grab a cup of coffee.
BLAST query complete.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Downloading 69 blocks of ~50 sequences...
100%|███████████████████████████████████████████| 69/69 [00:48<00:00, 1.42it/s]
Getting OTT species ids for all species.
Unknown/unrecognized query ids (skipped):
ott4992270
ott615879
ott7659998
ott773491
ott838061
ott898631
Doing reciprocal blast.
Downloading Danio rerio proteome
Downloading proteome for taxid '7955'
Process Process-11:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/ftp.py", line 36, in _ftp_thread
ftp.retrbinary(cmd="RETR " + file_name,
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 445, in retrbinary
return self.voidresp()
^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 259, in voidresp
resp = self.getresp()
^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 244, in getresp
resp = self.getmultiline()
^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 230, in getmultiline
line = self.getline()
^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 218, in getline
raise EOFError
EOFError
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
value = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 406, in seed_to_alignment
proteome_list.append(topiary.ncbi.get_proteome(taxid=this_taxid))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/proteome.py", line 217, in get_proteome
ncbi_ftp_download(genome_url,file_base="_protein.faa.gz")
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 80, in ncbi_ftp_download
md5_dict = _read_md5_file(md5_file)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 33, in _read_md5_file
file = col[1][2:].strip()
~~~^^^
IndexError: list index out of range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
ret = fcn(**fcn_args.__dict__)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:
Caught exception in function 'seed_to_alignment'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 26, in <module>
main()
File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 21, in main
wrap_function(seed_to_alignment,
File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
raise RuntimeError(err) from e
RuntimeError:
Function seed_to_alignment raised an error.
To see command line help, run topiary-seed-to-alignment --help