-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errors and warnings for trees without internal node names #283
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -128,6 +128,7 @@ def run(args): | |
attributes = ['branch_length'] | ||
|
||
# check if tree is provided an can be read | ||
T = None #otherwise get 'referenced before assignment' error if reading fails | ||
for fmt in ["newick", "nexus"]: | ||
try: | ||
T = Phylo.read(args.tree, fmt) | ||
|
@@ -165,8 +166,10 @@ def run(args): | |
# if not specified, construct default output file name with suffix _tt.nwk | ||
if args.output_tree: | ||
tree_fname = args.output_tree | ||
else: | ||
elif args.alignment: | ||
tree_fname = '.'.join(args.alignment.split('.')[:-1]) + '_tt.nwk' | ||
else: | ||
tree_fname = '.'.join(args.tree.split('.')[:-1]) + '_tt.nwk' | ||
|
||
if args.timetree: | ||
# load meta data and covert dates to numeric | ||
|
@@ -215,10 +218,13 @@ def run(args): | |
import json | ||
tree_success = Phylo.write(T, tree_fname, 'newick', format_branch_length='%1.8f') | ||
print("updated tree written to",tree_fname, file=sys.stdout) | ||
|
||
if args.output_node_data: | ||
node_data_fname = args.output_node_data | ||
else: | ||
elif args.alignment: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think it's sensible for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I'll make a PR with this change in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I played around with this in So the deeper question is, if you I played around and for V1, changed Thanks to the wonderful raw tree I am using, this looks like a hot mess, but it works...: By 'works' I mean that you can see different AA mutations at the tips, for example. Branches always seem to be just coloured by the first AA in the legend, I guess that's a default behaviour somewhere for if there's no other information for the branch? But assume this could be changed to default to grey? For traits there ends up being no difference between just adding the trait from the metadata. Is this the kind of behaviour we want? Or do we not want to support this? (I tried to test this for V2 as well, but didn't realise it required reinstalling Auspice... so I've modified There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both the V1 exports that work in 'current' auspice and the V2 export do NOT work in V2 auspice. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So no, it doesn't work for mutations, in V1 or V2. We could in theory still support |
||
node_data_fname = '.'.join(args.alignment.split('.')[:-1]) + '.node_data.json' | ||
else: | ||
node_data_fname = '.'.join(args.tree.split('.')[:-1]) + '.node_data.json' | ||
|
||
write_json(node_data, node_data_fname) | ||
print("node attributes written to",node_data_fname, file=sys.stdout) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,12 @@ | |
from .utils import read_node_data, load_features, write_json, write_VCF_translation | ||
from treetime.vcf_utils import read_vcf | ||
|
||
class MissingNodeError(Exception): | ||
pass | ||
|
||
class MismatchNodeError(Exception): | ||
pass | ||
|
||
def safe_translate(sequence, report_exceptions=False): | ||
"""Returns an amino acid translation of the given nucleotide sequence accounting | ||
for gaps in the given sequence. | ||
|
@@ -181,7 +187,15 @@ def assign_aa_vcf(tree, translations): | |
#get mutations on the root | ||
root = tree.root | ||
aa_muts[root.name]={"aa_muts":{}} | ||
#If has no root node name, exit with error | ||
if root.name is None: | ||
print("\n*** Can't find node name for the tree root!") | ||
raise MissingNodeError() | ||
|
||
for fname, prot in translations.items(): | ||
if root.name not in prot['sequences']: | ||
print("\n*** Can't find %s in the alignment provided!"%(root.name)) | ||
raise MismatchNodeError() | ||
root_muts = prot['sequences'][root.name] | ||
tmp = [] | ||
for pos in prot['positions']: | ||
|
@@ -193,9 +207,15 @@ def assign_aa_vcf(tree, translations): | |
for c in n: | ||
aa_muts[c.name]={"aa_muts":{}} | ||
for fname, prot in translations.items(): | ||
if n.name not in prot['sequences']: | ||
print("\n*** Can't find %s in the alignment provided!"%(root.name)) | ||
raise MismatchNodeError() | ||
n_muts = prot['sequences'][n.name] | ||
for c in n: | ||
tmp = [] | ||
if c.name is None: | ||
print("\n*** Internal node missing a node name!") | ||
raise MissingNodeError() | ||
c_muts = prot['sequences'][c.name] | ||
for pos in prot['positions']: | ||
#if pos in both, check if same | ||
|
@@ -211,6 +231,39 @@ def assign_aa_vcf(tree, translations): | |
|
||
return aa_muts | ||
|
||
def assign_aa_fasta(tree, translations): | ||
aa_muts = {} | ||
|
||
#fasta input shouldn't have mutations on root, so give empty entry | ||
root = tree.get_nonterminals()[0] | ||
aa_muts[root.name]={"aa_muts":{}} | ||
|
||
for n in tree.get_nonterminals(): | ||
if n.name is None: | ||
print("\n*** Tree is missing node names!") | ||
raise MissingNodeError() | ||
for c in n: | ||
aa_muts[c.name]={"aa_muts":{}} | ||
for fname, aln in translations.items(): | ||
for c in n: | ||
if c.name in aln and n.name in aln: | ||
tmp = [construct_mut(a, int(pos+1), d) for pos, (a,d) in | ||
enumerate(zip(aln[n.name], aln[c.name])) if a!=d] | ||
aa_muts[c.name]["aa_muts"][fname] = tmp | ||
elif c.name not in aln and n.name not in aln: | ||
print("\n*** Can't find %s OR %s in the alignment provided!"%(c.name, n.name)) | ||
raise MismatchNodeError() | ||
else: | ||
print("no sequence pair for nodes %s-%s"%(c.name, n.name)) | ||
|
||
if n==tree.root: | ||
aa_muts[n.name]={"aa_muts":{}, "aa_sequences":{}} | ||
for fname, aln in translations.items(): | ||
if n.name in aln: | ||
aa_muts[n.name]["aa_sequences"][fname] = "".join(aln[n.name]) | ||
|
||
return aa_muts | ||
|
||
def get_genes_from_file(fname): | ||
genes = [] | ||
if os.path.isfile(fname): | ||
|
@@ -314,32 +367,23 @@ def run(args): | |
'strand': 1} | ||
|
||
## determine amino acid mutations for each node | ||
if is_vcf: | ||
aa_muts = assign_aa_vcf(tree, translations) | ||
else: | ||
aa_muts = {} | ||
|
||
#fasta input shouldn't have mutations on root, so give empty entry | ||
root = tree.get_nonterminals()[0] | ||
aa_muts[root.name]={"aa_muts":{}} | ||
|
||
for n in tree.get_nonterminals(): | ||
for c in n: | ||
aa_muts[c.name]={"aa_muts":{}} | ||
for fname, aln in translations.items(): | ||
for c in n: | ||
if c.name in aln and n.name in aln: | ||
tmp = [construct_mut(a, int(pos+1), d) for pos, (a,d) in | ||
enumerate(zip(aln[n.name], aln[c.name])) if a!=d] | ||
aa_muts[c.name]["aa_muts"][fname] = tmp | ||
else: | ||
print("no sequence pair for nodes %s-%s"%(c.name, n.name)) | ||
if n==tree.root: | ||
aa_muts[n.name]={"aa_muts":{}, "aa_sequences":{}} | ||
for fname, aln in translations.items(): | ||
if n.name in aln: | ||
aa_muts[n.name]["aa_sequences"][fname] = "".join(aln[n.name]) | ||
|
||
try: | ||
if is_vcf: | ||
aa_muts = assign_aa_vcf(tree, translations) | ||
else: | ||
aa_muts = assign_aa_fasta(tree, translations) | ||
except MissingNodeError as err: | ||
print("\n*** ERROR: Some/all nodes have no node names!") | ||
print("*** Please check you are providing the tree output by 'augur refine'.") | ||
print("*** If you haven't run 'augur refine', please add node names to your tree by running:") | ||
print("*** augur refine --tree %s --output-tree <filename>.nwk"%(args.tree) ) | ||
print("*** And use <filename>.nwk as the tree when running 'ancestral', 'translate', and 'traits'") | ||
return 1 | ||
except MismatchNodeError as err: | ||
print("\n*** ERROR: Mismatch between node names in %s and in %s"%(args.tree, args.ancestral_sequences)) | ||
print("*** Ensure you are using the same tree you used to run 'ancestral' as input here.") | ||
print("*** Or, re-run 'ancestral' using %s, then use the new %s as input here."%(args.tree, args.ancestral_sequences)) | ||
return 1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||
|
||
write_json({'annotations':annotations, 'nodes':aa_muts}, args.output) | ||
print("amino acid mutations written to",args.output, file=sys.stdout) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great 👍
I would be strongly in favor of changing this to an
ERROR
and exiting at this point, thus forcing the user to generate these internal node names throughaugur refine
, then rerunning ancestral / traits using the correct tree.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might raise a bigger question of how we want these individual parts to operate - more independently or always together? Should it be ok just to run
ancestral
ortranslate
to get some information, without the intention of putting it together or throughexport
(in which case node names don't matter), or do we want to make people always keep the pieces so that they could be put together, because they may make that decision later without remembering the warnings? I can think for both sides, myself - it would be good to hear thoughts of others. (Should I ask in Slack?)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for being able to use these features as independent parts, with no intention to run
export
. But if one runsaugur traits
(e.g.) using a tree without internal node names, then you get an output (json) which refers to things like NODE00001 which do not exist in the tree. I think in these cases we want to exit and say "We can only infer things onto a tree with internal node names labelled, else you cannot match the results to your tree! One way to do this is to runaugur refine --tree A --output-tree B
"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to change to this. Feel free to nudge me if I haven't done it in a few days...!