Conversion from ete3 trees to skbio trees #250

Closed
mortonjt opened this issue Jan 9, 2017 · 4 comments

@mortonjt
Contributor

mortonjt commented Jan 9, 2017

As far as I know, there is currently no way to convert ete3 trees directly into skbio trees, so I'm doing the conversion by passing Newick strings around. This is proving problematic. Specifically, if I run the following code:

import random
from ete3 import Tree
from skbio import TreeNode

random.seed(0)
t = Tree()
t.populate(50, random_branches=True)
for i, n in enumerate(t.traverse()):
    n.name = 'y%d' % i
st = TreeNode.read([t.write()])

I get the following two trees

# ETE3
>>> t.children
[Tree node 'y1' (-0x7fffffffedd8a115), Tree node 'y2' (-0x7fffffffedd8a107)]
# skbio
>>> st.children
[<TreeNode, name: 0.139193, internal node count: 33, tips count: 35>,
 <TreeNode, name: 0.460504, internal node count: 13, tips count: 15>]

So the ete3 trees seem to use a different naming scheme that I don't quite understand.
My workaround for the time being is to rename the nodes in the resulting skbio tree.

st = TreeNode.read([t.write()])
for i, n in enumerate(st.levelorder(include_self=True)):
    n.name = 'y%d' % i 

However, it would be nice if there were a formal solution available. cc @gregcaporaso

@jhcepas
Member

jhcepas commented Jan 10, 2017

@mortonjt, you need to export the newick string using format=1, so it keeps internal node names.

import random
from ete3 import Tree
from skbio import TreeNode
from io import StringIO

random.seed(0)
t = Tree()
t.populate(10, random_branches=True)
for i, n in enumerate(t.traverse()):
    n.name = 'y%d' % i
print(t.get_ascii(attributes=["name", "dist"]))
print(t.children)

st = TreeNode.read(StringIO(t.write(format=1, format_root_node=True)))
print(st.ascii_art())
print(st.children)

#                                                    /-y7, 0.09327186435241008
#                              /y3, 0.5046868558173903
#                             |                      \-y8, 0.8400912165554131
#        /y1, 0.7579544029403025
#       |                     |                                             /-y15, 0.6118970848141451
#       |                     |                       /y9, 0.9872592010330129
#       |                      \y4, 0.28183784439970383                     \-y16, 0.8280632784038988
#       |                                            |
# -y0, 1.0                                            \-y10, 0.5325636340885271
#       |
#       |                                           /-y11, 0.3101475693193326
#       |                     /y5, 0.9182343317851318
#       |                    |                     |                       /-y17, 0.5598136790149003
#       |                    |                      \y12, 0.7298317482601286
#        \y2, 0.420571580830845                                            \-y18, 0.35379132951924075
#                            |
#                            |                      /-y13, 0.9666063677707588
#                             \y6, 0.8298529036589914
#                                                   \-y14, 0.47700977655271704
# [Tree node 'y1' (0x100612a9), Tree node 'y2' (-0x7fffffffeff9ed54)]
#                               /-y7
#                     /y3------|
#                    |          \-y8
#           /y1------|
#          |         |                    /-y15
#          |         |          /y9------|
#          |          \y4------|          \-y16
#          |                   |
# -y0------|                    \-y10
#          |
#          |                    /-y11
#          |          /y5------|
#          |         |         |          /-y17
#          |         |          \y12-----|
#           \y2------|                    \-y18
#                    |
#                    |          /-y13
#                     \y6------|
#                               \-y14
# [<TreeNode, name: y1, internal node count: 3, tips count: 5>, <TreeNode, name: y2, internal node count: 3, tips count: 5>]

@mortonjt
Contributor Author

Awesome! I think that basically resolves my immediate issues. Thanks!

However, it looks like the attributes aren't copied over from ete3 to skbio.
Copying these attributes over could still be useful in a conversion function.

@jhcepas
Member

jhcepas commented Jan 10, 2017

Nope, attributes are not transferred through serialization. I discussed this with @gregcaporaso et al. once, but skbio.Tree was still not stable and we were unsure how to transfer attributes.
If this were done, it would probably make sense to have a method that copies the tree directly rather than going through Newick serialization.
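
For illustration only, a direct copy of the kind described above might look like the sketch below. It is not an existing ete3 or skbio function. The name `copy_tree` and its parameters are hypothetical, and it assumes that source nodes expose `.name`, `.dist` and `.children` (as ete3 nodes do) and that the target constructor accepts `name=` and `length=` and offers `.append()` (as `skbio.TreeNode` does).

```python
def copy_tree(src_root, make_node, attrs=()):
    """Copy a tree node by node instead of round-tripping through Newick.

    Hypothetical helper, not part of ete3 or skbio. Assumes source nodes have
    `.name`, `.dist` and `.children`, and that `make_node(name=..., length=...)`
    returns a node with an `.append(child)` method.
    """
    dst_root = make_node(name=src_root.name,
                         length=getattr(src_root, "dist", None))
    # Iterative traversal, so very deep trees do not hit the recursion limit.
    stack = [(src_root, dst_root)]
    while stack:
        src, dst = stack.pop()
        # Carry over the requested extra attributes when the source has them.
        for key in attrs:
            if hasattr(src, key):
                setattr(dst, key, getattr(src, key))
        # Children are appended in their original order.
        for child in src.children:
            new_child = make_node(name=child.name,
                                  length=getattr(child, "dist", None))
            dst.append(new_child)
            stack.append((child, new_child))
    return dst_root
```

Under those assumptions, a call such as `copy_tree(t, TreeNode, attrs=["support"])` would keep internal node names without relying on `format=1`, and the `attrs` list is where any custom ete3 features would be named. Whether plain attributes are the right place to store them on the skbio side is exactly the open question mentioned above.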

@jhcepas
Member

jhcepas commented Apr 24, 2017

Closing, as exporting features does not seem to be necessary for the moment.
