dunnlab/animal_tree_root

Animal tree root

A comparison of phylogenetic studies relevant to placing the root of the animal phylogeny.

Repo Overview

.
├── data_processed         # Matrices and data tables used for analyses in this manuscript
│   ├── matrices           # Matrices in consistent formats with harmonized taxon names
│   └── tables             # Tabular summaries of previously published datasets and results
├── docker                 # All files needed to create environment for reproducing analyses
└── manuscript             # Files for the manuscript, an R project and associated manuscript code
    ├── figures            # Figures for the manuscript
    └── manuscript_files   # Ancillary files for the manuscript

The matrices we curated, standardized, and used as the starting point for all new phylogenetic analyses are stored in data_processed/matrices, each named after its original manuscript. Tables containing summaries, taxon-clade maps, partition-gene maps, etc. are all in data_processed/tables. We maintained a frozen set of applications for the project as a Docker container image, defined in the docker directory. See the readme there for instructions on building and running an RStudio Server that is compatible with the manuscript. The manuscript is stored in the manuscript directory as an R Markdown file, and the data we use for visualization and summary are stored in an RData file so that time-consuming functions do not need to be re-run.

Git LFS

All files in this repo that match the patterns defined in .gitattributes are tracked with Git Large File Storage (LFS). Install Git LFS following the instructions on the project's website to work with this repo.

Getting a fresh copy of the full repo would look something like this:

git lfs install
git clone https://github.com/dunnlab/animal_tree_root.git
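
If Git LFS was not installed before cloning, tracked files are checked out as small text pointer stubs rather than the real data. The sketch below is one way to spot such stubs; it relies on the fact that a Git LFS pointer file begins with a `version https://git-lfs.github.com/spec/v1` line. The function names `is_lfs_pointer` and `list_lfs_pointers` are hypothetical helpers, not part of this repo or of Git LFS.

```shell
# Hypothetical helper: succeed (exit 0) if the file is an un-fetched Git LFS pointer stub.
is_lfs_pointer() {
    # Pointer stubs are short text files whose first line names the LFS spec version.
    head -n 1 "$1" 2>/dev/null | LC_ALL=C grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every file under a directory that is still a pointer stub.
list_lfs_pointers() {
    find "$1" -type f | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_lfs_pointers data_processed/matrices` should print nothing once the real files are present. If it lists any paths, running `git lfs pull` inside the clone fetches the missing content.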

Recreating full project

Because the original git repository for this project is quite large, this one is distilled to just what is needed for:

  • Launching new analyses based on the standardized matrices in data_processed/matrices
  • Examining the data used in the manuscript R analyses and figures

If you would like to recreate the full repository that includes raw output from our analyses, download the data from Figshare. The data are split into three archived directories which can be downloaded separately or all together.

data_raw.tar.xz: The data from each previous study we used.

reconciliation.tar.xz: Files and scripts used to standardize naming and formats across the datasets used here.

trees_new.tar.xz: Results from the new analyses we ran over the course of this study. Abridged and summarized portions of these data are imported into the manuscript's R environment, which is included in this repo as manuscript/manuscript.RData. You can examine it with the Docker environment outlined in the docker directory.

Download everything

cd animal_tree_root
wget https://ndownloader.figshare.com/articles/13122881/versions/1 -O tmp.zip
unzip tmp.zip && rm tmp.zip

# this may take a bit
for t in *.tar.xz; do
    echo "Expanding $t ..."
    tar xf "$t" && rm "$t"
done
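
Because the archives are large, it can be worth confirming that each one is readable before deleting it. The following is a minimal sketch of that idea, not the authors' procedure: `expand_archive` is a hypothetical helper that lists the archive first and only extracts and removes it if the listing succeeds.

```shell
# Hypothetical helper: extract an archive into the current directory,
# deleting it only after it has been verified and unpacked.
expand_archive() {
    # Listing the contents reads the whole archive and fails on corruption or truncation.
    if ! tar -tf "$1" > /dev/null; then
        echo "Skipping $1: archive is unreadable, keeping it for inspection" >&2
        return 1
    fi
    # Remove the archive only when extraction itself succeeded.
    tar -xf "$1" && rm -- "$1"
}
```

It can stand in for the body of the loop, for example `for t in *.tar.xz; do expand_archive "$t"; done`, so that a partial download is left in place instead of being removed.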

Glossary

Throughout this repo and the manuscript itself we standardize the following terms and their definitions:

manuscript: The study from which a dataset or analysis is derived.

matrix: A multiple sequence alignment that consists of one or more partitions. A manuscript can have one or more matrices.

gene: A set of homologous partitions. Globally consistent across matrices.

partition: A set of homologous sequences within a matrix; all of them span the same matrix columns.

taxon: The name of a taxon as it appears in a matrix row name or tree tip. Usually but not always a species.

clade: A set of taxa, e.g. Cnidaria.

sequence: A gene sequence in a specific partition for a specific taxon. It is a 1D character string, and a segment of a matrix row.
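
To make the relationship between these terms concrete, the sketch below treats a matrix row as a plain character string and a partition as a range of columns, so that a sequence is simply the slice of one taxon's row covered by one partition. The function name `extract_sequence`, the 1-based inclusive column convention, and the example row are all illustrative assumptions; they do not describe the file formats used in data_processed/matrices.

```shell
# Hypothetical helper: slice one partition's columns out of one taxon's matrix row.
#   $1 = aligned characters of the row (no taxon name)
#   $2 = first column of the partition (1-based, inclusive)
#   $3 = last column of the partition (1-based, inclusive)
extract_sequence() {
    printf '%s\n' "$1" | cut -c "$2-$3"
}

# Illustrative row of 10 aligned characters with two partitions:
# columns 1-5 and columns 6-10. Gap characters are part of the sequence.
row='MKV--LTAGG'
extract_sequence "$row" 1 5     # sequence for the first partition
extract_sequence "$row" 6 10    # sequence for the second partition
```

Every taxon in the matrix would be sliced with the same column range for a given partition, which is what "consist of the same matrix columns" means in the definition above.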

Citation

Pre-print

Yuanning Li, Xing-Xing Shen, Benjamin Evans, Casey W Dunn, Antonis Rokas. Rooting the animal tree of life. bioRxiv 2020.10.27.357798; doi: https://doi.org/10.1101/2020.10.27.357798

Additional Datasets

Li, Yuanning; Shen, Xing-Xing; Evans, Benjamin; Dunn, Casey W.; Rokas, Antonis (2020): Rooting the animal tree of life. figshare. Dataset. https://doi.org/10.6084/m9.figshare.13122881.v1
