Claudia Solis-Lemus edited this page May 30, 2023 · 7 revisions

The sequence alignments, one for each gene, are in nexus format and bundled in a tarball. We first navigate to the data directory:

$ cd data_results/baseline.gamma0.3_n30/
$ ls input/
1_seqgen.in	1_seqgen.tar.gz

1_seqgen.tar.gz is a tarball that contains all 30 alignments (30 loci):

$ tar -ztf input/1_seqgen.tar.gz
1_seqgen10.nex
1_seqgen11.nex
1_seqgen12.nex
1_seqgen13.nex
...
1_seqgen6.nex
1_seqgen7.nex
1_seqgen8.nex
1_seqgen9.nex
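If you only want to confirm how many loci the tarball holds, a small shell function can count the .nex entries in the tarball listing without extracting anything. This is a minimal sketch. count_nexus is a hypothetical helper name, not part of the workshop materials. Run from data_results/baseline.gamma0.3_n30/, it should print 30 if the tarball contains the 30 alignments listed above.

```shell
# count_nexus: hypothetical helper (not part of the workshop files).
# Lists the tarball contents with tar -t and counts entries ending in .nex.
count_nexus() {
  tar -tzf "$1" | grep -c '\.nex$'
}

# usage, from data_results/baseline.gamma0.3_n30/:
# count_nexus input/1_seqgen.tar.gz
```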

Let's look at the first alignment, 1_seqgen1.nex, which is stored inside input/1_seqgen.tar.gz. We can extract the nexus files into a new folder called nexus, then view the first alignment:

$ cd input
$ mkdir nexus
$ tar -xzvf 1_seqgen.tar.gz -C nexus
$ ls nexus
$ cat nexus/1_seqgen1.nex
$ less -S nexus/1_seqgen1.nex

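(cat prints the whole file to the terminal; less -S lets you scroll through it without wrapping the long sequence lines. Type q to quit less.)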
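After extracting, one quick sanity check is to compare the Dimensions line across all files. The sketch below uses a hypothetical helper, check_dims, that greps that line from every .nex file in a folder and counts identical lines. It assumes each file has a single Dimensions statement, as in the seq-gen output on this page. If every locus has 6 taxa and 500 sites, you would expect a single output line with a count of 30.

```shell
# check_dims: hypothetical helper (not part of the workshop files).
# Collects the Dimensions line of every .nex file in folder $1,
# strips leading whitespace, and counts how often each distinct line occurs.
check_dims() {
  grep -h 'Dimensions' "$1"/*.nex | sed 's/^[[:space:]]*//' | sort | uniq -c
}

# usage, from the input/ folder after extracting:
# check_dims nexus
```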

The alignment looks like this. It has only 6 taxa and 500 bp, to keep analyses fast during the workshop (and yes, these data were simulated):

#NEXUS
[
Generated by seq-gen Version 1.3.2x
Simulations of 6 taxa, 500 nucleotides
for 30 tree(s) with 1 dataset(s) per tree
Branch lengths of trees multiplied by 0.018
Rate homogeneity of sites.
Model = HKY: Hasegawa, Kishino & Yano (1985)
transition/transversion ratio = 2 (K=4.21179)
with nucleotide frequencies specified as:
A=0.300414 C=0.191363 G=0.196748 T=0.311475
]
Begin DATA;	[Tree 1]
    Dimensions NTAX=6 NCHAR=500;
    Format MISSING=? GAP=- DATATYPE=DNA;
    Matrix
6 TTGAAACGGGTAATTTTACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGTTGATGACTTCATAGCTATAAGAGGCATTATAGTATGCATGTTCCGTCAGACTCGCCCACTACAGAGCTATGTAAACAGTGGGGGCTGGTACAACTCCCTACCGATTGAATCTTATAATGGCGTATGATGTTAACGCGCTCTTGAATTGTCTTTTAAGCATAAGGGCTTTGGATAGATTAATCTTGCTTTAAATCACTCTAGCAGAAGCGTACGTTTTAATCAGACATTAACACGTTGTCGATCCATTTCAACACACACTGTTCAGTACCTTGGATCTATAAGATCCATGGGTATACCACATTTGTTGTTGCCGCTTGTGTACCCTGGTGAATGGCGTTAAGACTCCAGAGTAACCTGCTAGCTACACGCATCATGAACGGCTATGCCGATAGCTGACAAGTTCTTACGTCTAGGGTCTTAGCACCGCCATTCCCAGGTAAAG
5 TTGAAACGTGTAATTTTACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGCTGATGACTTCTTGGCCATAAAAGGCATTGTAGTATGCATGTGCCGTCAGACCCGCCTATAACAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAATCTTATAATGGTGAATGATGTTAACGCGCTCTTGAATTGTCTTTTAAGCATAAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGCTTTAATCAACCGTTAACACATTGTCGATCCATTTCAACACACTCTGTTCAATACCTTGGATCTATAAGATCCATGGGTTTACAACATTTGTTGTTGCTGCTCGTATACCCTGGCGGATGGCGTTAGATCTCCAGAGTAACCTGCTAGCTACACATATCGTGAATGGCTATGTCGATAACGGACAAGTTCCTACGTCTAGGATCTTAGTACCGGCATTCCCAAGTGAAG
1 TTGAAACGGGTAATCTTACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGCTGATGATTTCTTAGCTATAAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAGTCTTATAATGGTGAATGATGTTAACGCGCTATTGAATTGTCTTTTAAGCATGAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAACCGTTAACACATTGTCGATCCATTTCAACACACACTGTTCAATACCTTGGATCTATAAAATCCATGGGTACACAACATGTGTTGTTTCTGCTTGTCTACCCTGGTGAATGGCGTTAGGTTTCCAGAGTAATTTGCTAGCTACACGTATCGTGGACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCGCCATTCCCAGGTGAAG
2 TTGAAACGGGTAATCTTACTTATCGATTATAAGCATCATACCTGATATGGTTGTTTGCTGATGGTTTCTTAGCTATAAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAGTCTTATAATGGTGAATGATGTTAACGCGCTATTGAATTGTCTTTTAAGCATGAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAACCGTTAACACATTGTCGATCCATTTCCACACACACTGTTCAATACCTTGGATCTATAAAATCCATGGGTACACAACATGTGTTGTTTCTGCTTGTCTACCCTGGTGAGTGGCGTTAGGTTTCCAGAGTAATCTGCTAGCTACACGTATCGTGGACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCGCCATTCCCAGGTGAAG
3 TTGAAACGGGTAATCATACTTATCGATTATAAGCATCATACATGATACGGTTGTTTGCTGATGATTTCTTAGCTATAAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAATCTTATAATGGTGATTGATGTTAACGCTCTATTGAATTGTCTTTCAATCATAAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAATCGTTAACACATTGTCGATCAATTTCAACACACACTGTTCAATACCTTGGATCTATAAAATCCTTGGGTACACAACATTTGTTGTTTTTGCTTGTATACCCTGGTGAATGGCGTTAGGTTTCCAGAGTAATCTGCTAGCTACACGTATCGTGAACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCACCGTTCCCAGGTGAAG
4 TTCAAACGGGTAATCATACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGCTGATGATTTCTTAGCTATCAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAGATAATGGGGGCCAGTACAACTCCCTACCGATTGAATCTTATAATGGTGAATGATGTTAACGCTCTATTGAATTGTCTTTCAAGCATAAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAATCGTTAACACATTGTCGATCAATTTCAACACACACTGTTCAATACCTTGGATCTATAAAATCCATGGGTACACAACATTTGTTGTTTCTGCTTGTATACCCTGGTGAATGGCGTTAGGTTTCCAGAGTAATCTGCCAGCTACACGTATCGTGAACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCGCCATTCCCAGGTGAAG
    ;
END;
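The Matrix block holds one row per taxon: a taxon name (here just 1 to 6) followed by its sequence. As a sketch, the awk-based helper seq_lengths (a hypothetical name) prints each taxon with the length of its sequence. It assumes the simple layout seq-gen writes here: a Matrix line, one "taxon sequence" row per taxon with no spaces inside the sequence, then a line containing ";". For 1_seqgen1.nex you would expect six rows, each reporting 500, matching NTAX=6 NCHAR=500 in the header.

```shell
# seq_lengths: hypothetical helper (not part of the workshop files).
# Starts reading after the "Matrix" line, stops at the line containing ";",
# and prints the taxon name and sequence length for each two-field row.
seq_lengths() {
  awk '/Matrix/ {inmat = 1; next}
       inmat && /;/ {inmat = 0}
       inmat && NF == 2 {print $1, length($2)}' "$1"
}

# usage, from the input/ folder:
# seq_lengths nexus/1_seqgen1.nex
```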

Now go back to the main folder for the 30-gene data, because later analyses start from there:

$ cd ..
$ pwd
/home/moleuser/phylo-networks/data_results/baseline.gamma0.3_n30

Next: gene trees with MrBayes

PhyloNetworks Workshop
