Snakemake project to predict orthogroups and find patterns of positive selection with OrthoFinder and FastCodeML

jlanga/smsk_selection

smsk_selection: A Snakemake pipeline to find orthologs and marks of positive selection

1. Description

In brief, the pipeline performs the following steps:

  1. Predict proteins from transcriptomes (TransDecoder).
  2. Find orthogroups with OrthoFinder and the methods from Yang et al.
  3. Find patterns of positive selection with FastCodeML.
  4. Annotate transcripts with TransDecoder and Trinotate.
  5. Assess transcriptome completeness with BUSCO.

smsk_selection pipeline

2. First steps

  1. Install conda.

  2. Install snakemake:

      conda install --yes snakemake

  3. Clone this repo together with its submodules. If git reports an error with SSL certificates, add -c http.sslVerify=false:

      git clone --recursive https://github.com/jlanga/smsk_orthofinder.git

  4. Compile the necessary dependencies (phyx, guidance and fastcodeml):

      bash src/compile_deps.sh

  5. Enter the paths to your samples in samples.tsv.

  6. Run the pipeline as is:

      snakemake --use-conda --jobs

     or run it inside a Docker container:

      bash src/docker_run.sh -j 4
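
Before launching a full run, it can help to preview the jobs Snakemake would schedule. The sketch below wraps the run command from the last step in a small helper that defaults to a dry run (Snakemake's -n flag). The helper name run_pipeline and the default of 4 jobs are assumptions made for illustration; neither is part of this repository.

```shell
# Hypothetical helper, not shipped with the repository.
# Usage: run_pipeline [jobs] [dry|run]
#   jobs: number of parallel jobs (default 4, chosen only for illustration)
#   mode: "dry" (default) previews the jobs with -n; anything else executes them
run_pipeline() {
    rp_jobs="${1:-4}"
    rp_mode="${2:-dry}"
    if [ "$rp_mode" = "dry" ]; then
        # -n asks Snakemake to list the planned jobs without executing them
        snakemake --use-conda --jobs "$rp_jobs" -n
    else
        snakemake --use-conda --jobs "$rp_jobs"
    fi
}
```

Calling run_pipeline 8 previews a run with 8 jobs, and run_pipeline 8 run then executes it.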

3. File organization

The folder hierarchy follows the one described in "A Quick Guide to Organizing Computational Biology Projects":

smsk_selection
├── data: raw data, downloaded fastas, databases, etc.
├── README.md
├── Snakefile: pipeline runner
├── results: processed data
│   ├── busco: single-copy orthologs (SCOs) identified
│   ├── cdhit: clustered transcriptome
│   ├── homologs: clustered orthogroups as in Yang et al.
│   ├── orthofinder: clustered orthogroups by OrthoFinder
│   ├── selection: alignments and positive selection results
│   ├── transcriptome: links to input transcriptomes
│   ├── transdecoder: predicted CDS
│   ├── tree: ML and Bayesian species tree from 4-fold degenerate sites
│   └── trinotate: transcriptome annotation
└── src: additional source code, tarballs, snakefiles, etc.
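
As a quick way to see how far a run has progressed, the sketch below lists which of the results subfolders named in the tree above are still absent. The function name missing_results is hypothetical, and the assumption that every subfolder appears after a complete run has not been checked against the Snakefile.

```shell
# Hypothetical helper, not shipped with the repository.
# Prints, one per line, the results/ subfolders from the tree that do not exist.
# Takes the project root as an optional argument (default: current directory).
missing_results() {
    mr_root="${1:-.}"
    for mr_sub in busco cdhit homologs orthofinder selection \
                  transcriptome transdecoder tree trinotate; do
        [ -d "$mr_root/results/$mr_sub" ] || echo "$mr_sub"
    done
}
```

Running missing_results from the project root prints nothing once all nine subfolders are present.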

4. Requirements

Running this pipeline should only require snakemake and conda (or mamba). Together they download the packages required for each step.

In case of doubt, the Dockerfile contains the list of the required packages to install.
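
A minimal pre-flight check along these lines can confirm that the required tools are on the PATH before starting. The function name check_tools is an assumption for illustration and is not part of this repository.

```shell
# Hypothetical helper, not shipped with the repository.
# Reports every named tool that cannot be found on the PATH.
# Returns 0 if all tools are present, 1 otherwise.
check_tools() {
    ct_rc=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "missing: $ct_tool" >&2
            ct_rc=1
        fi
    done
    return "$ct_rc"
}
```

For example, check_tools snakemake conda (or check_tools snakemake mamba) reports whichever of the tools is not installed.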

Bibliography
