CCMgen
With CCMgen it is possible to generate a synthetic multiple sequence alignment from a Markov Random Field probability model specified by coupling potentials and a user-specified phylogenetic tree.
ccmgen.py [options] rawfile outalnfile
rawfile should be a MessagePack-formatted raw coupling potential file, as generated by the -b option in CCMpredPy.
outalnfile is the filename to which the sampled alignment is written, in the format specified by --aln-format (default: FASTA).
- --alnfile <aln_file>: A reference alignment file used to determine the diversity (effective number of sequences, Neff) and the number of sequences of the synthetic MSA (overrides the --num-sequences and --mutation-rate-neff settings).
- --num-sequences <nseq>: Specify the number of sequences in the synthetic MSA. [default: 1024]
- --max-gap-pos <max_gap_pos>: Ignore alignment positions with >max_gap_pos percent gaps in aln_file. [default: 100 (no removal of gaps)]
- --max-gap-seq <max_gap_seq>: Remove sequences with >max_gap_seq percent gaps in aln_file. [default: 100 (no removal of sequences)]
- --aln-format <format>: Parse and write all subsequent alignment files specified on the command line in another format. Supports all BioPython Bio.SeqIO file formats plus psicov. [default: fasta]
- --num-threads <num_threads>: Specify the number of threads. [default: 1]
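The two gap thresholds can be pictured with a small sketch. It is not CCMgen's code: the function names are hypothetical, and filtering sequences before columns is an assumption, since the text above does not state the order. It only shows the "greater than N percent gaps" rule on a toy alignment.

```python
def gap_percent(chars):
    """Percentage of gap characters ('-') in a string or list of characters."""
    return 100.0 * sum(c == "-" for c in chars) / len(chars)


def filter_alignment(seqs, max_gap_pos=100, max_gap_seq=100):
    """Drop sequences, then columns, whose gap percentage exceeds the thresholds.

    Assumption: sequences are filtered first; the source does not say which
    filter is applied first.
    """
    # Remove sequences with > max_gap_seq percent gaps.
    kept = [s for s in seqs if gap_percent(s) <= max_gap_seq]
    if not kept:
        return []
    # Keep only columns with <= max_gap_pos percent gaps.
    columns = [i for i in range(len(kept[0]))
               if gap_percent([s[i] for s in kept]) <= max_gap_pos]
    return ["".join(s[i] for i in columns) for s in kept]


aln = ["AC-D", "A--D", "----", "ACGD"]
print(filter_alignment(aln))                                  # defaults keep everything
print(filter_alignment(aln, max_gap_seq=50))                  # drops the all-gap sequence
print(filter_alignment(aln, max_gap_pos=30, max_gap_seq=50))  # also drops gappy columns
```

With the default of 100 for both thresholds nothing can exceed the limit, which matches the "no removal" defaults listed above.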
- MRF-generated (--seq0-mrf <nmut>): Start out with an all-alanine sequence and use the MRF model to evolve the sequence for nmut Gibbs steps. [default: nmut=500]
- User-specified (--seq0-file <seq_file>): Provide the initial sequence from a file (useful if you have, e.g., an ancestral sequence reconstruction). Make sure that the sequence identifier matches the name of the root node in the sampling phylogeny.
The options --seq0-file and --seq0-mrf are mutually exclusive.
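To make the MRF-generated start sequence concrete, here is a minimal Gibbs sampling sketch on a toy model. Everything in it is an assumption for illustration: the potentials are plain nested lists `v[i][a]` (single-site) and `w[i][j][a][b]` (pairwise) rather than the raw file layout, the alphabet order is arbitrary apart from alanine being index 0, and one "Gibbs step" is taken to mean resampling one randomly chosen position. CCMgen's own implementation may differ on each point.

```python
import math
import random

ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # assumed ordering; index 0 is alanine


def gibbs_step(seq, v, w, rng):
    """Resample one randomly chosen position from its conditional distribution."""
    length = len(seq)
    i = rng.randrange(length)
    # Unnormalised log-probability of each amino acid a at position i,
    # given the current residues at all other positions.
    logits = [v[i][a] + sum(w[i][j][a][seq[j]] for j in range(length) if j != i)
              for a in range(len(ALPHABET))]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logits]
    seq[i] = rng.choices(range(len(ALPHABET)), weights=weights)[0]


def evolve_seq0(v, w, nmut=500, seed=0):
    """Start from all-alanine and apply nmut Gibbs steps."""
    rng = random.Random(seed)
    seq = [0] * len(v)  # all-alanine start
    for _ in range(nmut):
        gibbs_step(seq, v, w, rng)
    return "".join(ALPHABET[a] for a in seq)


# Toy model: 5 positions, no couplings, strong preference for W at position 0.
L, q = 5, len(ALPHABET)
v = [[0.0] * q for _ in range(L)]
v[0][ALPHABET.index("W")] = 50.0
w = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
print(evolve_seq0(v, w))
```

In the toy model the strong single-site potential pulls position 0 to tryptophan, while the remaining positions are sampled uniformly.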
- User-specified (--mutation-rate <rate>): Give a user-specified mutation rate, measured in number of substitutions per unit of evolutionary distance on the phylogenetic tree.
- Target Neff (--mutation-rate-neff [<neff>]): Set the mutation rate to approximately hit a target number of effective sequences (Neff, calculated as in the HHsuite package). If <neff> is not specified, the value for Neff is determined from <aln_file> (requires the --alnfile option).
The options --mutation-rate and --mutation-rate-neff are mutually exclusive, but one of them is required.
- User-specified (--tree-newick <tree_file>): Evolve the sequences according to an evolutionary tree, e.g. from a phylogenetic reconstruction program.
- Binary tree (--tree-binary): A binary tree with equally distributed branch lengths.
- 'Star-shaped' tree (--tree-star): A tree where all leaf nodes are direct descendants of the root node.
- MCMC sample (--mcmc-sampling): Generate sequences as a simple MCMC sample without following a tree topology.
Both the binary and the star-shaped tree are generated with a total evolutionary depth of 1 by default. You can adjust the diversity of the sampled alignment through the mutation rate parameters.
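The two built-in topologies can be written down as Newick strings, which may help when comparing them with a user-supplied tree. The sketch below is an illustration under assumptions: leaf and root names are made up, the binary tree is perfectly balanced with a power-of-two number of leaves, and "equally distributed branch lengths" is read as every branch having the same length so that the root-to-leaf distance sums to the total depth. How CCMgen builds its trees internally is not described here.

```python
def star_tree(n_leaves, depth=1.0):
    """All leaves hang directly off the root with branch length `depth`."""
    leaves = ",".join(f"seq{i}:{depth}" for i in range(n_leaves))
    return f"({leaves})root;"


def binary_tree(n_levels, depth=1.0):
    """Balanced binary tree with 2**n_levels leaves and equal branch lengths."""
    branch = depth / n_levels  # root-to-leaf path has n_levels branches
    counter = iter(range(2 ** n_levels))

    def subtree(level):
        if level == 0:
            return f"seq{next(counter)}"
        left = subtree(level - 1)
        right = subtree(level - 1)
        return f"({left}:{branch},{right}:{branch})"

    return subtree(n_levels) + "root;"


print(star_tree(3))    # (seq0:1.0,seq1:1.0,seq2:1.0)root;
print(binary_tree(2))  # ((seq0:0.5,seq1:0.5):0.5,(seq2:0.5,seq3:0.5):0.5)root;
```

In both cases every leaf sits at distance 1 from the root; only the amount of shared ancestry between leaves differs.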
- --mcmc-sample-random-gapped: Sample sequences starting from random sequences. Gap structure of randomly selected input sequences will be copied. Gap positions are not sampled. (requires the --alnfile option) [default]
- --mcmc-sample-random: Sample sequences starting from random sequences.
- --mcmc-sample-aln: Sample sequences starting from sequences in aln_file. Gap positions are not sampled. (requires the --alnfile option)
- --mcmc-burn-in <mcmc_burn_in>: Number of Gibbs sampling steps to evolve the Markov chain before a sample is obtained.
The options --mcmc-sample-random, --mcmc-sample-random-gapped and --mcmc-sample-aln are mutually exclusive.
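The random-gapped start can be sketched as follows: pick a reference sequence at random, keep its gap positions, and fill every other position with a random amino acid. During burn-in only the non-gap positions are touched. The function names are hypothetical, and the update inside `burn_in` is a uniform placeholder standing in for the MRF conditional, so this shows the handling of gaps rather than the actual sampler.

```python
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def random_gapped_start(reference_seqs, rng):
    """Random sequence that keeps the gap positions of a randomly chosen reference."""
    template = rng.choice(reference_seqs)
    return [c if c == "-" else rng.choice(AMINO_ACIDS) for c in template]


def burn_in(seq, n_steps, rng):
    """Run n_steps updates on non-gap positions; gap positions are never sampled."""
    free = [i for i, c in enumerate(seq) if c != "-"]
    if not free:
        return seq
    for _ in range(n_steps):
        i = rng.choice(free)
        # Placeholder update: a real sampler would draw from the MRF conditional.
        seq[i] = rng.choice(AMINO_ACIDS)
    return seq


rng = random.Random(1)
reference = ["AC--D", "-CG-D"]
start = random_gapped_start(reference, rng)
sample = burn_in(list(start), n_steps=100, rng=rng)
print("".join(start), "".join(sample))
```

The gap pattern of the sample is identical to that of the start sequence, which in turn is copied from one of the reference sequences.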
Generate sequences using potentials in example/1atzA.braw.gz and write results to data/sampled.fasta. Set the number of sequences in the synthetic alignment to match the number of sequences in example/1atzA.fas and automatically determine a mutation rate so that the diversity of the generated alignment approximately matches that of example/1atzA.fas:
ccmgen --alnfile example/1atzA.fas \
--tree-star \
--seq0-mrf 500 \
--mutation-rate-neff \
example/1atzA.braw.gz \
data/sampled.fasta