Multiple sequence alignment
Before running this step, you'll need to have completed the previous one (reconciling contigs). I.e. you should have a Trycycler output directory (which I'll assume is called trycycler) with subdirectories for each of your good clusters, each of which contains a 1_contigs subdirectory and a 2_all_seqs.fasta file.
This step takes the reconciled contig sequences (2_all_seqs.fasta) and runs a multiple sequence alignment.
For example, it would take sequences like this:
GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG
GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG
GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG
And produce an alignment like this:
GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG
GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG
GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG
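One way to read the alignment above: the gap characters (-) are only padding that brings every row to the same length, so removing them should give back the original sequence. The following is a small sketch of that check on the first row. The strip_gaps helper is a hypothetical name, not part of Trycycler, and it is meant for bare sequence lines (it would also strip hyphens from FASTA header lines).

```shell
# Remove gap characters from aligned sequence lines read on standard input.
# strip_gaps is a hypothetical helper for illustration, not a Trycycler command.
strip_gaps() {
    tr -d '-'
}

# First row of the example alignment; the result should match the first input sequence.
printf '%s\n' 'GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG' | strip_gaps
```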
The Trycycler msa command must be run separately for each of your good clusters. Assuming your good clusters are numbers 1, 2 and 3, these are the commands you would run:
trycycler msa --cluster_dir trycycler/cluster_001
trycycler msa --cluster_dir trycycler/cluster_002
trycycler msa --cluster_dir trycycler/cluster_003
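If you have many clusters, the per-cluster commands can be wrapped in a loop. This is a minimal sketch rather than anything Trycycler provides: run_msa_for_clusters and the TRYCYCLER variable (which lets you substitute echo for a dry run) are hypothetical names, and it assumes that every cluster directory containing a 2_all_seqs.fasta file is a good cluster you want to align.

```shell
# Run `trycycler msa` once for each cluster directory under a Trycycler output directory.
# Assumption: a cluster directory with a 2_all_seqs.fasta file is a good, reconciled cluster.
# TRYCYCLER is a hypothetical override, e.g. TRYCYCLER=echo for a dry run.
run_msa_for_clusters() {
    out_dir="$1"
    for cluster_dir in "$out_dir"/cluster_*; do
        # Skip clusters that were never reconciled (or a glob that matched nothing).
        [ -f "$cluster_dir/2_all_seqs.fasta" ] || continue
        ${TRYCYCLER:-trycycler} msa --cluster_dir "$cluster_dir"
    done
}

# Usage:
#   run_msa_for_clusters trycycler
```

Running it first with TRYCYCLER=echo prints the commands without executing them, which is a cheap way to confirm that only the intended clusters are picked up.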
Unlike in previous steps of Trycycler, the msa step should be hands-off. I.e. no manual intervention is required – just run it and wait for it to finish.
Trycycler msa will typically take a few minutes. Longer sequences and larger numbers of sequences will be slower.
Trycycler msa has the following parameters you can adjust:
- --kmer: the k-mer size used for sequence partitioning (default = 32).
- --step: the step size used for sequence partitioning (default = 1000).
- --lookahead: the look-ahead margin used for sequence partitioning (default = 10000).
- --threads: how many parallel instances of MUSCLE will be used when aligning the sequence partitions. It only affects speed, so you'll probably want to use as many threads as you have available.
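To use every available core with --threads, you can ask the system for its CPU count. This is a sketch under assumptions: cpu_count is a hypothetical helper, it relies on getconf _NPROCESSORS_ONLN (available on most Linux and macOS systems), and it falls back to 1 when that query fails.

```shell
# Report the number of online CPUs, falling back to 1 if it cannot be determined.
# cpu_count is a hypothetical helper, not part of Trycycler.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        # Anything empty or non-numeric is treated as "unknown".
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# Usage:
#   trycycler msa --cluster_dir trycycler/cluster_001 --threads "$(cpu_count)"
```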
You likely won't need to adjust the partitioning parameters (--kmer, --step and --lookahead) and can just leave them at the defaults. If you're curious about what they are used for, see How multiple sequence alignment partitioning works.
When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory: a FASTA-formatted multiple sequence alignment of the contigs, ready for use in generating a consensus. The consensus step will also need partitioned reads, so that's the next step in the process.
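As a quick sanity check on the result, every sequence in a multiple sequence alignment should have the same length once gaps are counted. The sketch below lists the distinct aligned lengths in a FASTA file, so a well-formed 3_msa.fasta should produce exactly one number. The msa_lengths function is a hypothetical helper, not a Trycycler command, and it assumes plain FASTA with > header lines.

```shell
# Print the distinct sequence lengths (gaps included) found in a FASTA file, one per line.
# msa_lengths is a hypothetical helper for illustration.
msa_lengths() {
    awk '
        /^>/ { if (seq != "") print length(seq); seq = ""; next }  # header: flush previous record
             { seq = seq $0 }                                      # sequence line: accumulate
        END  { if (seq != "") print length(seq) }                  # flush the last record
    ' "$1" | sort -nu
}

# Usage:
#   msa_lengths trycycler/cluster_001/3_msa.fasta
```

More than one line of output would mean the rows differ in length, in which case the file is not a usable alignment.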