Great dataset analysis
To begin, I ran Trycycler cluster on the assemblies:
trycycler cluster --reads reads.fastq.gz --assemblies assemblies/*.fasta --out_dir trycycler
Which produced this output.
This is an extremely clean result: every input assembly had two contigs, with one going into cluster 1 and the other going into cluster 2. The two resulting clusters both look legit!
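The "two contigs per assembly" observation can also be checked before clustering. The sketch below is not part of Trycycler: `count_contigs` is a hypothetical helper name, and it assumes plain (uncompressed) FASTA files in which every record starts with a `>` header line.

```shell
# Hypothetical helper (not a Trycycler command): print each FASTA file name
# followed by a tab and the number of contigs it contains.
count_contigs() {
    for f in "$@"; do
        # Each FASTA record begins with a '>' header line, so counting
        # those lines gives the number of contigs in the file.
        printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
    done
}

# Example call, assuming the input assemblies live in assemblies/:
#   count_contigs assemblies/*.fasta
```

For a dataset like this one, every line of the output would be expected to end in 2.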
I then ran Trycycler reconcile on each cluster:
trycycler reconcile --reads reads.fastq.gz --cluster_dir trycycler/cluster_001
trycycler reconcile --reads reads.fastq.gz --cluster_dir trycycler/cluster_002
Which produced this output for cluster 1 and this output for cluster 2.
Once again, a very clean result. Some contigs needed a little bit of sequence added/removed to circularise them and others were already circular. The final checks (pairwise identities and maximum indels) look very clean for both clusters. This tells me that every contig sequence is solid – I didn't need to exclude any contigs and re-run Trycycler reconcile.
The remaining steps are more hands-off. First, I ran Trycycler MSA on each cluster:
trycycler msa --cluster_dir trycycler/cluster_001
trycycler msa --cluster_dir trycycler/cluster_002
(cluster 1 MSA output and cluster 2 MSA output)
Then Trycycler partition:
trycycler partition --reads reads.fastq.gz --cluster_dirs trycycler/cluster_*
Then Trycycler consensus on each cluster:
trycycler consensus --cluster_dir trycycler/cluster_001
trycycler consensus --cluster_dir trycycler/cluster_002
(cluster 1 consensus output and cluster 2 consensus output)
All of these steps ran without any problem!
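Because these three steps needed no inspection between them, they could be wrapped in a small function for datasets with more clusters. This is only a sketch: `run_hands_off_steps` is a hypothetical name, and it uses just the subcommands and options shown above, in the same order (MSA on every cluster, one partition call across all clusters, then consensus on every cluster).

```shell
# Hypothetical wrapper (not part of Trycycler): run the msa, partition and
# consensus steps for any number of already-reconciled cluster directories.
# Usage: run_hands_off_steps <reads> <cluster_dir> [<cluster_dir> ...]
run_hands_off_steps() {
    hands_off_reads=$1
    shift

    # Multiple sequence alignment, one cluster at a time.
    for cluster in "$@"; do
        trycycler msa --cluster_dir "$cluster" || return 1
    done

    # Partitioning is a single call that takes all cluster directories.
    trycycler partition --reads "$hands_off_reads" --cluster_dirs "$@" || return 1

    # Consensus, one cluster at a time.
    for cluster in "$@"; do
        trycycler consensus --cluster_dir "$cluster" || return 1
    done
}

# Example call for this dataset:
#   run_hands_off_steps reads.fastq.gz trycycler/cluster_001 trycycler/cluster_002
```

The function stops at the first failing command, so a problem in one step does not silently carry over into the next.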
I then concatenated each cluster's consensus into a single FASTA for the genome:
cat trycycler/cluster_*/7_final_consensus.fasta > assembly.fasta
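As a quick sanity check on the concatenated file, the contig names and lengths can be listed with standard tools. The helper below is a sketch rather than anything Trycycler provides: `fasta_lengths` is a hypothetical name, and it assumes an uncompressed FASTA file.

```shell
# Hypothetical helper (not a Trycycler command): print each contig name
# followed by a tab and its sequence length.
fasta_lengths() {
    awk '
        # A header line starts a new record: emit the previous one, if any.
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # first word of the header, without ">"
            len = 0
            next
        }
        # Any other line is sequence: add its length to the running total.
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Example call:
#   fasta_lengths assembly.fasta
```

For this genome the output should contain two lines, one for the chromosome and one for the plasmid.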
This dataset required no manual intervention and thus represents the easiest possible case for running Trycycler. The high quality of the input reads also meant that the final assembly was quite accurate: no errors in the plasmid and only a few homopolymer-length errors in the chromosome.