
Great dataset analysis

Ryan Wick edited this page Jul 13, 2020 · 7 revisions

Trycycler cluster

To begin, I ran Trycycler cluster on the assemblies:

trycycler cluster --reads reads.fastq.gz --assemblies assemblies/*.fasta --out_dir trycycler

The command produced this output.

This is an extremely clean result: every input assembly had two contigs, one going into cluster 1 and the other into cluster 2. The two resulting clusters both look legit!
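As a quick sanity check on the clustering, the number of contigs in each cluster can be tallied from the shell. This is only a sketch: the helper name `count_cluster_contigs` is hypothetical, and it assumes each cluster directory stores its contigs as `*.fasta` files in a `1_contigs` subdirectory.

```shell
# Hypothetical helper: print "<cluster name><TAB><number of contig files>" for
# each cluster directory. Assumes contigs live in <cluster>/1_contigs/*.fasta.
count_cluster_contigs() {
    for cluster in "$1"/cluster_*; do
        # Skip anything that does not look like a cluster directory
        # (this also covers the case where the glob matched nothing).
        [ -d "$cluster/1_contigs" ] || continue
        n=$(find "$cluster/1_contigs" -maxdepth 1 -name '*.fasta' | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$cluster")" "$n"
    done
}

count_cluster_contigs trycycler
```

For a result like the one described above, each cluster should report the same count, one contig per input assembly.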

Trycycler reconcile

I then ran Trycycler reconcile on each cluster:

trycycler reconcile --reads reads.fastq.gz --cluster_dir trycycler/cluster_001
trycycler reconcile --reads reads.fastq.gz --cluster_dir trycycler/cluster_002

These commands produced this output for cluster 1 and this output for cluster 2.

Once again, a very clean result. Some contigs needed a little bit of sequence added/removed to circularise them and others were already circular. The final checks (pairwise identities and maximum indels) look very clean for both clusters. This tells me that every contig sequence is solid – I didn't need to exclude any contigs and re-run Trycycler reconcile.
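With only two clusters, typing each reconcile command by hand is easy enough, but the same step could be written as a loop for genomes with more replicons. The sketch below uses only the `--reads` and `--cluster_dir` options shown above; the function name `reconcile_all_clusters` is hypothetical. It stops at the first cluster that fails, since a failure usually means the output needs inspecting before continuing.

```shell
# Hypothetical wrapper: run "trycycler reconcile" on every cluster directory.
# Usage: reconcile_all_clusters <reads file> <Trycycler output directory>
reconcile_all_clusters() {
    reads="$1"
    out_dir="$2"
    for cluster in "$out_dir"/cluster_*; do
        # Skip non-directories (also covers a glob that matched nothing).
        [ -d "$cluster" ] || continue
        echo "Reconciling $cluster" >&2
        # Stop at the first failure so the problem cluster can be inspected.
        trycycler reconcile --reads "$reads" --cluster_dir "$cluster" || return 1
    done
}

reconcile_all_clusters reads.fastq.gz trycycler
```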

Trycycler MSA, partition and consensus

The remaining steps are more hands-off. First, I ran Trycycler MSA on each cluster:

trycycler msa --cluster_dir trycycler/cluster_001
trycycler msa --cluster_dir trycycler/cluster_002

(cluster 1 MSA output and cluster 2 MSA output)

Then Trycycler partition:

trycycler partition --reads reads.fastq.gz --cluster_dirs trycycler/cluster_*

(partition output)

Then Trycycler consensus on each cluster:

trycycler consensus --cluster_dir trycycler/cluster_001
trycycler consensus --cluster_dir trycycler/cluster_002

(cluster 1 consensus output and cluster 2 consensus output)

All of these steps ran without any problem!

I then concatenated the consensus sequences from both clusters into a single FASTA file for the genome:

cat trycycler/cluster_*/7_final_consensus.fasta > assembly.fasta
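A simple way to confirm the concatenation worked is to count the FASTA records in the combined file. This is a minimal sketch; the helper name `count_fasta_records` is hypothetical, and it assumes each record begins with a `>` header line.

```shell
# Hypothetical helper: count the records in a FASTA file by counting the
# header lines (those starting with ">").
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c '^>' "$1" || true
}

if [ -f assembly.fasta ]; then
    count_fasta_records assembly.fasta
fi
```

Assuming each cluster's consensus file holds a single sequence, the count for this dataset should be 2: one record per cluster.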

Final thoughts

This dataset required no manual intervention and thus represents the easiest possible case for running Trycycler. The high quality of the input reads also meant that the final assembly was quite accurate: no errors in the plasmid and only a few homopolymer-length errors in the chromosome.
