Starting Trycycler consensus (2020-07-06 14:24:55)
    Trycycler consensus is the final stage of the Trycycler pipeline. It
    operates on one replicon (i.e. cluster) at a time. It takes the multiple
    sequence alignment of alternative contig sequences and combines them into
    a single consensus sequence. Where needed, it will use read alignments to
    help choose which variants to include/exclude from the consensus sequence.
    If all goes well, the final consensus will be free of any large-scale
    errors.

Input reads:
  trycycler/cluster_001/4_reads.fastq
    31,959 reads (192,572,569 bp)
    N50 = 8,431 bp

Input contigs:
  trycycler/cluster_001/2_all_seqs.fasta
    A_contig_1: 1,044,291 bp
    B_contig_1: 1,044,295 bp
    C_contig_1: 1,044,313 bp
    D_contig_1: 1,044,336 bp
    E_contig_1: 1,044,318 bp
    F_utg000001c: 1,044,419 bp
    G_utg000001c: 1,044,418 bp
    H_utg000001c: 1,044,435 bp
    I_utg000001c: 1,044,409 bp
    J_utg000001c: 1,044,416 bp

Checking required software:
  minimap2: v2.17-r954-dirty


Partitioning MSA (2020-07-06 14:24:55)
    The multiple sequence alignment is now partitioned into chunks. Chunks
    where the input contig sequences are all in agreement are called "same"
    chunks, and those where the input contig sequences disagree are called
    "different" chunks. The consensus sequence will be made by choosing a best
    option for each of the different chunks.

chunks: 1,403 (702 same, 701 different)
combining small chunks: 1,285 (643 same, 642 different)

Saving sequences to graph:
  trycycler/cluster_001/5_chunked_sequence.gfa


Initial consensus (2020-07-06 14:24:58)
    Trycycler now makes an initial consensus sequence by choosing a sequence
    for each of the different chunks. The chosen sequence is the one with the
    lowest total Hamming distance to the other sequences. For example, a chunk
    with options of TT, TT, CC, CC and TA will give a consensus of TT. If the
    total Hamming distances fail to break a tie or if all sequences differ,
    the chunk will be flagged for read-based assessment.
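
The Hamming-distance rule described above can be illustrated with a short Python sketch. This is not Trycycler's own code: the function names `hamming` and `choose_chunk` are hypothetical, and the sketch assumes each chunk's options are already aligned to equal length (for example, padded with gap characters). It returns None where the log says a chunk would be flagged for read-based assessment.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def choose_chunk(options: List[str]) -> Optional[str]:
    """Pick the option with the lowest total Hamming distance to the others.

    Returns None when the chunk would need read-based assessment, i.e. when
    every option is different or when the lowest total distance is tied.
    (Hypothetical helper for illustration only.)
    """
    if len(options) > 1 and len(set(options)) == len(options):
        return None  # all sequences differ
    # Total distance from each distinct option to every option in the chunk.
    # The distance from an option to itself is 0, so including it is harmless.
    totals = {
        seq: sum(hamming(seq, other) for other in options)
        for seq in set(options)
    }
    best = min(totals.values())
    winners = [seq for seq, total in totals.items() if total == best]
    return winners[0] if len(winners) == 1 else None


print(choose_chunk(["TT", "TT", "CC", "CC", "TA"]))  # TT (totals: TT=5, CC=6, TA=6)
```

For the example in the log, TT has a total distance of 5 while CC and TA each total 6, so TT is chosen without needing reads.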

Consensus length: 1,044,449 bp
Different chunks needing assessment: 9
Different chunks not needing assessment: 633

Saving sequence to file:
  trycycler/cluster_001/6_initial_consensus.fasta


Indexing reads (2020-07-06 14:24:58)
    Trycycler now aligns all reads to the initial consensus to form an index
    of which reads span each of the chunks. This makes the following step
    faster, as only relevant reads will be used when conducting read-based
    assessment of chunks.

Aligning reads to the initial consensus: 32,130 alignments
Filtering for best alignment per read: 31,959 alignments
Gathering reads for chunks: 9 / 9


Choosing best options with reads (2020-07-06 14:25:17)
    For each of the chunks to be assessed, Trycycler now aligns the relevant
    reads to each alternative sequence. Whichever option gives the strongest
    read alignments (defined as the total alignment score for each of the
    read's best alignment) is chosen as the best. This should result in a
    consensus sequence which is more accurate than the initial consensus.

Processing chunks: 9 / 9

Chunks where sequence is...
  the same as in the initial consensus: 1
  different to the initial consensus: 8

Saving sequence to file:
  trycycler/cluster_001/7_final_consensus.fasta
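
The read-based selection described in the "Choosing best options with reads" step can be sketched as follows. This is an illustration under stated assumptions, not Trycycler's implementation: alignments are represented as hypothetical (read name, score) pairs, the names `total_best_score` and `choose_by_reads` are invented for this example, and the log does not show how the scores are obtained from the aligner or how ties are handled.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of one alignment: (read name, alignment score).
Alignment = Tuple[str, int]


def total_best_score(alignments: Iterable[Alignment]) -> int:
    """Sum the best alignment score of each read."""
    best: Dict[str, int] = {}
    for read, score in alignments:
        # Keep only the highest-scoring alignment per read.
        if read not in best or score > best[read]:
            best[read] = score
    return sum(best.values())


def choose_by_reads(option_alignments: Dict[str, List[Alignment]]) -> str:
    """Return the option whose reads give the highest total best score.

    Assumption: ties go to the first option encountered.
    """
    return max(
        option_alignments,
        key=lambda option: total_best_score(option_alignments[option]),
    )


# Made-up scores for two alternative sequences of one chunk.
example = {
    "TT": [("read_1", 50), ("read_1", 40), ("read_2", 30)],
    "CC": [("read_1", 45), ("read_2", 45)],
}
print(choose_by_reads(example))  # CC (total 90 versus 80 for TT)
```

Counting only each read's best alignment keeps a read with several secondary alignments from contributing more than once to an option's total.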