Releases: broadinstitute/pilon

Pilon version 1.14

02 Nov 18:38

Pilon version 1.14 includes one major functionality enhancement along with several new minor features:

  1. The biggest limitation of using Pilon on diploid genomes has been that it did not properly make calls for heterozygous small indels (those made from read alignments rather than local reassembly). For diploid genomes, Pilon now makes a distinction between homozygous and heterozygous small indel calls in VCF output, and it will only make assembly improvement corrections of small indels which are believed to be homozygous. For haploid genomes, small indel calls will either be confident (PASSing) or ambiguous (just like SNPs), and only confident calls will be corrected in the output FASTA. More details below.
  2. A --version option has been added which causes Pilon to print the version string and exit gracefully.
  3. The --targets option can now specify either the name of a file containing the list of target specifications (separated by commas or newlines) or, as before, the target specifications directly as a command argument.
  4. Some fixes and improvements have been made to genome viewer tracks generated by the --tracks option. A bug was fixed that had previously caused all .wig tracks to begin at coordinate 1, even when a --targets option specified a different start coordinate. Also, the Pilon.bed track has been cleaned up a bit, removing some annotated features which are no longer meaningful to Pilon and adding some which are, including local reassembly result tags (ClosedGap, TandemRepeat, PartialFill, …).

The rest of this announcement will provide more details about the new indel calling behavior.

Diploid variant calling (with --diploid option): In the VCF output, Pilon will now show PASSing indels which are either homozygous (genotype 1/1) or heterozygous (0/1), just as it does for SNPs. For coordinates which were not called because they were inside a containing deletion call, Pilon has always added the Del filter tag. The Del tag will now only be present for coordinates deleted by a homozygous call, so that if one haplotype has a deletion, but the other has a SNP inside those coordinates, both calls can be made.

Diploid assembly improvement: Pilon will only make corrections to the output FASTA file corresponding to homozygous indel calls, so it will be more conservative in making indel corrections.

Haploid variant calling: Just as for SNPs, when Pilon sees sufficient evidence both for and against an indel call, it will now mark the call with the Amb (ambiguous) tag. The result is that the new version will be more conservative in making PASSing indel calls, but more liberal in identifying possible (ambiguous) indels than previous versions. An ambiguous deletion will not cause the records corresponding to the deleted locations to be tagged with the Del filter, since we're not confident they are deleted. Calls which are heterozygous when the --diploid option is used are tagged ambiguous in haploid mode, as has always been the case for SNPs.

Haploid assembly improvement: Pilon will be more conservative in applying indel corrections, since the non-ambiguous thresholds for indel calls are higher. As a side-effect, this ought to make Pilon generate fewer spurious indel corrections when used with sequencing technologies which have indels as the dominant error mode (e.g., 454, PacBio), but I have not tested this.
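
The correction rules described above can be summarized in a small sketch. This is not Pilon's code; the function name `would_correct` and its arguments are hypothetical, and it only restates the rules in this announcement: in diploid mode a small indel is applied to the output FASTA only if it is PASSing and homozygous (1/1), and in haploid mode only if it is PASSing (not Amb).

```python
def is_small_indel(ref: str, alt: str) -> bool:
    """Treat a record as an indel when REF and ALT differ in length.

    Simplification: symbolic ALT alleles such as <DUP> are not handled.
    """
    return len(ref) != len(alt)


def would_correct(ref: str, alt: str, filt: str, genotype: str, diploid: bool) -> bool:
    """Return True if, under the rules stated in the release notes,
    this small indel call would be applied to the output FASTA."""
    if not is_small_indel(ref, alt):
        raise ValueError("not an indel record")
    if filt != "PASS":
        # Amb (ambiguous) or otherwise filtered calls are never applied.
        return False
    if diploid:
        # Diploid mode: only homozygous indels are corrected.
        return genotype == "1/1"
    # Haploid mode: any confident (PASSing) indel is corrected.
    return True


if __name__ == "__main__":
    print(would_correct("AT", "A", "PASS", "1/1", diploid=True))   # homozygous deletion
    print(would_correct("AT", "A", "PASS", "0/1", diploid=True))   # heterozygous deletion
    print(would_correct("A", "AT", "Amb", "0/1", diploid=False))   # ambiguous insertion
```

The same predicate could be applied to records read from a Pilon VCF to estimate how many indel calls end up in the improved assembly.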

For back-compatibility, this new version also contains an --oldindel option to revert to the old all-or-nothing indel calling behavior, though please note the heuristic thresholds are slightly different.

--bruce, 2 Nov 2015

Pilon version 1.13

26 Jun 18:28

This is a very minor release, adding one new user-requested feature:

When producing a VCF file, if the --vcfqe option is specified, Pilon will include a QE field in the INFO column of the VCF instead of the default QP field. QE shows the quality-weighted evidence in the pileups for each possible base (A, C, G, T, in that order) at that position. QP showed the same information, but normalized as an integer percentage (i.e., the 4 all added up to 100). QE gives more resolution for those applications that want it. Note that the values are only interesting relative to one another; there is no defined unit for the evidence scores.

Example with QP (default): BC=0,2,49,178;QP=0,0,3,97
Same example with QE: BC=0,2,49,178;QE=0,66,7566,262923
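
To illustrate how the two fields relate, here is a minimal sketch that normalizes QE values to integer percentages. The function name `qe_to_qp` is hypothetical, and plain rounding is an assumption: these notes do not say how Pilon rounds or how it guarantees that the four values add up to 100.

```python
def qe_to_qp(qe):
    """Normalize quality-weighted evidence (A, C, G, T) to integer percentages.

    Assumption: each value is rounded independently. With other inputs this
    simple approach may not sum to exactly 100.
    """
    total = sum(qe)
    if total == 0:
        return [0] * len(qe)
    return [round(100 * value / total) for value in qe]


if __name__ == "__main__":
    # QE values from the example record above.
    print(qe_to_qp([0, 66, 7566, 262923]))  # [0, 0, 3, 97]
```

For the example record, this reproduces the QP values shown with the default option.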

If this new option isn't used, everything should be the same as v1.12.

Pilon version 1.12

29 Mar 18:05

This release includes two minor enhancements requested by the user community:

  1. There is a new --minmq <m> option, where <m> is the minimum alignment mapping quality for the read (or read pair) to be considered valid for inclusion in pileups. The default is still 0. In addition to filtering out low mapping quality reads in the pileups, setting this to a higher value will cause low mapping quality regions to be considered as potentially bad, triggering local reassembly if it is enabled. Since Pilon already takes mapping quality into account when weighing the evidence in pileups, I don't recommend using this for normal situations, but it's another tool available in the arsenal.
  2. Pilon now supports N, =, and X operators in the CIGAR alignment summary. The aligners I normally test with do not generate these operators, but they are in the spec and used by other aligners, and they should now work.
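
For readers unfamiliar with the additional CIGAR operators, the sketch below shows how they count toward reference and read lengths according to the SAM specification: N (skipped region) consumes only the reference, while = (match) and X (mismatch) consume both, like M. This is an illustration in Python, not Pilon's implementation, and the function name `cigar_lengths` is made up for the example.

```python
import re

# Operators that advance along the reference / the read, per the SAM spec.
REF_CONSUMING = set("MDN=X")
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar: str):
    """Return (reference_bases, read_bases) covered by a CIGAR string."""
    ref_len = 0
    query_len = 0
    pos = 0
    for match in CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"malformed CIGAR: {cigar!r}")
        pos = match.end()
        length, op = int(match.group(1)), match.group(2)
        if op in REF_CONSUMING:
            ref_len += length
        if op in QUERY_CONSUMING:
            query_len += length
    if pos != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ref_len, query_len


if __name__ == "__main__":
    # 10 matches, 1 mismatch, a 100-base skip, 4 matches, 2 inserted, 3 soft-clipped
    print(cigar_lengths("10=1X100N4=2I3S"))  # (115, 20)
```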

Also, this version of Pilon has now been updated to use the htsjdk library (version 1.130) for BAM file processing.

Pilon version 1.12 should produce the same results as version 1.11 unless the new features are used.

--bruce, 29 Mar 2015

Pilon version 1.11

22 Jan 02:09

Version 1.11 provides more flexibility in BAM preparation for use with Pilon. Prior to this release, Pilon assumed all "proper pairs" marked by the aligner were in FR orientation. Now, Pilon will scan paired BAMs to determine the orientation of proper pairs and use them accordingly. Circularized jump libraries, which are normally sequenced such that the reads naturally appear to be in RF orientation, can now be used in that orientation without being flipped to FR.

This release also supports (and should make better use of) BAMs aligned with an aligner such as bwa mem, which can mark pairs of both FR and RF orientations as valid, such as a typical Illumina circularized jump library, which contains some "innies" as well as "outies". Pilon will now track the insert size distributions of each orientation separately, which allows it to generate a more accurate estimation of the insert sizes of the true jumps.
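
As a rough illustration of what FR ("innie") and RF ("outie") mean, the sketch below classifies a pair from the leftmost alignment positions and strands of its two reads. It is a simplified assumption of how orientation can be determined, not a description of Pilon's scan; the function name `pair_orientation` is hypothetical.

```python
from collections import Counter


def pair_orientation(pos1: int, reverse1: bool, pos2: int, reverse2: bool) -> str:
    """Classify a read pair aligned to the same reference sequence.

    pos1/pos2 are leftmost alignment positions; reverse1/reverse2 say whether
    each read aligned to the reverse strand. Simplification: read lengths and
    5' ends are ignored.
    """
    if reverse1 == reverse2:
        return "tandem"  # both reads on the same strand: neither FR nor RF
    forward_pos, reverse_pos = (pos1, pos2) if not reverse1 else (pos2, pos1)
    # Forward read to the left of the reverse read: the reads point inward.
    return "FR" if forward_pos <= reverse_pos else "RF"


if __name__ == "__main__":
    pairs = [
        (100, False, 300, True),    # innie
        (5000, False, 1000, True),  # outie, as in a circularized jump
        (200, True, 4200, False),   # outie with the reverse read listed first
    ]
    print(Counter(pair_orientation(*p) for p in pairs))
```

Keeping a separate tally (and separate insert size statistics) per orientation is what allows the innies in a jump library to be distinguished from the true jumps.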

v1.11 should yield the same results as v1.10 when used with BAMs prepared in FR-only orientation as previously required.

--bruce, 21 Jan 2015

Pilon version 1.10

19 Nov 01:42

This release fixes issues related to handling of large input FASTA elements (typically scaffolds or chromosomes). Pilon breaks up very large input FASTA elements into "chunks" which are processed independently. The default maximum chunksize has been 10 Mb; that is, input elements larger than that were broken into pieces smaller than 10 Mb for processing and put back together when generating the output files.

  1. Fixed a bug introduced in v1.9 which resulted in duplicate elements in the output (improved) FASTA file if the input FASTA element was broken into chunks (i.e., it was larger than 10 Mb).
  2. Fixed a bug which could cause the output coordinates in the .changes file (from the --changes option) to be slightly off for changes in the 2nd and subsequent chunks.
  3. Added a new --chunksize option to override the maximum chunksize (still defaults to 10000000).
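
The chunking idea can be sketched as follows. How Pilon actually chooses chunk boundaries is not described here, so the even split below is an assumption, and `chunk_ranges` is a hypothetical name; the only property taken from these notes is that no chunk exceeds the maximum chunksize (default 10000000).

```python
import math


def chunk_ranges(length: int, chunksize: int = 10_000_000):
    """Split an element of `length` bases into 1-based inclusive ranges,
    none longer than `chunksize`.

    Assumption: chunks are made as even as possible.
    """
    if length <= 0 or chunksize <= 0:
        raise ValueError("length and chunksize must be positive")
    n_chunks = math.ceil(length / chunksize)
    base, extra = divmod(length, n_chunks)
    ranges = []
    start = 1
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    # A 25 Mb element is processed as three chunks.
    for first, last in chunk_ranges(25_000_000):
        print(first, last)
```

A change found at chunk-local position p in a chunk starting at `first` maps back to element coordinate `first + p - 1`, which is the kind of bookkeeping the .changes fix in item 2 concerns.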

There is no change in behavior between this release and v1.9 for VCF output or for cases in which no input FASTA element is larger than 10 Mb.

--bruce, 18 Nov 2014

Pilon version 1.9

30 Oct 02:54

This release contains several enhancements and heuristic changes designed to make Pilon more resilient when presented with problematic sequencing data:

  1. Reads marked as failing vendor quality control checks will no longer be used by default. To include them, use the new option --nonpf (which replaces the --pf option, reflecting the change in default behavior).
  2. Reads marked as duplicates in the BAM will no longer be used unless the new --duplicates option is specified.
  3. The threshold for triggering local reassemblies based on percentage of bad pairs at a given location is now relative to the overall percentage of bad pairs, not an absolute percentage. This helps reduce the number of spurious local reassemblies when used with poorly constructed mate pair libraries.
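
Items 1 and 2 correspond to two standard SAM flag bits: 0x200 (read fails platform/vendor quality checks) and 0x400 (PCR or optical duplicate). The sketch below shows the new defaults as a filter; it is an illustration, not Pilon's code, and the function `use_read` with its keyword arguments is hypothetical (named after the --nonpf and --duplicates options).

```python
QC_FAIL = 0x200    # SAM flag: not passing quality controls
DUPLICATE = 0x400  # SAM flag: PCR or optical duplicate


def use_read(flag: int, nonpf: bool = False, duplicates: bool = False) -> bool:
    """Decide whether a read would be used, given its SAM flag.

    nonpf=True mirrors --nonpf; duplicates=True mirrors --duplicates.
    """
    if flag & QC_FAIL and not nonpf:
        return False
    if flag & DUPLICATE and not duplicates:
        return False
    return True


if __name__ == "__main__":
    print(use_read(99))                       # ordinary proper pair
    print(use_read(99 | QC_FAIL))             # failed vendor QC
    print(use_read(99 | QC_FAIL, nonpf=True))
    print(use_read(99 | DUPLICATE))           # marked duplicate
```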

This release is also more conservative about reporting partial large events in variant calling (opening gaps in assembly improvement) when used with --fix +breaks (implied by the --variant option):

  1. If Pilon does a local reassembly which is not closed, it will not make a change if the partially assembled sequence matches the input genome.
  2. If a loop in the assembly graph is detected, e.g., from a tandem repeat, it will not make a change.

This version includes an experimental (not extensively tested) multithreading capability. If the --threads <N> option is provided, then initial BAM scans and input FASTA element processing will be done in parallel. This is very coarse-grained multithreading, but it ought to provide some speedup for large genomes.

Finally, this release fixes a bug which occasionally prevented small deletions from being applied to the output .fasta and .changes files; the VCF output was correct.

--bruce, 29 Oct 2014

Pilon version 1.8

04 Jun 18:12

This release contains one functional change and two bug fixes.

The functional change relates to how Pilon handles ambiguous bases when doing assembly improvement on haploid genomes. Prior to this release, Pilon would change ambiguous bases to the allele with the most evidence, even though there was significant evidence for multiple alternatives. These are effectively low-confidence changes, and Pilon v1.8 will no longer make these changes to ambiguous sites by default. Add the --fix +amb option to the command line to get the old behavior. This change does not alter how these sites are reported in the VCF, as they have always been tagged with the Amb filter.

The two bug fixes:

  1. Pilon would crash if it attempted to do a local reassembly using a BAM file containing reads without quality scores. It will now assume read base quality according to the --defaultqual option, which defaults to 15.
  2. Pilon v1.7 fixed a bug in which it would sometimes report overlapping changes in the VCF file. However, the way it was implemented, it was possible for the VCF file to have reported one of the overlapping changes and the .changes file to have reported the other. Now, both should be consistent.

--bruce, 4 Jun 2014

Pilon version 1.7

12 Mar 02:46

This release fixes two VCF-related bugs:

  1. Sometimes equivalent but shifted deletions (think STRs) could be called, one by alignment pileups and one by local reassembly. In some cases, Pilon failed to filter out one of them, leaving overlapping deletion calls in the VCF. These overlapping changes did not make it into the improved FASTA file; the issue only affected VCF generation.
  2. In a haploid variant calling run, if the base immediately following an insertion should have been tagged with the Amb (ambiguous) filter, it was not, leaving it looking like a PASSing heterozygous SNP call (which shouldn't happen in a haploid genome!).

Also, this version is built with Scala version 2.10.3.

--bruce, 11 Mar 2014

Pilon version 1.6

03 Jan 02:41

This release adds some features to Pilon's standard output log and the VCF output files to enable better reporting and gathering of statistics. Core functionality is unchanged from v1.5, so the actual calls and changes should be the same.

  1. Near the end of the log, Pilon now prints a summary of whole-genome mean coverage by BAM type, plus a total. Each value is the number of usable alignment bases from all good reads divided by the input genome size, rounded to the nearest integer. Here's an example:
    Mean jumps coverage: 105
    Mean frags coverage: 140
    Mean total coverage: 244
  2. In addition, there are some changes to the reporting of BAM stats which could enable some better metrics about mapped & unmapped reads, etc., as well as a bit of cleaning up of some of the other log output.
  3. Pilon now adds the IMPRECISE INFO flag, as described in the VCF spec, to any SV records resulting from local reassembly which have Ns in the ALT (new) sequence. I think that's completely consistent with the description of IMPRECISE in the spec. This can occur in variant calling by OpenedGap fixes (e.g., partial inserts) or in assembly improvement on partial gap filling. Here is an example record; the only difference from v1.5 is the addition of IMPRECISE in the INFO field when there is not a closed solution:
    gi|395136682|gb|CP003248.1| 3930660 . G GNNNNNNNNNNACGGCGGCACCGGCG . PASS SVTYPE=INS;SVLEN=25;END=3930660;IMPRECISE GT 1/1
  4. Large collapsed repeats (>10Kb) have been reported in the standard output log, but now they are also included in the VCF file as candidate segmental duplications. They use SVTYPE=DUP records as described in the VCF spec. These are also IMPRECISE, since the coordinates and length are approximate. Example:
    gi|395136682|gb|CP003248.1| 3554192 . T <DUP> . PASS SVTYPE=DUP;SVLEN=158082;END=3712274;IMPRECISE GT ./.
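
The coverage calculation in item 1 can be sketched as below. The base counts and genome size are invented for illustration (they are not from any real run); they were chosen so that the rounded values match the log example, which also shows why the rounded per-type values need not add up to the rounded total.

```python
def mean_coverage(usable_bases: int, genome_size: int) -> int:
    """Usable alignment bases divided by genome size, rounded to an integer."""
    return round(usable_bases / genome_size)


if __name__ == "__main__":
    genome_size = 5_000_000      # hypothetical genome size
    jumps_bases = 523_000_000    # hypothetical usable bases from jump BAMs
    frags_bases = 698_000_000    # hypothetical usable bases from fragment BAMs

    print("Mean jumps coverage:", mean_coverage(jumps_bases, genome_size))
    print("Mean frags coverage:", mean_coverage(frags_bases, genome_size))
    print("Mean total coverage:", mean_coverage(jumps_bases + frags_bases, genome_size))
```

With these made-up inputs the unrounded values are 104.6, 139.6 and 244.2, so each line is rounded independently and the total is 244 rather than 245.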

--bruce, 2 Jan 2014

Pilon version 1.5

30 Oct 03:21

This release incorporates more robust handling of local reassemblies (gap filling or fixing of suspected breaks in contiguity) in the presence of tandem repeats. Previous versions were more aggressive, too often leading to false closures with an incorrect tandem repeat copy number.

When Pilon v1.5 detects a tandem repeat (any loop in the local reassembly graph, actually), it will not try to extend further nor try to generate a closure, and it will report the detected tandem repeat length in the standard output log, e.g.,

fix gap: scaffold00001:1561375-1562394 1561375 -0 +150 PartialFill TandemRepeat 243

This release will change results in the presence of tandem repeats, but with fewer incorrect extensions and/or closures.
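
For anyone who wants to collect these reports from the log, here is a minimal parsing sketch based only on the single example line above. The regular expression, the function name `parse_fix_line`, and the interpretation of the trailing tokens are assumptions; the numeric fields between the region and the tags are skipped because their meaning is not documented in these notes.

```python
import re

# Assumed layout: "fix <kind>: <sequence>:<start>-<end> <remaining tokens>"
FIX_LINE = re.compile(
    r"^fix (?P<kind>\w+): (?P<seq>\S+):(?P<start>\d+)-(?P<end>\d+) (?P<rest>.*)$"
)


def parse_fix_line(line: str):
    """Extract region, result tags and tandem repeat length from a log line.

    Returns None if the line does not look like a "fix" report.
    """
    match = FIX_LINE.match(line.strip())
    if match is None:
        return None
    tokens = match.group("rest").split()
    tags = [tok for tok in tokens if tok[0].isalpha()]
    tandem_repeat = None
    if "TandemRepeat" in tokens:
        idx = tokens.index("TandemRepeat")
        # In the example, the repeat length follows the TandemRepeat tag.
        if idx + 1 < len(tokens) and tokens[idx + 1].isdigit():
            tandem_repeat = int(tokens[idx + 1])
    return {
        "kind": match.group("kind"),
        "sequence": match.group("seq"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "tags": tags,
        "tandem_repeat": tandem_repeat,
    }


if __name__ == "__main__":
    example = ("fix gap: scaffold00001:1561375-1562394 1561375 -0 +150 "
               "PartialFill TandemRepeat 243")
    print(parse_fix_line(example))
```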

Additional minor changes:

  1. Migrate Pilon source code base to Scala version 2.10.2
  2. Upgrade embedded picard & samtools libraries to version 1.98

--bruce, 29 Oct 2013