NanoFG

NanoFG is a fusion gene detection pipeline for Oxford Nanopore sequencing data. It uses the ENSEMBL database to find structural variations (SVs) that produce fusions between two genes, remaps these SVs with LAST to increase breakpoint accuracy, and reports the resulting fusions. By default it produces 4 output files:

.vcf file containing all candidate fusion genes
.txt file containing information on all correct fusion genes
.pdf file containing a visual overview of the detected fusion genes
.primers text file containing primers for fusion validation

Required tools

Mandatory

Samtools (1.7) - http://samtools.sourceforge.net/

Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

Minimap2 (2.6) - https://github.com/lh3/minimap2

Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

NanoSV (1.2.4) - https://github.com/mroosmalen/nanosv

Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. 8, 1326 (2017).

Optional

LAST (921) - http://last.cbrc.jp/doc/last.html

Kielbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Research 21, 487–493 (2011).

Wtdbg2 (2.2) - https://github.com/ruanjue/wtdbg2

Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods (2019).

Installation

Download NanoFG from GitHub.
Then, from the NanoFG directory, run:

virtualenv venv -p </path/to/python>
. venv/bin/activate
pip install -r requirements.txt

How to run

bash NanoFG.sh -f </path/to/fastq> [-n sample_name] [-s selection] [-cf] [-cc] [-df] [-dc] [-wl]

or

bash NanoFG.sh -b </path/to/bam> [-v </path/to/vcf>] [-n sample_name] [-s selection] [-cf] [-cc] [-df] [-dc] [-wl]

NanoFG currently only accepts a human reference FASTA file in which each sequence name is the bare chromosome number, without a "chr" prefix:

>1 (Instead of Chr1)
NNNNN
>2
NNNNN
># etc.

Such a file can be created by downloading the reference from:

ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/seqs_for_alignment_pipelines/

and running

sed 's/>chr/>/' </PATH/TO/REFERENCE_FASTA> > </PATH/TO/RESULT_FASTA>
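
For readers who prefer to do the renaming from Python, the following is a minimal sketch of the same header rewrite. The function name strip_chr_prefix and the in-memory demo data are illustrative assumptions and are not part of NanoFG.

```python
import io

def strip_chr_prefix(src, dst):
    """Copy FASTA text from src to dst, removing a leading 'chr' from header names.

    Hypothetical helper for illustration; it mirrors the sed command above.
    """
    for line in src:
        if line.startswith(">chr"):
            line = ">" + line[len(">chr"):]
        dst.write(line)

# Small in-memory demonstration; real use would pass open file handles instead.
demo_in = io.StringIO(">chr1\nNNNNN\n>chr2\nNNNNN\n")
demo_out = io.StringIO()
strip_chr_prefix(demo_in, demo_out)
renamed = demo_out.getvalue()
```

Sequence lines and headers that already lack the prefix are copied through unchanged.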

Main parameters

REQUIRED

-f | --fastq

Path to fastq

or

-b | --bam

Path to bamfile

OPTIONAL

-v | --vcf

Path to vcf

-n | --name

Name of the sample to give to output files

-cf | --complex_fusion

If activated, NanoFG links together SVs that occur on the same read.
This allows fusions to be found in cases where a small SV in between the two genes would otherwise prevent NanoFG from detecting the fusion.

-s | --selection

Regions to select from the bamfiles (separated by ',')

Accepted formats:
- Direct region (e.g. 17:7565097-7590856)
- Ensembl identifier (e.g. ENSG00000141510)
- Common gene name (e.g. TP53)
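
The three accepted formats can be told apart by shape alone. The sketch below shows one way to classify a comma-separated selection string; the function name, the regular expressions, and the fallback of treating anything else as a gene name are assumptions made for illustration, not NanoFG's actual parsing code.

```python
import re

# Assumed patterns: "chrom:start-end" and a human Ensembl gene ID (ENSG + 11 digits).
REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}$")

def classify_selection(selection):
    """Split a ','-separated selection and label each entry by its format."""
    results = []
    for item in selection.split(","):
        item = item.strip()
        if not item:
            continue
        match = REGION_RE.match(item)
        if match:
            chrom = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            results.append(("region", (chrom, start, end)))
        elif ENSEMBL_GENE_RE.match(item):
            results.append(("ensembl_id", item))
        else:
            # Anything else is assumed to be a common gene name.
            results.append(("gene_name", item))
    return results

parsed = classify_selection("17:7565097-7590856,ENSG00000141510,TP53")
```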

-cc | --consensus_calling

Creates a consensus of all supporting reads for a breakpoint before calling fusions. 
Increases the accuracy of breakpoint detection, which is especially important for exon-exon fusions. 
Only activate if there is sufficient coverage to create a consensus.

-df | --dont_filter

When activated, NanoFG does not filter breakpoints before and during its steps.

-dc | --dont_clean

When activated, NanoFG does not remove any intermediate files created during its process. 
Important if you want to keep the consensus sequences of the fusion gene after running NanoFG.

-wl | --without_last

When activated, minimap2 is used instead of LAST for remapping fusion candidates after the optional consensus creation and complex fusion detection.
LAST was used previously because it produced more accurate read mappings than minimap2.
However, newer versions of minimap2 have shown similar accuracy, with a large increase in speed and no need for additional required files.

Steps

Minimap2 mapping

Mapping of the reads in the fastq file using default settings '-x map-ont -a --MD'
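
As an illustration, the sketch below assembles a minimap2 command line with these settings. The file names are placeholders, and any further options NanoFG.sh passes (threads, sorting, output handling) are not shown because they are not described here.

```python
import shlex

def minimap2_command(reference_fasta, fastq):
    """Build a minimap2 argument list using the settings quoted above.

    Hypothetical helper; minimap2 writes SAM to stdout with -a.
    """
    return ["minimap2", "-x", "map-ont", "-a", "--MD", reference_fasta, fastq]

cmd = minimap2_command("reference.fasta", "reads.fastq")
printable = shlex.join(cmd)  # shell-quoted form, handy for logging
```

The list form can be handed to subprocess.run without going through a shell, which avoids quoting problems with unusual file paths.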

(optional) Region selection

Selection of regions from BAM file with samtools if parameter -s|--selection is given

First, 'samtools view -L region_bedfile BAM' is used to collect the names of all reads that overlap a selected region. These read names are then used to select all alignments of those reads (primary and supplementary), including the parts that map outside the selected region, using 'samtools view BAM | grep -f file_with_read_names'.
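
The two-pass logic can be sketched in plain Python on toy data. The Alignment tuple and its coordinates are made up for illustration, and exact read-name matching is used here, whereas grep -f matches its patterns anywhere in a line.

```python
from collections import namedtuple

# Toy stand-in for a BAM record; real code would read alignments from the BAM file.
Alignment = namedtuple("Alignment", "read_name chrom start end")

def overlaps(aln, region):
    """True if the alignment overlaps the (chrom, start, end) region."""
    chrom, start, end = region
    return aln.chrom == chrom and aln.start <= end and aln.end >= start

def select_reads(alignments, region):
    # Pass 1: names of reads with at least one alignment in the region.
    names = {aln.read_name for aln in alignments if overlaps(aln, region)}
    # Pass 2: every alignment of those reads, wherever it maps.
    return [aln for aln in alignments if aln.read_name in names]

alignments = [
    Alignment("read1", "17", 7570000, 7575000),  # inside the region
    Alignment("read1", "12", 100000, 104000),    # supplementary, elsewhere
    Alignment("read2", "3", 500000, 505000),     # unrelated read
]
selected = select_reads(alignments, ("17", 7565097, 7590856))
```

Keeping the second alignment of read1 matters because the part of a read that maps elsewhere is what reveals a possible fusion partner.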

SV calling (NanoSV or Sniffles)

By default, NanoSV is used to detect SVs from the minimap2 mapped reads.

Fusion candidate extraction

Using the .vcf file created by NanoSV, the ENSEMBL database is used to annotate all breakpoints with overlapping genes. If a breakpoint overlaps with 2 different genes, it is flagged as a possible fusion gene. Using pysam, all reads that support the breakpoint are extracted.
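
A minimal sketch of the two-gene check follows. It models a breakpoint as two break-ends and uses a hard-coded toy gene table in place of ENSEMBL; the gene names, coordinates, and function names are invented for illustration and do not reflect NanoFG's implementation.

```python
GENES = [
    # (gene_name, chrom, start, end): illustrative coordinates only
    ("GENE_A", "1", 1000, 5000),
    ("GENE_B", "2", 20000, 30000),
]

def genes_at(chrom, pos, genes=GENES):
    """Return the names of all genes overlapping a single position."""
    return {name for name, g_chrom, start, end in genes
            if g_chrom == chrom and start <= pos <= end}

def is_fusion_candidate(breakend1, breakend2, genes=GENES):
    """True if the two break-ends fall in two different genes."""
    genes1 = genes_at(*breakend1, genes=genes)
    genes2 = genes_at(*breakend2, genes=genes)
    return any(a != b for a in genes1 for b in genes2)

candidate = is_fusion_candidate(("1", 2500), ("2", 25000))
same_gene = is_fusion_candidate(("1", 2500), ("1", 3000))
```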

(optional) Consensus calling

Perform consensus calling on the extracted reads for every SV if parameter -cc|--consensus_calling is given

Consensus calling is done by wtdbg2 using the parameters '-x ont -g 3g'

LAST (or minimap2) mapping

All extracted reads are mapped again with LAST, as LAST has previously been shown to produce slightly more accurate breakpoint positions than minimap2. If parameter -wl|--without_last is given, minimap2 is used instead.

SV calling (NanoSV)

SV calling is performed by NanoSV.

(optional) Complex fusion detection

Perform complex fusion detection if parameter -cf|--complex_fusion is given. Multiple breakpoints that occur on the same read are linked to produce a representation of that area of the genome. The first and last break-end in the read are then reported as an additional SV. This makes it possible to find a complex fusion gene in which small SVs at the fusion breakpoint would otherwise prevent NanoFG from detecting the fusion.
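
The linking idea can be sketched as follows. The ReadSV record, its fields, and the example coordinates are assumptions chosen for illustration; NanoFG's own data structures are not described here.

```python
from collections import defaultdict, namedtuple

# Hypothetical record: one SV seen on a read, ordered by its position along the
# read, with a left and a right break-end given as (chrom, pos).
ReadSV = namedtuple("ReadSV", "read_name read_pos left right")

def link_complex_svs(svs):
    """For reads carrying several SVs, join the first and last break-end."""
    by_read = defaultdict(list)
    for sv in svs:
        by_read[sv.read_name].append(sv)
    linked = []
    for read_name, read_svs in sorted(by_read.items()):
        if len(read_svs) < 2:
            continue  # a single SV needs no linking
        ordered = sorted(read_svs, key=lambda sv: sv.read_pos)
        linked.append((read_name, ordered[0].left, ordered[-1].right))
    return linked

svs = [
    ReadSV("read1", 1200, ("1", 2500), ("7", 400)),
    ReadSV("read1", 1350, ("7", 650), ("2", 25000)),
    ReadSV("read2", 800, ("5", 100), ("5", 9000)),
]
linked = link_complex_svs(svs)
```

In this toy example read1 passes through a short segment of chromosome 7 between chromosomes 1 and 2, so the linked SV connects the chromosome 1 and chromosome 2 break-ends directly.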

Checking the candidate fusion genes and adding additional flags

SVs that can produce a correct fusion are identified and additionally flagged using information from ENSEMBL and NanoSV, and a PDF overview of all fusions in the sample is produced.

Producing output

NanoFG produces a default of 4 output files:

.vcf file containing all candidate fusion genes
.txt file containing information on all correct fusion genes
.pdf file containing a visual overview of the detected fusion genes
.primers text file containing primers for fusion validation

Help

Several settings can affect NanoFG's ability to detect fusion genes:

In the NanoSV config files (in NanoFG/files/), the minimum number of supporting reads needed to detect an SV (cluster_count) is set to 2. With very low coverage, changing this to 1 might allow NanoFG to detect additional fusions, but it might also increase the false positive rate.
If an SV is located in a hard-to-map area, its breakpoint might be reported at different locations within that region. By default, the maximum distance at which NanoSV considers two breakpoints to be the same is 100 (cluster_distance). Increasing this might lead to the detection of new SVs.
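
To give an intuition for how these two settings interact, the sketch below groups breakpoint positions with a simple distance rule and then filters by support. This is a deliberately simplified stand-in, not NanoSV's clustering algorithm; only the parameter names cluster_distance and cluster_count are taken from the text above.

```python
def cluster_breakpoints(positions, cluster_distance=100, cluster_count=2):
    """Group nearby positions, then keep groups with enough supporting reads.

    Simplified illustration: a position joins the current cluster if it lies
    within cluster_distance of that cluster's last position.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= cluster_distance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [c for c in clusters if len(c) >= cluster_count]

positions = [1000, 1040, 1180, 5000]  # one breakpoint position per read
default_calls = cluster_breakpoints(positions)
relaxed_calls = cluster_breakpoints(positions, cluster_distance=150, cluster_count=1)
```

With the defaults only the two closest reads form a call. Relaxing both settings merges the third read into that cluster and also keeps the singleton, which shows how lower thresholds recover more SVs at the cost of more potential false positives.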
