isONform - Reference-free isoform reconstruction from long read sequencing data

Installation

Create a new environment for isONform (Python 3.7 or later is required):

conda create -n isonform python=3.10 pip
conda activate isonform

Install isONcorrect and SPOA:

pip install isONcorrect
conda install -c bioconda spoa

Install the other dependencies of isONform:

conda install networkx
pip install parasail

Finally, clone this repository.

Introduction

isONform reconstructs isoforms from clustered and error-corrected long reads. It builds a graph using the networkx API and simplifies it with strategies such as bubble popping and node merging. The algorithm then uses SPOA to generate the final isoforms.
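To give a feel for what node merging means on a graph, here is a toy sketch that collapses unbranched runs of nodes into single nodes. It is not isONform's implementation: the function name `merge_linear_chains`, the "+"-joined labels, and the use of plain dictionaries instead of networkx (to keep the example dependency-free) are all assumptions made for illustration, and the input is assumed to be a directed acyclic graph with string node names.

```python
from collections import defaultdict


def merge_linear_chains(edges):
    """Collapse unbranched runs of nodes in a DAG into single merged nodes.

    `edges` is an iterable of (source, target) pairs with string node names.
    Returns the sorted edge list of the simplified graph.
    Illustrative only; assumes the graph has no cycles.
    """
    edges = list(edges)
    succ = defaultdict(set)
    pred = defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))

    def glued(u, v):
        # u -> v can be merged if it is u's only out-edge and v's only in-edge.
        return len(succ[u]) == 1 and len(pred[v]) == 1

    group_of = {}
    for start in sorted(nodes):
        # Skip nodes that belong to a chain started by their predecessor.
        if len(pred[start]) == 1 and glued(next(iter(pred[start])), start):
            continue
        chain = [start]
        while len(succ[chain[-1]]) == 1:
            nxt = next(iter(succ[chain[-1]]))
            if not glued(chain[-1], nxt):
                break
            chain.append(nxt)
        label = "+".join(chain)
        for node in chain:
            group_of[node] = label

    # Keep only edges that connect two different merged groups.
    merged = {(group_of[u], group_of[v]) for u, v in edges
              if group_of[u] != group_of[v]}
    return sorted(merged)


# A small bubble: two alternative paths from "s" to "t".
toy_edges = [("s", "a"), ("a", "b"), ("b", "t"), ("s", "c"), ("c", "t")]
simplified = merge_linear_chains(toy_edges)
```

In this toy bubble the run a -> b has no branching, so it becomes the single node "a+b", while the branch points "s" and "t" stay separate.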

Input data

The isON pipeline takes as input .fastq files generated with long-read sequencing techniques (ONT or PacBio) from which the barcodes have already been removed. Please make sure that you run the isON pipeline on data that have been processed with LIMA (PacBio data) or Pychopper (ONT data) so that all barcodes are removed from the reads.
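Before starting a long run it can help to confirm that an input file is well-formed FASTQ. The sketch below is a hypothetical helper, not part of isONform, and it assumes the common four-lines-per-record layout (no wrapped sequence lines). It cannot tell whether barcodes were removed; it only checks the record structure.

```python
import io


def iter_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip()
        plus = handle.readline()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:].strip(), seq, qual


# Usage on an in-memory example; for a real file use open(path) instead.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nAC\n+\nII\n")
records = list(iter_fastq(example))
```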

Running isONform

To run only the isONform algorithm:

isONform_parallel --fastq_folder /path/to/input/files --t <nr_cores> --outfolder /path/to/outfolder --split_wrt_batches

Note: Please always use absolute paths to the files and folders.
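If isONform is launched from a script, the absolute-path requirement can be enforced programmatically. The following is a minimal sketch: the helper name `build_isonform_command` and the example folder names are made up for illustration, and only the flags shown in the command above are used.

```python
import os


def build_isonform_command(fastq_folder, outfolder, cores):
    """Return the isONform_parallel argument list with absolute paths."""
    return [
        "isONform_parallel",
        "--fastq_folder", os.path.abspath(fastq_folder),
        "--t", str(cores),
        "--outfolder", os.path.abspath(outfolder),
        "--split_wrt_batches",
    ]


# Hypothetical folder names; relative paths are converted to absolute ones.
cmd = build_isonform_command("corrected_reads", "isonform_out", 8)

# To actually launch the run (requires isONform to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.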

The full isON pipeline (isONclust, isONcorrect, isONform) is provided by the isON_pipeline.sh script and is run via:

./isON_pipeline.sh --raw_reads </absolute/path/to/raw_reads.fq> --outfolder <outfolder> --num_cores <num_cores> --isONform_folder <isONform_folder> --iso_abundance <iso_abundance> --mode <mode>

(Please note that this requires isONclust and isONcorrect to be installed in addition to isONform.)

For more information about the arguments of the isON_pipeline.sh script, run:

./isON_pipeline.sh --help

Outputs

isONform outputs three main files: transcriptome.fasta, mapping.txt, and support.txt. Each isoform that isONform reconstructs has an id of the form x_y_z.

'x' denotes the isONclust cluster that the isoform stems from. As in isONcorrect, reads are handled in batches of 1000 reads, and 'y' denotes the batch from which the isoform was reconstructed. 'z' is an identifier that makes the id of each reconstructed isoform unique. mapping.txt indicates from which original reads each isoform was reconstructed. support.txt gives the support of each isoform (i.e. how many original reads make up the isoform).
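The x_y_z scheme makes it easy to group isoforms by cluster or batch in downstream scripts. The sketch below is illustrative only: the names `IsoformId`, `parse_isoform_id`, and `isoforms_per_cluster` are hypothetical, the fields are kept as strings, and it assumes that the first whitespace-delimited token of each FASTA header in transcriptome.fasta is exactly the x_y_z id.

```python
from collections import Counter, namedtuple

IsoformId = namedtuple("IsoformId", ["cluster", "batch", "unique"])


def parse_isoform_id(identifier):
    """Split an id of the form x_y_z into cluster, batch, and unique parts."""
    parts = identifier.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected an id of the form x_y_z, got {identifier!r}")
    return IsoformId(*parts)


def isoforms_per_cluster(fasta_lines):
    """Count reconstructed isoforms per isONclust cluster from FASTA lines."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Assumption: the header's first token is the x_y_z id.
            name = tokens[0] if tokens else ""
            counts[parse_isoform_id(name).cluster] += 1
    return counts


# Usage with in-memory lines; for a real run, pass open("transcriptome.fasta").
example_lines = [">12_0_0", "ACGT", ">12_1_0", "AC", ">7_0_0", "GG"]
per_cluster = isoforms_per_cluster(example_lines)
```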

Contact

If you encounter any problems, please raise an issue on the issues page. You can also contact the developer of this repository at alexander.petri[at]math.su.se

Credits

Please cite [1] when using isONform.

[1] Petri, A. J., & Sahlin, K. (2023). isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics, 39(Supplement_1), i222-i231. https://academic.oup.com/bioinformatics/article/39/Supplement_1/i222/7210488

Please additionally cite [2] and [3] when running the full pipeline.

[2] Sahlin, K., & Medvedev, P. (2020). De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality-Value Based Algorithm. Journal of Computational Biology, 27(4), 472-484.
[3] Sahlin, K., & Medvedev, P. (2021). Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 12, 2. https://doi.org/10.1038/s41467-020-20340-8
