nanoSweet

Demultiplex your nanopore (or other) reads! nanomux uses fuzzy matching, which is useful for noisy reads. It is written entirely in C, so it should compile on most systems. The repo contains a windows branch that uses something other than pthreads for threading.

In the repo, you can also find nanotrim, a small threaded program that filters reads by mean quality and by length, keeping reads whose length lies between a minimum and a maximum.

The repo also contains nanodup, a small threaded program that deduplicates reads and saves information about their duplication status.

Quick Start

$ git clone https://github.com/willros/nanomux_c.git
$ cd nanomux_c/
$ cc -o nob nob.c
$ ./nob

test nanomux

$ ./nanomux -b tests/bc_test.csv -f tests/test.fastq -r 600 -R 2000 -p 200 -k 1 -o new_nanomux -t trim -s split -j 4

test nanotrim

$ ./nanotrim -i tests/test.fastq -r 2000 -R 10000 -q 20 -t 4 -o test_nanotrim

test nanodup

$ ./nanodup -i tests/test.fastq -o test_nanodup

nanomux

Run ./nanomux to get the help message:

[USAGE]: nanomux 
   -b <barcode>             Path to barcode file.
   -f <fastq>               Path to fastq file.
   -r <read_len_min>        Minimum length of read.
   -R <read_len_max>        Maximum length of read.
   -p <barcode_position>    Position of barcode.
   -k <mismatches>          Number of mismatches allowed.
   -o <output>              Name of output folder.
   -t <trim_option>         Trim reads from adapters or not?                      [Options]: trim | notrim.
   -s <split_option>        Split concatenated reads based on adapter occurrence?  [Options]: split | nosplit.
   -j <threads>             Number of threads to use.                             Default: 1

barcode_pos: how far into each end of the read to search for the barcode. If barcode_pos == 200 and the read length is 1000, the barcodes are searched for in positions 0 -> 200 and 800 -> 1000.
k: the number of mismatches allowed when matching a barcode.
trim_option: whether to trim the reads to the left and right of the found barcode.
split_option: whether to try to split reads longer than 2500 if an adapter sequence is found in the middle. A split read results in two new reads, with the suffixes _1 and _2.
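
The barcode_pos and k parameters can be illustrated with a minimal sketch. It assumes that mismatches are counted as substitutions only (Hamming distance); this README does not say whether nanomux also tolerates insertions or deletions. The function names count_mismatches, find_barcode_in_window and find_barcode_in_ends are hypothetical and are not taken from nanomux.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count substitutions between a barcode and an equally long stretch of a read.
   Stops early once more than max_k mismatches have been seen. */
size_t count_mismatches(const char *read, const char *barcode,
                        size_t bc_len, size_t max_k)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < bc_len; i++) {
        if (read[i] != barcode[i] && ++mismatches > max_k) break;
    }
    return mismatches;
}

/* Search read[start, end) for the barcode with at most k mismatches.
   Returns the offset of the first hit, or -1 if there is none. */
long find_barcode_in_window(const char *read, size_t start, size_t end,
                            const char *barcode, size_t k)
{
    size_t bc_len = strlen(barcode);
    assert(start <= end);
    if (bc_len == 0 || end - start < bc_len) return -1;
    for (size_t pos = start; pos + bc_len <= end; pos++) {
        if (count_mismatches(read + pos, barcode, bc_len, k) <= k) return (long)pos;
    }
    return -1;
}

/* Search both ends of a read: [0, p) and [len - p, len),
   where p is barcode_pos clamped to the read length. */
long find_barcode_in_ends(const char *read, const char *barcode,
                          size_t barcode_pos, size_t k)
{
    size_t len = strlen(read);
    size_t p = barcode_pos < len ? barcode_pos : len;
    long hit = find_barcode_in_window(read, 0, p, barcode, k);
    if (hit >= 0) return hit;
    return find_barcode_in_window(read, len - p, len, barcode, k);
}
```

With barcode_pos == 200 and a read of length 1000, this sketch searches [0, 200) and [800, 1000), matching the example above. A barcode sitting in the middle of the read is not reported.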

barcode_file.csv MUST have the following shape:

Dual barcodes:

For cases where both forward and reverse barcodes are used, the CSV file should have the following columns:

name: The identifier for the barcode.
forward barcode: The sequence of the forward barcode.
reverse barcode: The sequence of the reverse barcode.

Example:

bc1,ATACGATGCTA,GTCGATGTCTGA
bc2,GACACACAC,GTCGATTGATG

Single barcodes

If the user wants to provide only a single barcode per entry, the file should contain only two columns:

name: The identifier for the barcode.
forward barcode: The sequence of the barcode.

Example:

bc1,ATACGATGCTA
bc2,GACACACAC

If a file with two barcodes is provided, the program will search for the barcodes using both the forward and reverse sequences. If only one barcode is provided, it will only search for matches at the 5’ (start) or 3’ (end) positions in the reads.
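
As an illustration of the two file shapes, the following sketch parses one line of the barcode file and records whether it is a dual or single entry. It is not the parser used by nanomux: the struct barcode_entry, the function parse_barcode_line and the 127-character field limit are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 128

typedef struct {
    char name[FIELD_MAX];
    char forward[FIELD_MAX];
    char reverse[FIELD_MAX]; /* empty string for single-barcode lines */
    bool dual;               /* true when a reverse barcode was given */
} barcode_entry;

/* Parse one CSV line: "name,forward" or "name,forward,reverse".
   Returns false if fewer than two non-empty fields are found. */
bool parse_barcode_line(const char *line, barcode_entry *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* %127[^,\r\n] reads up to 127 characters that are neither a comma
       nor a line ending. */
    int n = sscanf(line, "%127[^,\r\n],%127[^,\r\n],%127[^,\r\n]",
                   out->name, out->forward, out->reverse);
    if (n < 2) return false;
    out->dual = (n == 3);
    return true;
}
```

Calling parse_barcode_line on "bc1,ATACGATGCTA,GTCGATGTCTGA" sets dual to true, while "bc2,GACACACAC" leaves reverse empty and dual false.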

Barcode Matching Process

Dual-Barcoded Search: The program compares the first part of the read with the forward barcode and the last part with the reverse barcode's reverse complement. If both match, the read is stored.
Single-Barcoded Search: The program compares the first part of the read with the barcode and the last part with its reverse complement. If either matches, the read is stored.
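
The difference between the two modes (both ends must match versus either end may match) can be sketched as follows. To stay short, this example uses exact matching at the very start and end of the read, whereas nanomux searches within barcode_pos and allows k mismatches. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Complement of a single base; anything else (e.g. N) is returned unchanged. */
char complement_base(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return base;
    }
}

/* Write the reverse complement of seq into out (needs strlen(seq) + 1 bytes). */
void reverse_complement(const char *seq, char *out)
{
    size_t len = strlen(seq);
    for (size_t i = 0; i < len; i++) out[i] = complement_base(seq[len - 1 - i]);
    out[len] = '\0';
}

bool starts_with(const char *read, const char *bc)
{
    return strncmp(read, bc, strlen(bc)) == 0;
}

bool ends_with(const char *read, const char *bc)
{
    size_t rlen = strlen(read), blen = strlen(bc);
    return rlen >= blen && strcmp(read + rlen - blen, bc) == 0;
}

/* Dual mode: forward barcode at the start AND the reverse barcode's
   reverse complement at the end. */
bool keep_dual(const char *read, const char *fw, const char *rv)
{
    char rc[256];
    assert(strlen(rv) < sizeof rc);
    reverse_complement(rv, rc);
    return starts_with(read, fw) && ends_with(read, rc);
}

/* Single mode: barcode at the start OR its reverse complement at the end. */
bool keep_single(const char *read, const char *bc)
{
    char rc[256];
    assert(strlen(bc) < sizeof rc);
    reverse_complement(bc, rc);
    return starts_with(read, bc) || ends_with(read, rc);
}
```

For example, with forward barcode ACGT and reverse barcode GGTA, a read is kept in dual mode only if it starts with ACGT and ends with TACC, the reverse complement of GGTA.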

nanotrim

Run ./nanotrim to get the help message:

[USAGE]: nanotrim -i <input> [options]
   -i    <input>             Path of folder or file
   -o    <output>            Name of output folder.
   -r    <read_length_min>   Minimum length of read.   Optional: Default 1
   -R    <read_length_max>   Maximum length of read.   Optional: Default INT_MAX
   -q    <quality>           Minimum quality of read.  Optional: Default 1
   -t    <threads>           Number of threads to use. Optional: Default 1

nanotrim saves the reads that pass the filters to a fastq file in the specified output folder. The file uses the same name as the original, with the suffix .filtered. The input can be a single file or an entire folder; nanotrim can distinguish between the two.

Example of output:

$ ./nanotrim -i tests/test.fastq -r 2000 -R 10000 -q 20 -t 4
$ tests/test.fastq: 4788 raw reads (29 passed) --> To short: 4169  | To long: 3     | Low quality: 587

nanotrim also produces a log file with the above information in a .csv format, saved in the output path.

file,raw_reads,passed_reads,short,long,bad_quality
tests/test.fastq,4788,29,4169,3,587
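
The log columns above suggest that every read ends up in exactly one category (29 + 4169 + 3 + 587 = 4788). A minimal sketch of such a classification follows. It makes several assumptions that this README does not confirm: length is checked before quality, the length bounds are inclusive, quality strings are Phred+33 encoded, and the mean is the arithmetic mean of the Phred scores. The names read_status, mean_quality and classify_read are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    READ_PASSED,
    READ_TOO_SHORT,
    READ_TOO_LONG,
    READ_LOW_QUALITY
} read_status;

/* Arithmetic mean of the Phred scores in a FASTQ quality string
   (assumes Phred+33 encoding). Returns 0 for an empty string. */
double mean_quality(const char *qual)
{
    size_t len = strlen(qual);
    if (len == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < len; i++) sum += (double)(qual[i] - 33);
    return sum / (double)len;
}

/* Put a read into exactly one category, checking length first.
   The read length is taken from the quality string, which in FASTQ
   has the same length as the sequence. */
read_status classify_read(const char *qual, size_t min_len, size_t max_len,
                          double min_quality)
{
    assert(min_len <= max_len);
    size_t len = strlen(qual);
    if (len < min_len) return READ_TOO_SHORT;
    if (len > max_len) return READ_TOO_LONG;
    if (mean_quality(qual) < min_quality) return READ_LOW_QUALITY;
    return READ_PASSED;
}
```

Counting how often each read_status value occurs for a file would give the passed_reads, short, long and bad_quality columns of the log.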

nanodup

Run ./nanodup to get the help message:

[ERROR] You must provide the input path
[ERROR] [USAGE]: nanodup -i <input> -o <output> [options]
   -i    <input>             Path of folder or file
   -o    <output>            Name of output folder.
   -t    <threads>           Number of threads to use. Optional: Default 1

nanodup removes all duplicated reads, i.e. identical reads. When a read is duplicated, only its first occurrence is saved. nanodup also produces a log file with information about the reads and saves all the de-duplicated reads to a new fastq file.

Example of output:

$ ./nanodup -i tests/test.fastq -o test_nanodup
$ [INFO] tests/test.fastq contained: 0 duplicates
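
The rule "keep the first copy, drop later identical reads" can be sketched with a small hash set. This is a single-threaded illustration and not the code in nanodup.c. It assumes that "identical" means identical sequence strings, and the names fnv1a and mark_duplicates are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a hash of a NUL-terminated string. */
uint64_t fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Sets is_dup[i] to true if seqs[i] equals an earlier sequence, so the
   first occurrence is always kept. Returns the number of duplicates,
   or (size_t)-1 if memory allocation fails. */
size_t mark_duplicates(const char **seqs, size_t n, bool *is_dup)
{
    assert(seqs != NULL && is_dup != NULL);
    size_t cap = 16;
    while (cap < 2 * n) cap *= 2; /* power of two, at most half full */
    const char **table = calloc(cap, sizeof *table);
    if (table == NULL) return (size_t)-1;

    size_t dups = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = (size_t)(fnv1a(seqs[i]) & (cap - 1));
        is_dup[i] = false;
        /* Linear probing: walk until an equal entry or an empty slot. */
        while (table[slot] != NULL) {
            if (strcmp(table[slot], seqs[i]) == 0) {
                is_dup[i] = true;
                dups++;
                break;
            }
            slot = (slot + 1) & (cap - 1);
        }
        if (!is_dup[i]) table[slot] = seqs[i];
    }
    free(table);
    return dups;
}
```

A caller would write out only the reads whose is_dup flag is false and report the returned count as the number of duplicates.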

Citation

If you use nanoSweet, please go here to find how to cite: nanoSweet

Credit

nanomux_c uses kseq.h for fastq parsing, and nob.h, written by @tsoding, for overall useful functions!
It also uses thpool.h by Johan Hanssen Seferidis.

TODO

trim barcodes
multi threading
read splitting
Change to threadpool in nanomux
Add logging to nanotrim
[] Fix the windows branch with gzip append and nanodup
