A universal HMM-based peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and single-cell ATAC-seq datasets of different quality.

JetBrains-Research/omnipeak

OmniPeak

OmniPeak is a universal HMM-based peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and single-cell ATAC-seq datasets of different quality.

Features

  • Supports both narrow and broad footprint experiments (ChIP-seq, ATAC-seq, DNase-seq)
  • Supports BAM, SAM, CRAM, BED, BigWig input formats
  • Produces robust results on datasets with different signal-to-noise ratios, including Ultra-Low-Input ChIP-seq
  • Produces highly consistent results in multi-replicate experimental setups
  • Tolerates missing control experiment
  • Integrated into the JetBrains Research ChIP-seq analysis pipeline from raw reads to visualization and peak calling
  • Integrated with the JBR Genome Browser: an uploaded data model allows interactive visualization and fine-tuning
  • Experimental support for multi-replicate mode and differential peak calling mode

Requirements

Download and install Java 21+.

Peak calling

To analyze a single (possibly replicated) biological condition, use the analyze command. For details, run:

$ java --add-modules=jdk.incubator.vector -Xmx8G -jar omnipeak.jar analyze --help

The <output.bed> file will contain predicted and FDR-controlled peaks in the ENCODE broadPeak (BED 6+3) format:

<chromosome> <peak start offset> <peak end offset> <peak_name> <score> . <coverage or fold change> <-log p-value> <-log Q-value>
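
The column layout above makes the peaks file easy to post-process with standard tools. The following is a minimal sketch, not part of OmniPeak: it writes a few made-up, tab-separated rows to a temporary file and keeps only the peaks whose ninth column (-log Q-value) reaches a chosen cutoff. The function name filter_peaks, the sample rows, and the cutoff of 5 are illustrative assumptions, and the base of the logarithm is not stated here, so pick the cutoff accordingly.

```shell
# Toy rows (made-up values, tab-separated) standing in for a real peaks file.
peaks_file="$(mktemp)"
printf 'chr1\t100\t500\tpeak_1\t800\t.\t12.5\t9.1\t7.3\n'  > "$peaks_file"
printf 'chr1\t900\t1200\tpeak_2\t300\t.\t3.2\t2.0\t1.1\n' >> "$peaks_file"
printf 'chr2\t50\t400\tpeak_3\t650\t.\t8.0\t6.4\t5.0\n'   >> "$peaks_file"

# filter_peaks FILE MIN_Q (hypothetical helper)
# Keeps rows whose 9th column (-log Q-value) is >= MIN_Q and
# prints the peak name and the peak length (end - start).
filter_peaks() {
    awk -F '\t' -v min_q="$2" \
        '$9 + 0 >= min_q + 0 { print $4 "\t" ($3 - $2) }' "$1"
}

strong_peaks="$(filter_peaks "$peaks_file" 5)"
printf '%s\n' "$strong_peaks"
```

With the toy rows above, peak_1 and peak_3 pass the cutoff and peak_2 is dropped.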

Examples on Java 21:

  • Regular peak calling
    java --add-modules=jdk.incubator.vector -Xmx8G -jar omnipeak.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak
  • Model fitting only
    java --add-modules=jdk.incubator.vector -Xmx8G -jar omnipeak.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -m Model.op
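
Options from the "Command line options" section can be combined in the same way. The sketch below assembles a command for a replicated ATAC-seq run without a control, using the comma-separated replicate syntax of -t together with the --fragment 0 and --summits settings that the option descriptions recommend for ATAC-seq. The file names Rep1.bam and Rep2.bam are hypothetical, and the command is only printed for review rather than executed.

```shell
# Hypothetical input and output names; replace with your own files.
treatment="Rep1.bam,Rep2.bam"   # comma-separated files are treated as replicates
chrom_sizes="Chrom.sizes"
peaks="Results.peak"

# No -c is given here: the feature list says a missing control is tolerated.
cmd="java --add-modules=jdk.incubator.vector -Xmx8G -jar omnipeak.jar analyze"
cmd="$cmd -t $treatment --cs $chrom_sizes --fragment 0 --summits -p $peaks"

# Print the assembled command so it can be checked before running it.
echo "$cmd"
```

Paste the printed line into a terminal to run it. Paths that contain spaces would need quoting, which this sketch does not handle.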

Differential peak calling

Experimental! To compare two (possibly replicated) biological conditions, use the compare command. For details, run:

$ java --add-modules=jdk.incubator.vector -Xmx8G -jar omnipeak.jar compare --help

Command line options

Parameter
    Description

-t, --treatment TREATMENT (required)
    Treatment file. Supported formats: BAM, BED, or BED.gz.
    Multiple files should be separated by commas: -t A,B,C.
    Multiple files are treated as replicates at the model level.

-c, --control CONTROL
    Control file. Multiple files should be separated by commas.
    Provide either a single control file or a separate control file for each treatment file.
    Follow the instructions for -t, --treatment.

-cs, --chrom.sizes CHR_SIZES (required)
    Chromosome sizes file for the genome build used in the TREATMENT and CONTROL files.
    Can be downloaded from UCSC.

-b, --bin BIN_SIZE
    Peak analysis is performed on read coverage tiled into consecutive bins of configurable size.

-f, --fdr FDR
    False Discovery Rate cutoff used to call significant regions.

-p, --peaks PEAKS
    Resulting peaks file in ENCODE broadPeak (BED 6+3) format.
    If omitted, only the model fitting step is performed.

-chr, --chromosomes CHROMOSOMES
    Chromosomes to process. Multiple chromosomes should be separated by commas.

-fmt, --format FORMAT
    Reads file format. Supported: BAM, SAM, CRAM, BED. Text formats may be zip- or gzip-compressed.
    If not provided, the format is guessed from the file extensions.

-fr, --fragment FRAGMENT
    Fragment size. If provided, reads are shifted appropriately.
    If not provided, the shift is estimated from the data.
    --fragment 0 is recommended for ATAC-seq data processing.

-kd, --keep-duplicates
    Keep duplicates. By default, OmniPeak filters out redundant reads aligned at the same genomic position.
    Recommended for bulk single-cell ATAC-seq data processing.

-bl, --blacklist BLACKLIST_BED
    Blacklisted regions of the genome to exclude from peak calling results.

-m, --model MODEL
    Path to the OmniPeak model file.

-w, --workdir PATH
    Path to the working directory. Used to save the coverage and model cache.

-sm, --summits
    Call summits within peaks.
    Recommended for ATAC-seq and single-cell ATAC-seq analysis.

-l, --log LOG
    Path to the log file. If not provided, the log is created in the working directory.

-d, --debug
    Print debug information, useful for troubleshooting.

-q, --quiet
    Turn off standard output.

-thr, --threads THREADS
    Configure the parallelism level.

-bw, --bigwig BIGWIG_PATH
    Create a beta-control corrected, counts-per-million (CPM) normalized track.

--iterations ITS
    Maximum number of iterations for the Expectation Maximisation (EM) algorithm.

--threshold THR
    Convergence threshold for the EM algorithm. Use the --debug option to see detailed information.

--hmm-snr SNR
    Fraction of coverage used to estimate and guard the signal-to-noise ratio; 0 disables the constraint check.

--hmm-low LOW
    Minimum threshold for the low-state mean, which guards against overly broad peaks; 0 disables the constraint check.

--sensitivity SENSITIVITY
    Log PEP threshold sensitivity for candidate selection.
    Estimated automatically from the data, or set during semi-supervised peak calling.

--gap GAP
    Minimal gap between peaks.
    Generally not required, but used in semi-supervised peak calling.

--multiple TEST
    Method applied for multiple hypothesis testing:
    BH for Benjamini-Hochberg, BF for Bonferroni.

--fragmentation FRAGMENTATION
    Fragmentation threshold in bp at which the compensation gap is applied.
    Fragmentation indicates how many fewer peaks would be obtained by increasing the gap.
    Not available when the gap is provided explicitly.

--clip CLIP_TRESHOLD
    Maximum clip threshold for fine-tuning peak boundaries according to the local signal; 0 to disable.

--ext
    Save extended state information to the model file.
    Required for model visualization in the JBR Genome Browser.

--deep-analysis
    Deep analysis of the model, including analysis of coverage, candidates, and peaks.

--keep-cache
    Keep cache files. By default, OmniPeak creates cache files in the working directory and removes them afterwards.
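
Since -cs, --chrom.sizes is required, it can be worth checking the file before starting a long run. The sketch below assumes the usual UCSC layout of one chromosome per line, with the name and the size in base pairs separated by a tab. The helper check_chrom_sizes and the toy file contents are illustrative assumptions and are not part of OmniPeak.

```shell
# Toy chromosome sizes file (made-up values): <name><TAB><size in bp>.
chrom_sizes="$(mktemp)"
printf 'chr1\t1000\nchr2\t800\n' > "$chrom_sizes"

# check_chrom_sizes FILE (hypothetical helper)
# Reports every line that is not "<name><TAB><positive integer>"
# and returns a non-zero status if any such line was found.
check_chrom_sizes() {
    awk -F '\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ || $2 + 0 <= 0 {
            printf "line %d: bad entry: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

if check_chrom_sizes "$chrom_sizes"; then
    status="ok"
else
    status="invalid"
fi
echo "$status"
```

A file that uses spaces instead of tabs, or that has a non-numeric size, is reported line by line and the function returns a failure status.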

Build from sources

Clone the bioinf-commons library under the project root:

git clone git@github.com:JetBrains-Research/bioinf-commons.git

Run the following command to build the OmniPeak jar:

./gradlew shadowJar

The OmniPeak jar file will be generated in the folder build/libs.
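
The exact name of the generated jar is not given here, so a script that runs a freshly built OmniPeak may need to look it up. The following sketch assumes the jar is the first *.jar file found in build/libs. The helper find_jar is hypothetical, and the java command is only printed, not executed.

```shell
# find_jar DIR (hypothetical helper)
# Prints the first *.jar file found in DIR; fails if there is none.
find_jar() {
    for f in "$1"/*.jar; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

jar="$(find_jar build/libs)" || jar=""
if [ -n "$jar" ]; then
    echo "java --add-modules=jdk.incubator.vector -Xmx8G -jar $jar analyze --help"
else
    echo "No jar found in build/libs; run ./gradlew shadowJar first."
fi
```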

Error Reporting

Use GitHub issues to suggest new features or report bugs.

Authors

JetBrains Research BioLabs
