A user-friendly web server for fungal ITS amplicon sequencing data
A running instance of DAnIEL (Describing, Analyzing and Integrating fungal Ecology to effectively study the systems of Life) can be accessed at https://sbi.hki-jena.de/daniel.
- Analysis of paired-end ITS amplicon sequencing data in any web browser
- Statistics and machine learning between groups of samples
- Correlation networks
- Integration of existing cohorts from NCBI SRA
Daniel Loos, Lu Zhang, Christine Beemelmanns, Oliver Kurzai, and Gianni Panagiotou. DAnIEL: a user-friendly web server for fungal ITS amplicon sequencing data. Frontiers in Microbiology, doi: 10.3389/fmicb.2021.720513
- Set environment variables pointing to this repository and the database directory (not included):
export DANIEL_DIR=/my-path
export DANIEL_REPO_DIR=$DANIEL_DIR/repo
export DANIEL_USERDAT_DIR=$DANIEL_DIR/userdat
export DANIEL_DB_DIR=$DANIEL_DIR/db
- Clone the repository to build the web server from source:
git clone https://github.com/bioinformatics-leibniz-hki/DAnIEL.git $DANIEL_REPO_DIR
cd $DANIEL_REPO_DIR
git lfs pull
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build --parallel
Alternatively, download the docker images from Docker Hub:
docker pull bioinformaticsleibnizhki/daniel_backend
docker pull bioinformaticsleibnizhki/daniel_frontend
The latter method requires changing the image names in the file docker-compose.yml to those pulled from Docker Hub (e.g. replace image: daniel_backend with image: bioinformaticsleibnizhki/daniel_backend:v1.0).
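The image name change can be scripted. The following is a minimal sketch, not part of the repository: the function name use_hub_images is hypothetical, it assumes the compose file contains lines ending in image: daniel_backend and image: daniel_frontend, and it assumes the same tag applies to both images (the text above only names v1.0 for the back end).

```shell
# Sketch only: rewrite local image names to the Docker Hub names.
# Assumptions: the function name is hypothetical, and the same tag is used
# for both images (only the back end tag v1.0 is stated in this README).
use_hub_images() {
  compose_file=$1
  tag=${2:-v1.0}
  hub=bioinformaticsleibnizhki
  tmp_file=$(mktemp)
  # Match only lines that end with the bare local image name
  sed -e "s|image: daniel_backend[[:space:]]*\$|image: $hub/daniel_backend:$tag|" \
      -e "s|image: daniel_frontend[[:space:]]*\$|image: $hub/daniel_frontend:$tag|" \
      "$compose_file" > "$tmp_file" &&
    cat "$tmp_file" > "$compose_file"   # keep the original file permissions
  status=$?
  rm -f "$tmp_file"
  return $status
}
```

A call such as use_hub_images docker-compose.yml v1.0 edits the file in place; review the result before starting the containers.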
- Create an msmtp config file at back_end/msmtprc for email notifications. This file must be mounted into the back end container at /etc/msmtprc at runtime. The notify_mail function in worker.sh uses it to send emails to users once the pipeline is finished.
- Deploy a database directory at $DANIEL_DB_DIR, e.g. by unpacking the database DAnIEL DB v1.0:
wget https://zenodo.org/record/4073125/files/daniel_db_v1.0.tar.gz?download=1 \
-O daniel_db_v1.0.tar.gz
tar -xf daniel_db_v1.0.tar.gz
mv db $DANIEL_DB_DIR
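Before starting the containers, it may help to confirm that the database directory was actually deployed. The helper below is a hypothetical sketch (check_db_dir is not part of DAnIEL); it only tests that the directory exists and is not empty, and does not validate the database contents.

```shell
# Sketch only: succeed if the given directory exists and contains at least
# one entry. The function name is hypothetical; no DAnIEL-specific files
# are checked because their names are not listed in this README.
check_db_dir() {
  db_dir=$1
  # -d: path is a directory; ls -A: list entries including hidden ones
  [ -d "$db_dir" ] && [ -n "$(ls -A "$db_dir")" ]
}
```

For example, check_db_dir "$DANIEL_DB_DIR" || echo "database directory missing or empty" prints a warning when the deployment step was skipped.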
- Use docker-compose to start the DAnIEL web server containers:
wget https://raw.githubusercontent.com/bioinformatics-leibniz-hki/DAnIEL/main/docker-compose.yml
docker-compose up -d
The front end can be accessed at http://localhost.
The software package is divided into the following sections:
- front_end - Interactive website to upload data and to visualize the results
- back_end - The analysis workflow called from the front end
- danielLib - R package containing functions required by both front end and back end
The aim of the front end is to create a directory for each project containing all files needed to start the analysis workflow.
It is written in R Shiny.
Inputs from reactive UI elements are merged into a file project.json.
The project ID is appended to a queue file in the user directory when the start pipeline button is clicked.
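The queueing step can be pictured with a small shell sketch. The front end itself is written in R Shiny, so this is only an illustration of the idea: the function name enqueue_project, the file name queue, and the one-ID-per-line layout are assumptions, not taken from the source code.

```shell
# Sketch only: append a project ID to a queue file in the user data directory.
# Assumptions: the queue file is called "queue" and holds one ID per line.
enqueue_project() {
  userdat_dir=$1
  project_id=$2
  # >> appends, so earlier entries keep their position in the queue
  printf '%s\n' "$project_id" >> "$userdat_dir/queue"
}
```

Appending rather than overwriting keeps projects in submission order, so a worker can process them first come, first served.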
The aim of the back end is to process the analysis workflow.
It is written in Snakemake.
HTML reports are generated using R Markdown.
They are stored in the directory back_end/reports.
Helper scripts, e.g. to create BibTeX files and the SQLite database, are stored in back_end/helper.
New features can be added by creating a new Snakemake rule in the directory back_end/rules and adding the result file as a target to the file targets.snakefile.py.
A Conda environment can be defined for each rule in the directory back_end/envs.
Visualization is done by creating a new Shiny module in the directory front_end/modules and adding it to the app files front_end/server.R and front_end/ui.R.
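The back end part of adding a feature could be scaffolded roughly as follows. This is a sketch under stated assumptions: the function scaffold_rule, the file names, the placeholder input and output paths, and the coreutils dependency are all hypothetical, and the actual naming conventions in back_end/rules and back_end/envs may differ. The result file still has to be added to targets.snakefile.py by hand.

```shell
# Sketch only: create a placeholder Snakemake rule and a Conda environment.
# All file names, paths, and the rule body are hypothetical examples.
scaffold_rule() {
  repo_dir=$1   # path to the DAnIEL repository
  name=$2       # name of the new rule
  rules_dir="$repo_dir/back_end/rules"
  envs_dir="$repo_dir/back_end/envs"
  mkdir -p "$rules_dir" "$envs_dir"

  # Conda environment for the rule (the package list is a placeholder)
  cat > "$envs_dir/$name.yml" <<EOF
channels:
  - conda-forge
dependencies:
  - coreutils
EOF

  # Minimal rule; the conda path is relative to the rule file
  cat > "$rules_dir/$name.snakefile.py" <<EOF
rule $name:
    input:
        "input.txt"
    output:
        "$name/result.txt"
    conda:
        "../envs/$name.yml"
    shell:
        "cp {input} {output}"
EOF
}
```

Calling scaffold_rule "$DANIEL_REPO_DIR" my_feature would write both files; the input, output, and shell entries then need to be replaced with the real analysis step.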
General tools used:
- docker - app containerization
- conda - management of environments and software packages
- R shiny - Front end UI
- tidyverse - Data manipulation and visualization
- Snakemake - management of back end workflow
Tools used to perform the bioinformatic analysis:
- Cutadapt - quality control of raw reads
- FastQC - assess quality of read files
- MultiQC - merging QC results from multiple samples
- PIPITS - OTU profiling pipeline
- DADA2 - ASV profiling pipeline
- FastSpar - Correlation analysis aware of sparsity
- BAnOCC - Correlation analysis aware of compositionality
- caret - Machine learning
- vegan - Diversity analysis
Daniel Loos
Systems Biology and Bioinformatics
Leibniz Institute for Natural Product Research and Infection Biology
Hans Knöll Institute (HKI)