docs: document development setup (#12) (#14)
holtgrewe authored Jun 26, 2023
1 parent 34b37c4 commit 376e6c7
Showing 7 changed files with 321 additions and 17 deletions.
2 changes: 1 addition & 1 deletion .env.ci
@@ -18,7 +18,7 @@
# image_viguno_version=latest

# Name of the annonars image to use.
# image_annonars_name=annona-rs
# image_annonars_name=annonars

# Version of the annonars image to use.
# image_annonars_version=latest
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,5 +1,8 @@
## Demo
/.dev

## CI
/.ci/volumes
/.ci

## Docker Compose
# Configuration file.
185 changes: 184 additions & 1 deletion README.md
@@ -2,10 +2,192 @@

This repository contains the [Docker Compose](https://docs.docker.com/compose/) configuration for the [VarFish Server](https://github.com/bihealth/varfish-server).

## Development Setup

This section describes the steps needed for a development setup.

### Prerequisites

You will need to fetch some of the data from our S3 server.
We recommend the `s5cmd` tool as it is fast and easy to install and use.
You can download it from [github.com/peak/s5cmd/releases](https://github.com/peak/s5cmd/releases).
For example:

```
$ wget -O /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz \
https://github.com/peak/s5cmd/releases/download/v2.1.0/s5cmd_2.1.0_Linux-64bit.tar.gz
$ tar -C /tmp -xf /tmp/s5cmd_2.1.0_Linux-64bit.tar.gz
$ sudo cp /tmp/s5cmd /usr/local/bin/
```

You will also need to install Docker Compose.
The current way to do this is to install the Docker Compose plugin, which is invoked as `docker compose`.
See the [installation instructions on the Docker website](https://docs.docker.com/compose/install/linux/#install-using-the-repository).

### Checkout and Configure

First, clone the repository:

```
$ git clone git@github.com:bihealth/varfish-docker-compose-ng.git
```

From here on, the commands should be executed from within this repository (`cd varfish-docker-compose-ng`).

We will use the directory `.dev` within the checkout for storing data and secrets.
In a production deployment, these directories should live outside of the checkout, of course.

Now, we create the directories for data storage.

```
$ mkdir -p .dev/volumes/{minio,varfish-static}/data
```

Next, we set up some "secrets" for the passwords.

```
$ mkdir -p .dev/secrets
$ echo db-password >.dev/secrets/db-password
$ echo minio-root-password >.dev/secrets/minio-root-password
$ echo minio-varfish-password >.dev/secrets/minio-varfish-password
```

We now copy the `env.tpl` file to `.env`, the default location of the environment file.

```
$ cp env.tpl .env
```

Next, create a `docker-compose.override.yml` with the contents of the file `docker-compose.override.yml-dev`.
This disables the services that we assume are running on your host during development.
These include the VarFish web server, Redis, the Celery workers, and PostgreSQL.

```
$ cp docker-compose.override.yml-dev docker-compose.override.yml
```

### Download Dev Data

Now you need to obtain the data that the mehari, viguno, and annonars containers will serve.
For this, we have prepared strongly reduced data sets (less than 2 GB overall rather than hundreds of GB).
Download the annonars, viguno, and gene cross-link data:

```
$ mkdir -p .dev/volumes/varfish-static/data/download
$ SRC_DST="
reduced-dev/annonars/*:annonars
reduced-dev/viguno/*:viguno
full/worker/genes-xlink-20230624/genes-xlink.tsv:genes-xlink-20230624
full/annonars/gnomad-mtdna-grch37-3.1+0.12.7/*:annonars/gnomad-mtdna-grch37-3.1+0.12.7
full/annonars/gnomad-mtdna-grch38-3.1+0.12.7/*:annonars/gnomad-mtdna-grch38-3.1+0.12.7
full/annonars/helixmtdb-grch37-20200327+0.12.7/*:annonars/helixmtdb-grch37-20200327+0.12.7
full/annonars/helixmtdb-grch38-20200327+0.12.7/*:annonars/helixmtdb-grch38-20200327+0.12.7
full/annonars/genes-3.1+2.1.1+4.4+20230624+0.7.0/*:annonars/genes-3.1+2.1.1+4.4+20230624+0.7.0
"
$ (set -x; for src_dst in $SRC_DST; do \
src=$(echo $src_dst | cut -d : -f 1); \
dst=$(echo $src_dst | cut -d : -f 2); \
mkdir -p .dev/volumes/varfish-static/data/download/$dst; \
s5cmd \
--endpoint-url=https://ceph-s3-ext.cubi.bihealth.org \
--no-sign-request \
sync \
"s3://varfish-public/$src" \
".dev/volumes/varfish-static/data/download/$dst"; \
done)
```
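
To see how the loop above splits each `source:destination` pair before downloading anything, a dry run can help.
The following is a minimal sketch and not part of the repository: `print_sync_commands` is a hypothetical helper that only prints the `s5cmd` invocations, using the same endpoint and bucket as above.

```shell
# Hypothetical dry-run helper (not part of the repository): print the s5cmd
# command for each "source:destination" pair instead of executing it.
print_sync_commands() {
    endpoint="https://ceph-s3-ext.cubi.bihealth.org"
    base=".dev/volumes/varfish-static/data/download"
    for src_dst in "$@"; do
        src=${src_dst%%:*}  # part before the first ":"
        dst=${src_dst#*:}   # part after the first ":"
        printf 's5cmd --endpoint-url=%s --no-sign-request sync "s3://varfish-public/%s" "%s/%s"\n' \
            "$endpoint" "$src" "$base" "$dst"
    done
}

# Quote the pairs so that the shell does not expand the "*" locally.
print_sync_commands 'reduced-dev/annonars/*:annonars' 'reduced-dev/viguno/*:viguno'
```

Nothing is downloaded by this helper; it is only meant for inspecting the source and destination paths.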

Set up the symlink structure so that the data is at the expected locations.

```
$ ln -sr .dev/volumes/varfish-static/data/download/genes-xlink-20230624/genes-xlink.tsv \
.dev/volumes/varfish-static/data/hgnc_xlink.tsv
$ ln -sr .dev/volumes/varfish-static/data/download/viguno/hpo-20230606+0.1.6 \
.dev/volumes/varfish-static/data/hpo
$ mkdir -p .dev/volumes/varfish-static/data/annonars
$ ln -sr .dev/volumes/varfish-static/data/download/genes-xlink-20230624 \
.dev/volumes/varfish-static/data/annonars/genes
$ names="cadd dbsnp dbnsfp dbscsnv gnomad-mtdna gnomad-genomes gnomad-exomes helixmtdb cons"; \
for genome in grch37 grch38; do \
for name in $names; do \
mkdir -p .dev/volumes/varfish-static/data/annonars/$genome; \
test -e .dev/volumes/varfish-static/data/annonars/$genome/$name || \
ln -sr \
$(echo .dev/volumes/varfish-static/data/download/annonars/$name-$genome-* \
| tr ' ' '\n' \
| tail -n 1) \
.dev/volumes/varfish-static/data/annonars/$genome/$name; \
done; \
done
```
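
After creating the links, it can be useful to confirm that none of them is dangling, for example because a download was incomplete.
The following is a minimal sketch, assuming a `find` that supports `-exec` (both the GNU and BSD variants do); `list_broken_links` is a hypothetical helper name.

```shell
# Hypothetical helper: print every symlink below directory $1 whose target
# does not exist ("test -e" follows the link, so it fails for dangling links).
list_broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

data_dir=.dev/volumes/varfish-static/data
if [ -d "$data_dir" ]; then
    list_broken_links "$data_dir"
fi
```

If the command prints nothing, all links below the data directory resolve.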

The next step is to obtain the data for Mehari:

```
$ mkdir -p .dev/volumes/varfish-static/data/mehari/grch3{7,8}
$ wget -O .dev/volumes/varfish-static/data/mehari/grch37/txs.bin.zst \
https://github.com/bihealth/mehari-data-tx/releases/download/v0.2.2/mehari-data-txs-grch37-0.2.2.bin.zst
$ wget -O .dev/volumes/varfish-static/data/mehari/grch38/txs.bin.zst \
https://github.com/bihealth/mehari-data-tx/releases/download/v0.2.2/mehari-data-txs-grch38-0.2.2.bin.zst
```

### Startup and Check

Now, you can bring up the docker compose environment (stop with `Ctrl+C`).

```
$ docker compose up
```

To verify the results, have a look at the following URLs:

- Annonars database info: http://127.0.0.1:3001/annos/db-info?genome-release=grch37
- Mehari impact predictions: http://127.0.0.1:3002/tx/csq?genome-release=grch37&chromosome=17&position=48275363&reference=C&alternative=A
- Viguno for TGDS: http://127.0.0.1:3003/hpo/genes?gene_symbol=TGDS

You should also be able to access the MinIO console on:

- http://localhost:3011/login

The admin user is `minioadmin` and the password is stored in `.dev/secrets/minio-root-password`.
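
The service URLs listed above can also be checked from the command line.
The following is a rough sketch that assumes `curl` is installed and that the ports are those from `docker-compose.override.yml-dev`; `report_url` is a hypothetical helper, and an HTTP error or a refused connection is reported as `FAIL`.

```shell
# Hypothetical smoke check: print "ok URL" if the URL answers with a
# successful HTTP status within 5 seconds, "FAIL URL" otherwise.
report_url() {
    if curl --fail --silent --output /dev/null --max-time 5 "$1"; then
        echo "ok $1"
    else
        echo "FAIL $1"
    fi
}

# Quote the URLs so that the shell does not interpret "?" and "&".
report_url "http://127.0.0.1:3001/annos/db-info?genome-release=grch37"
report_url "http://127.0.0.1:3002/tx/csq?genome-release=grch37&chromosome=17&position=48275363&reference=C&alternative=A"
report_url "http://127.0.0.1:3003/hpo/genes?gene_symbol=TGDS"
```

The containers may need some time after `docker compose up` before they answer, so an initial `FAIL` does not necessarily indicate a broken setup.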

## Service Information

This section describes the services that are started by this Docker Compose configuration.

### Traefik

[Traefik](https://traefik.io/traefik/) is a reverse proxy that is used as the main entry point for all services behind HTTP(S).
The software is well-documented by its creators.
However, it is central to the setup, and much of the additional setup requires changes to the Traefik configuration.
We thus summarize some important points here.

- Almost all configuration is done using labels on the `traefik` container itself or other containers.
- If you use configuration files, you will have to mount them from the host into the container.
- By default, we use a "catch-all" configuration based on regular expressions on the host/domain name.

### Mehari

Mehari (by the VarFish authors) provides information about variants and their effect on individual transcripts.

### Viguno

Viguno (by the VarFish authors) provides HPO/OMIM related information.

### Annonars

Annonars (by the VarFish authors) provides variant annotation from public databases.

### Postgres

We use PostgreSQL as the database backend of VarFish.

### Redis

Redis is used as a key-value store, e.g., for caching and for the queues in the VarFish server.

### MinIO

[MinIO](https://min.io/) is an S3-compatible object storage server.
@@ -61,7 +243,8 @@ Added user `the-user` successfully.
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::the-bucket/*", "arn:aws:s3:::the-bucket"
"arn:aws:s3:::the-bucket/*",
"arn:aws:s3:::the-bucket"
],
"Sid": "BucketAccessForUser"
}
51 changes: 51 additions & 0 deletions docker-compose.override.yml-dev
@@ -0,0 +1,51 @@
# Docker Compose Override YAML fragment that can be used for development
#
# It will:
#
# - set the number of replicas to 0 for all containers where the equivalent
# will be run outside of docker or is not needed; this includes traefik,
# varfish-server, postgres, redis, ...
# - expose the containers that you need running in docker at the following
# ports:
# - `3001` -- annonars
# - `3002` -- mehari
# - `3003` -- viguno
# - `3010` -- minio
# - `3011` -- minio console

services:
  # map annonars to port 3001
  annonars:
    ports:
      - "3001:8080"

  # map mehari to port 3002
  mehari:
    ports:
      - "3002:8080"

  # map viguno to port 3003
  viguno:
    ports:
      - "3003:8080"

  # map MinIO S3 to port 3010 and console to 3011
  minio:
    ports:
      - "3010:9000"
      - "3011:9001"

  # disable traefik
  traefik:
    deploy:
      replicas: 0

  # disable postgres
  postgres:
    deploy:
      replicas: 0

  # disable redis
  redis:
    deploy:
      replicas: 0
26 changes: 13 additions & 13 deletions docker-compose.yml
@@ -17,7 +17,7 @@ x-service-default: &service_default
x-service-varfish-default: &service_varfish_default
volumes:
- type: bind
source: ${volumes_basedir:-.}/varfish-static/data
source: ${volumes_basedir:-./.dev/volumes}/varfish-static/data
target: /data
read_only: true

@@ -67,7 +67,7 @@ services:
container_name: mehari
hostname: mehari
image: "${image_base:-ghcr.io/bihealth}/${image_mehari_name:-mehari}:\
${image_mehari_version:-0.5}"
${image_mehari_version:-latest}"

# -- Viguno ----------------------------------------------------------------
#
@@ -79,7 +79,7 @@ services:
container_name: viguno
hostname: viguno
image: "${image_base:-ghcr.io/bihealth}/${image_viguno_name:-viguno}:\
${image_viguno_version:-0.1}"
${image_viguno_version:-latest}"

# -- Annonars ---------------------------------------------------------------
#
@@ -89,8 +89,8 @@ services:
<<: *service_varfish_default
container_name: annonars
hostname: annonars
image: "${image_base:-ghcr.io/bihealth}/${image_annonars_name:-annona-rs}:\
${image_annonars_version:-0.12.4}"
image: "${image_base:-ghcr.io/bihealth}/${image_annonars_name:-annonars}:\
${image_annonars_version:-latest}"

# -- PostgreSQL Server -----------------------------------------------------
#
@@ -110,7 +110,7 @@ services:
- db-password
volumes:
- type: bind
source: ${volumes_basedir:-.}/postgres/data
source: ${volumes_basedir:-./.dev/volumes}/postgres/data
target: /var/lib/postgresql/data

# -- Redis -----------------------------------------------------------------
@@ -125,7 +125,7 @@ services:
image: ${image_redis_name:-redis}:${image_redis_version:-6}
volumes:
- type: bind
source: ${volumes_basedir:-.}/redis/data
source: ${volumes_basedir:-./.dev/volumes}/redis/data
target: /data

# -- Minio (Server) --------------------------------------------------------
@@ -156,16 +156,16 @@ services:
- minio-root-password
# Uncomment the following two lines (reminder: "HOST:CONTAINER") to
# enable access to the console from the host.
ports:
- "9001:9001"
# ports:
# - "9001:9001"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
interval: 30s
timeout: 20s
retries: 3
volumes:
- type: bind
source: ${volumes_basedir:-.}/minio/data
source: ${volumes_basedir:-./.dev/volumes}/minio/data
target: /data

# -- Minio (Client) --------------------------------------------------------
@@ -199,13 +199,13 @@ services:
secrets:
# The PostgreSQL database password.
db-password:
file: ${secrets_basedir:-./secrets}/db-password
file: ${secrets_basedir:-./.dev/secrets}/db-password
# The secrets for the root (=minioadmin) user on the MinIO server.
minio-root-password:
file: ${secrets_basedir:-./secrets}/minio-root-password
file: ${secrets_basedir:-./.dev/secrets}/minio-root-password
# The secrets for the varfish user on the MinIO server.
minio-varfish-password:
file: ${secrets_basedir:-./secrets}/minio-varfish-password
file: ${secrets_basedir:-./.dev/secrets}/minio-varfish-password


# == Networks ================================================================