QC, spinning up VMs, loading data into DBs, creating LUTs, and the other fun backend stuff we need to do to spin up https://genetics.opentargets.io
A Dockerfile is provided to build an image with the project and its required dependencies in place.
Build the image and tag it with a name so that it is easy to refer to later:
docker build --tag otg-etl .
Start a docker container in interactive mode.
Host names must not contain a protocol (https is assumed) or slashes. The data loading script uses localhost if no host name is provided.
docker run -it --rm \
--env ES_HOST='<elasticsearch host name>' \
--env CLICKHOUSE_HOST='<ot clickhouse db host name>' \
otg-etl
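The host name rule above (no protocol, no slashes) can be checked before starting the container. The following is a minimal sketch, not part of the project: `normalize_host` is a hypothetical helper and `es.example.org` is a placeholder host.

```shell
# Hypothetical helper (not part of otg-etl): reduce a URL to a bare host name.
normalize_host() {
  nh_host="${1#*://}"          # drop a leading protocol such as https://
  nh_host="${nh_host%%/*}"     # drop the first slash and everything after it
  printf '%s\n' "$nh_host"
}

# Placeholder URL; substitute your own Elasticsearch host.
ES_HOST="$(normalize_host 'https://es.example.org/')"
echo "$ES_HOST"   # prints es.example.org
```

The resulting value can then be passed to the container with --env ES_HOST="$ES_HOST"; the same helper works for CLICKHOUSE_HOST.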
Authenticate with Google Cloud Storage.
gcloud auth application-default login
Load the release data into the ot database and Elasticsearch.
bash genetics-backend/loaders/clickhouse/create_and_load_everything_from_scratch.sh gs://genetics-portal-output/190504
Start a docker container in interactive mode, mounting the directory that holds the release data at /data.
Host names must not contain a protocol (https is assumed) or slashes. The data loading script uses localhost if no host name is provided.
docker run -it --rm \
--env ES_HOST='<elasticsearch host name>' \
--env CLICKHOUSE_HOST='<ot clickhouse db host name>' \
-v <directory with data>:/data/ \
otg-etl
Load the release data into the ot database and Elasticsearch.
bash genetics-backend/loaders/clickhouse/create_and_load_everything_from_scratch.sh /data
You can use wget to download the release data. Below is an example command for the 19.05.04 release data.
wget --mirror ftp://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/190504/
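Note that wget --mirror saves files under a directory hierarchy that starts with the host name, so the release files end up several levels below the current directory. The sketch below derives that local path from the URL; `mirror_dir` is a hypothetical helper, and whether the loading script expects exactly this directory to be mounted at /data is an assumption.

```shell
# Hypothetical helper: local directory that wget --mirror creates for a URL.
mirror_dir() {
  md_path="${1#*://}"              # drop the protocol, keep host and path
  printf '%s\n' "${md_path%/}"     # drop one trailing slash, if present
}

DATA_DIR="$PWD/$(mirror_dir 'ftp://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/190504/')"
echo "$DATA_DIR"
# Assumed usage: pass it as the volume source, e.g. -v "$DATA_DIR":/data/
```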