Inference service template

Description

This container deploys a model as an API endpoint using the FastAPI framework and Docker. Once the container is running, go to http://0.0.0.0:8000/docs to check out the auto-generated API docs.
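The service itself is a small FastAPI application. Below is a minimal sketch of what such an app might look like; the /predict route, the request schema, and the stubbed model call are illustrative assumptions, not the template's actual code.

# main.py -- illustrative sketch only; the template's real app may differ
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Inference Service Template")

class PredictionRequest(BaseModel):
    # Input features for the model; the real schema depends on your model.
    features: List[float]

class PredictionResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest) -> PredictionResponse:
    # Stand-in for an actual model call; replace with your loaded model.
    score = sum(request.features) / max(len(request.features), 1)
    return PredictionResponse(prediction=score)

Run locally with uvicorn main:app --host 0.0.0.0 --port 8000 to reach the docs page mentioned above.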

Usage

Run the job by executing this command with your chosen configuration, publishing the container's port 80 to the outside world:

docker run -d -p 80:80 inference-service-template

To enable the Nvidia Container Runtime for GPU-acceleration, execute:

docker run --runtime=nvidia inference-service-template

Execute this command for interactive run mode:

docker run -it --entrypoint=/bin/bash inference-service-template

More options:

  • publish the container's port 80 to a host port using -p {host_port}:80
  • for testing purposes (fast execution), run with --env-file=test.env
  • run in detached mode using -d (see the combined example below)
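
Combining these options, a typical test run could look like the following (host port 8080 is just an example):

docker run -d -p 8080:80 --env-file=test.env inference-service-template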

Environment variables

The training job can be parametrized with environment variables. These can be defined by passing an environment file via --env-file={env_file} to docker run. The available variables are listed in the Parameters section below.
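
For example, a minimal environment file (hypothetical name job.env) could set the parameters documented below:

N_EPOCHS=20
SEED=42

It can then be passed to the container at startup:

docker run -d -p 80:80 --env-file=job.env inference-service-template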

Configuration

Parameters

The training job can be configured with the following environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| Training Config: | | |
| N_EPOCHS | Number of training epochs. | 10 |
| SEED | Global seed used for random numbers. | 42 |
| Computation Config: | | |
| MAX_NUM_THREADS | Controls the maximum number of threads used for computation. With auto, the thread count is derived from the available CPU resources; alternatively, provide a specific number. | auto |
| NVIDIA_VISIBLE_DEVICES | (GPU only) Controls which GPUs will be accessible to the job. By default, all GPUs from the host are used. Use all, none, or a comma-separated list of device IDs (e.g. 0,1). You can find the available device IDs by running nvidia-smi on the host machine. | all |
| CUDA_VISIBLE_DEVICES | (GPU only) Controls which GPUs are visible to CUDA applications inside the job. By default, all GPUs that the job has access to are visible. To restrict the job, provide a comma-separated list of internal device IDs (e.g. 0,2) based on the devices available within the container (run nvidia-smi inside it). Unlike NVIDIA_VISIBLE_DEVICES, the job user can still access other GPUs by overwriting this configuration from within the container. | all |
| Cloud Config: | | |
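
For example, to make only the first two host GPUs visible to a GPU-accelerated run (a sketch combining the options from above):

docker run --runtime=nvidia --env NVIDIA_VISIBLE_DEVICES=0,1 -p 80:80 inference-service-template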

Proxy

If a proxy is required, you can pass it via the http_proxy and no_proxy environment variables. For example: --env http_proxy=<server>:<port>
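
A full invocation with a placeholder proxy address might look like this:

docker run -d -p 80:80 --env http_proxy=http://proxy.example.com:3128 --env no_proxy=localhost inference-service-template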

Docker Configuration

You can find more configuration options for docker run and docker service create in the official Docker documentation.

Develop

Requirements

  • Python 3, Maven, Docker

Build

Execute this command in the project root folder to build the docker container:

python build.py --version={MAJOR.MINOR.PATCH-TAG}

The version has to be provided. The version format should follow the Semantic Versioning standard (MAJOR.MINOR.PATCH). For additional script options:

python build.py --help

Deploy

Execute this command in the project root folder to push the container to the configured docker registry:

python build.py --deploy --version={MAJOR.MINOR.PATCH-TAG}

The version has to be provided. The version format should follow the Semantic Versioning standard (MAJOR.MINOR.PATCH). For additional script options:

python build.py --help

Configure Docker Repository

In order to pull and push docker images, a Docker registry needs to be configured:

docker login <server>

and enter your username and password to log in.