2.2.16 Backend: KoboldCpp


Handle: kobold
URL: http://localhost:34311

KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable from Concedo that builds off llama.cpp and adds a versatile KoboldAI API endpoint, additional format support, Stable Diffusion image generation, speech-to-text, backward compatibility, as well as a fancy UI with persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything KoboldAI and KoboldAI Lite have to offer.

Starting

# [Optional] pre-pull the image
harbor pull kobold

# Start the service
harbor up kobold

# [Optional] inspect service logs
harbor logs kobold

# [Optional] Open KoboldAI Lite WebUI
harbor open kobold

By default, Harbor's kobold instance is pre-configured in the same way as the official Docker Compose example, so the first run will take a while as the default model is downloaded.
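You can check which model that default configuration points to before starting. The kobold.model key below is assumed to back the KOBOLD_MODEL option listed in the Configuration section, following the same naming pattern as kobold.workspace:

# Show the model URL kobold is currently configured with
harbor config get kobold.model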

Models

kobold functions similarly to llamacpp in terms of model management. You can find GGUF models to run on the Hugging Face Hub. After you find a model you want to run, grab the URL from the browser address bar and pass it to the Harbor config as shown below.

# Quick lookup for the models
harbor hf find gguf

Tip

Don't forget to switch to the correct chat template in the KoboldCpp UI when switching models

With the built-in downloader

# Set the model to a full GGUF URL from huggingface
harbor kobold model https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf

# Another example
harbor kobold model https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/blob/main/Llama-3.2-1B-Instruct-Q4_0.gguf

Note that the models are downloaded and placed in the kobold workspace in Harbor's home directory; you can see the path with:

# Show the workspace path
harbor config get kobold.workspace

# The path must be either relative to $(harbor home) or absolute
harbor config set kobold.workspace /path/to/your/workspace

The model will always be saved as model.gguf in the workspace directory; changing the URL will overwrite the existing model.

With the original downloader

The koboldcpp binary also has its own curl-based downloader, used when a full URL is passed to the --model arg. To use this feature, you'll need to reset the model specifier and configure it via args instead.

# Reset the model specifier
harbor kobold model ""

# See existing args
harbor kobold args
harbor config get kobold.args

# Set the model to a full GGUF URL from huggingface
# for koboldcpp to download
harbor kobold args --model https://huggingface.co/bartowski/Dolphin3.0-Llama3.1-8B-GGUF/blob/main/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf

Similarly to the built-in downloader, the models will be saved in the kobold workspace by default.

ls -la $(harbor home)/kobold/data

With already downloaded models

In order to use already downloaded models, you'll need to reset the model config and configure it via args instead.

# Reset the model specifier
harbor kobold model ""

# See existing args
harbor kobold args
harbor config get kobold.args

# Example output:
# --model model.gguf

# Change the "--model" argument to point at the location of an already downloaded model

HuggingFace Hub cache

# [Optional] See what you already have downloaded
harbor hf scan-cache

# [Optional] Download desired model with the official HF CLI
harbor hf download bartowski/Llama-3.2-1B-Instruct-GGUF Llama-3.2-1B-Instruct-Q4_0.gguf

# Locate the specific file in the cache
harbor find Llama-3.2-1B-Instruct-Q4_0.gguf
# Example output:
# /home/user/.cache/huggingface/hub/models--bartowski--Llama-3.2-1B-Instruct-GGUF/snapshots/067b946cf014b7c697f3654f621d577a3e3afd1c/Llama-3.2-1B-Instruct-Q4_0.gguf

# The HF cache is mounted into the `kobold` container at "/hf", similarly to `llamacpp`
harbor kobold args --model /hf/hub/models--bartowski--Llama-3.2-1B-Instruct-GGUF/snapshots/067b946cf014b7c697f3654f621d577a3e3afd1c/Llama-3.2-1B-Instruct-Q4_0.gguf

From other caches

Harbor also mounts the caches of other inference backends, such as llamacpp or vllm, at the root of the container, so you can reuse models downloaded by them:

  • llamacpp - /llamacpp
  • ollama - /ollama
  • vllm - /vllm

For example:

# Locate the file in the `llamacpp` cache
harbor find Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

# Example output:
# /home/user/.cache/llama.cpp/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

# Set kobold model
harbor kobold args --model /llamacpp/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Configuration

Tip

Consider using Harbor Profiles or Harbor App to simplify the configuration process.
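For example, once kobold is set up the way you like, the current settings could be captured into a named profile. The `harbor profile save` / `harbor profile use` subcommands below are assumptions based on the Harbor Profiles feature; check `harbor profile --help` for the exact syntax:

# [Assumed syntax] Save the current configuration as a profile
harbor profile save kobold-llama

# [Assumed syntax] Switch back to it later
harbor profile use kobold-llama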

The image used by kobold will download the actual binary on the first run, after which you can run it with:

# Show the help for original binary
harbor run kobold /workspace/koboldcpp --help

Additionally, the docker entrypoint supports multiple extra environment variables, which can be set via harbor env:

# See the outline of supported env variables
harbor run kobold cat docker-helper.sh | grep KCPP_

# Set the env variable
harbor env kobold KCPP_DONT_REMOVE_MODELS true

Other than that, the following options can be set via harbor config:

# See all supported options
harbor config ls | grep KOBOLD

# The port on the host machine to access
# the kobold service when it is running
KOBOLD_HOST_PORT               34311

# The Docker image to use for kobold service
KOBOLD_IMAGE                   koboldai/koboldcpp

# The tag of the Docker image to use for kobold service
KOBOLD_VERSION                 latest

# Location of the workspace directory on the host machine
KOBOLD_WORKSPACE               ./kobold/data

# The model to use for kobold service
KOBOLD_MODEL                   https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf?download=true

# The arguments to pass to the koboldcpp binary
KOBOLD_ARGS                    --model model.gguf
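As a concrete example, the args option can carry extra koboldcpp flags alongside the model. The --contextsize and --gpulayers flags below are standard koboldcpp options, but verify them against the --help output shown earlier:

# Keep the built-in downloader model, raise the context size
# and offload some layers to the GPU
harbor kobold args --model model.gguf --contextsize 8192 --gpulayers 20

# Confirm what will be passed to the binary
harbor config get kobold.args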

API Endpoints

kobold exposes both a WebUI and API endpoints, which can be accessed via the browser or any HTTP client. By default, Harbor will connect kobold's OpenAI-compatible API to Open WebUI when both are running.

# Start both services
# Out of the box, `webui` is optional here
# as it's one of the default services
harbor up webui kobold
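If you want to talk to the API directly, a minimal smoke test could look like the sketch below. It assumes the default host port 34311 and KoboldCpp's OpenAI-compatible /v1/chat/completions route; the "model" field is largely ignored by koboldcpp, which serves whatever model it was started with:

# Send a single chat completion request to the kobold API
curl http://localhost:34311/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "model.gguf",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'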