2.2.16 Backend: KoboldCpp
Handle: kobold
URL: http://localhost:34311
KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable from Concedo, that builds off llama.cpp, and adds a versatile KoboldAI API endpoint, additional format support, Stable Diffusion image generation, speech-to-text, backward compatibility, as well as a fancy UI with persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything KoboldAI and KoboldAI Lite have to offer.
# [Optional] pre-pull the image
harbor pull kobold
# Start the service
harbor up kobold
# [Optional] inspect service logs
harbor logs kobold
# [Optional] Open KoboldAI Lite WebUI
harbor open kobold
By default, Harbor's kobold instance is pre-configured in the same way as the official Docker Compose example, so it'll take a bit to start on the first run.
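If you want to see what those defaults are before changing anything, you can query them via harbor config. A minimal check, assuming kobold.model and kobold.args follow the same option naming pattern documented further below:
# [Optional] Inspect the default model URL and launch arguments
harbor config get kobold.model
harbor config get kobold.args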
kobold functions similarly to llamacpp in terms of model management. You can find GGUF models to run on Huggingface. After you find a model you want to run, grab the URL from the browser address bar and pass it to Harbor's config as shown below.
# Quick lookup for the models
harbor hf find gguf
Tip
Don't forget to switch to the correct chat template in the KoboldCpp UI when switching models
# Set the model to a full GGUF URL from huggingface
harbor kobold model https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf
# Another example
harbor kobold model https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/blob/main/Llama-3.2-1B-Instruct-Q4_0.gguf
Note that the models are downloaded and placed in the kobold workspace in Harbor's home directory; you can see the path with:
# Show the workspace path
harbor config get kobold.workspace
# The path must be either relative to $(harbor home) or absolute
harbor config set kobold.workspace /path/to/your/workspace
The model will always be downloaded as model.gguf in the workspace directory; changing the URL will overwrite the existing model.
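For example, assuming the default workspace location shown further below, you can verify the downloaded file like this:
# Check the downloaded model file in the default workspace
ls -la $(harbor home)/kobold/data/model.gguf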
The koboldcpp binary also has a built-in downloader based on curl, which is used when a full URL is passed to the --model arg. To use this feature, you'll need to reset the model specifier and configure it via args instead.
# Reset the model specifier
harbor kobold model ""
# See existing args
harbor kobold args
harbor config get kobold.args
# Set the model to a full GGUF URL from huggingface
# for koboldcpp to download
harbor kobold args --model https://huggingface.co/bartowski/Dolphin3.0-Llama3.1-8B-GGUF/blob/main/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf
When using the built-in downloader, the models will also be saved in the kobold workspace by default.
ls -la $(harbor home)/kobold/data
In order to use already downloaded models, you'll need to reset the model config and configure it via args instead.
# Reset the model specifier
harbor kobold model ""
# See existing args
harbor kobold args
harbor config get kobold.args
# Example output:
# --model model.gguf
# You will want to change the "--model" argument to the location of an already downloaded model
# [Optional] See what you already have downloaded
harbor hf scan-cache
# [Optional] Download desired model with the official HF CLI
harbor hf download bartowski/Llama-3.2-1B-Instruct-GGUF Llama-3.2-1B-Instruct-Q4_0.gguf
# Locate the specific file in the cache
harbor find Llama-3.2-1B-Instruct-Q4_0.gguf
# Example output:
# /home/user/.cache/huggingface/hub/models--bartowski--Llama-3.2-1B-Instruct-GGUF/snapshots/067b946cf014b7c697f3654f621d577a3e3afd1c/Llama-3.2-1B-Instruct-Q4_0.gguf
# The HF Cache will be mounted to `kobold` container at "/hf" location, similarly to `llamacpp`
harbor kobold args --model /hf/hub/models--bartowski--Llama-3.2-1B-Instruct-GGUF/snapshots/067b946cf014b7c697f3654f621d577a3e3afd1c/Llama-3.2-1B-Instruct-Q4_0.gguf
Harbor will also mount the caches of other inference backends, such as llamacpp or vllm, to the root of the container, so you can use their models as well:
- llamacpp: /llamacpp
- ollama: /ollama
- vllm: /vllm
For example:
# Locate the file in the `llamacpp` cache
harbor find Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
# Example output:
# /home/user/.cache/llama.cpp/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
# Set kobold model
harbor kobold args --model /llamacpp/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Tip
Consider using Harbor Profiles or Harbor App to simplify the configuration process.
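For example, a quick sketch of saving the current kobold setup as a reusable profile; the exact harbor profile subcommands are assumed here and may differ in your Harbor version:
# Save the current configuration under a name (subcommand names are assumed)
harbor profile save kobold-llama
# Restore it later
harbor profile use kobold-llama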
The image used by kobold will download the actual binary on the first run, after which you can run it with:
# Show the help for original binary
harbor run kobold /workspace/koboldcpp --help
Additionally, the Docker entrypoint supports multiple extra environment variables, which can be set via harbor env:
# See the outline of supported env variables
harbor run kobold cat docker-helper.sh | grep KCPP_
# Set the env variable
harbor env kobold KCPP_DONT_REMOVE_MODELS true
Other than that, the following options can be set via harbor config:
# See all supported options
harbor config ls | grep KOBOLD
# The port on the host machine to access
# the kobold service when it is running
KOBOLD_HOST_PORT 34311
# The Docker image to use for kobold service
KOBOLD_IMAGE koboldai/koboldcpp
# The tag of the Docker image to use for kobold service
KOBOLD_VERSION latest
# Location of the workspace directory on the host machine
KOBOLD_WORKSPACE ./kobold/data
# The model to use for kobold service
KOBOLD_MODEL https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf?download=true
# The arguments to pass to the koboldcpp binary
KOBOLD_ARGS --model model.gguf
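Any of these can be adjusted with harbor config set, using the lowercased, dot-separated form of the option name; the mapping and the values below are illustrative, following the same pattern as kobold.workspace above:
# Pin a specific image tag for the kobold service
harbor config set kobold.version latest
# Move the workspace to a different location
harbor config set kobold.workspace /mnt/models/kobold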
kobold exposes both WebUI and API endpoints, which can be accessed via the browser or any HTTP client. By default, Harbor will connect kobold's OpenAI-compatible API to Open WebUI when both are running.
# Start both services
# Out of the box, `webui` is optional here
# as it's one of the default services
harbor up webui kobold
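Once the services are up, you can also query kobold's OpenAI-compatible API directly with any HTTP client. A minimal example against the default host port; the model field is illustrative, as KoboldCpp serves whichever model it was started with:
# List the models served by the kobold instance
curl http://localhost:34311/v1/models
# Send a minimal chat completion request
curl http://localhost:34311/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "model.gguf", "messages": [{"role": "user", "content": "Hello!"}]}'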