diff --git a/README.md b/README.md
index 0a31333c..ea741176 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Built in Go for performance and simplicity, llama-swap has zero dependencies and
 - ✅ Easy to deploy and configure: one binary, one configuration file. no external dependencies
 - ✅ On-demand model switching
-- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
+- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc.)
   - future proof, upgrade your inference servers at any time.
 - ✅ OpenAI API supported endpoints:
   - `v1/completions`
@@ -61,7 +61,7 @@ llama-swap can be installed in multiple ways
 ### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
 
-Nightly container images with llama-swap and llama-server are built for multiple platforms (cuda, vulkan, intel, etc).
+Nightly container images with llama-swap and llama-server are built for multiple platforms (cuda, vulkan, intel, etc.), including [non-root variants with improved security](docs/container-security.md).
 
 ```shell
 $ docker pull ghcr.io/mostlygeek/llama-swap:cuda
@@ -89,6 +89,9 @@ docker pull ghcr.io/mostlygeek/llama-swap:musa
 # tagged llama-swap, platform and llama-server version images
 docker pull ghcr.io/mostlygeek/llama-swap:v166-cuda-b6795
 
+# non-root cuda
+docker pull ghcr.io/mostlygeek/llama-swap:cuda-non-root
+
 ```
diff --git a/docs/container-security.md b/docs/container-security.md
new file mode 100644
index 00000000..d5584795
--- /dev/null
+++ b/docs/container-security.md
@@ -0,0 +1,9 @@
+## Container Security
+
+For convenience, the default container images use the **root** user within the container. This permits simplified access to host resources, including volume mounts and hardware devices under `/dev/dri` (_for Vulkan support_), but it also widens the attack surface for privilege-escalation exploits.
+
+Alternative images, tagged as `non-root`, are also available.
+For example, `llama-swap:cpu-non-root` runs as the unprivileged **app** user by default. Depending on deployment requirements, additional configuration may be necessary to ensure that the container retains access to required host resources. This might entail customizing host filesystem permissions/ownership appropriately or injecting host group membership into the container.
+
+Docker offers a [system-wide option enabling user namespace remapping](https://docs.docker.com/engine/security/userns-remap/) to accommodate situations where a **root** container user is required, but also notes that _"The best way to prevent privilege-escalation attacks from within a container is to configure your container's applications to run as unprivileged users."_ Podman offers a similar per-container capability to [set UID/GID mapping in a new user namespace](https://docs.podman.io/en/latest/markdown/podman-run.1.html#set-uid-gid-mapping-in-a-new-user-namespace).
+
+The Large Language Model (_LLM/AI_) ecosystem is evolving rapidly, and [serious security vulnerabilities have surfaced in the past](https://huggingface.co/docs/hub/security-pickle). The _non-root_ images can reduce the impact of future, as-yet-unknown vulnerabilities, but proper planning and configuration are recommended to utilize them effectively.
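+As a sketch of the group-injection approach described above (the `vulkan-non-root` tag, `/dev/dri/renderD128` device path, and port are illustrative only; substitute the non-root variant and render device present on your host):
+
+```shell
+# Find the host GID that owns the render device (commonly the "render" or "video" group)
+RENDER_GID=$(stat -c '%g' /dev/dri/renderD128)
+
+# Start the non-root image, adding that GID as a supplementary group so the
+# unprivileged container user can open the device
+docker run --rm \
+  --device /dev/dri \
+  --group-add "$RENDER_GID" \
+  -p 8080:8080 \
+  ghcr.io/mostlygeek/llama-swap:vulkan-non-root
+```
+
+Bind-mounted files and directories follow the same principle: their host ownership or permissions must allow the container's unprivileged user to read (and, where needed, write) them.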