Merged
27 commits
1c15fd2
refactor: kv cache manager repo name
sagearc Jan 19, 2026
2683a46
Merge branch 'main' into rename-llmd-kv-cache
sagearc Jan 19, 2026
2facce6
go mod tidy
sagearc Jan 19, 2026
5755ebd
fetch kv cache upstream instead of my fork
sagearc Jan 19, 2026
824c89b
revert dockerfile to fetch kv cache manager from upstream instead of …
sagearc Jan 19, 2026
58f6a4b
update chat preprocessing structs
sagearc Jan 19, 2026
b89abce
update kv cache manager version
sagearc Jan 20, 2026
a26d4b1
refactor kvblock.Key to kvblock.BlockHash
sagearc Jan 20, 2026
2e43da1
add context
sagearc Jan 20, 2026
4e44531
add parent block key
sagearc Jan 20, 2026
3a9a6d9
refactor encode
sagearc Jan 20, 2026
9c820cf
validate model name
sagearc Jan 20, 2026
4edcd32
run setup.sh
hyeongyun0916 Jan 20, 2026
2a5a77e
clone vllm into build
sagearc Jan 20, 2026
b63ef84
edit
hyeongyun0916 Jan 20, 2026
31eefb9
edit lint
hyeongyun0916 Jan 20, 2026
9c943e7
Merge commit 'f272b8549a9d870357aa9900ed972836136bc019' into rename-l…
hyeongyun0916 Jan 20, 2026
a50eb9d
delete fetch-python-wrapper.sh
hyeongyun0916 Jan 20, 2026
b9fd43d
edit git workflow
hyeongyun0916 Jan 20, 2026
d7653bf
edit
hyeongyun0916 Jan 21, 2026
05ba63c
Merge branch 'main' into rename-llmd-kv-cache
hyeongyun0916 Jan 21, 2026
2e59a41
refactor TokenProcessorConfig in config
sagearc Jan 21, 2026
cd74019
fix kv cache repo name in docker file
sagearc Jan 21, 2026
7861c04
fix e2e tests
vMaroon Jan 21, 2026
c5f12b0
Merge commit '4884fc207f3bcafc8adb8fe705bbe114c6f9e9d4' into rename-l…
hyeongyun0916 Jan 21, 2026
886cb07
add ignore
hyeongyun0916 Jan 21, 2026
1848a6a
update architecture docs
sagearc Jan 21, 2026
21 changes: 21 additions & 0 deletions .dockerignore
@@ -0,0 +1,21 @@
# Git
.git
.gitignore

# Build artifacts
bin
build

# IDE and OS files
.idea
.vscode
*.DS_Store

# Local virtual environments
venv

# Python cache files
__pycache__

# Docker files
Dockerfile
Collaborator

nit: missing newline at end of file

33 changes: 18 additions & 15 deletions .github/workflows/ci-pr-checks.yaml
@@ -12,22 +12,27 @@ jobs:
check-changes:
runs-on: ubuntu-latest
outputs:
docs: ${{ steps.filter.outputs.docs }}
src: ${{ steps.filter.outputs.src }}
steps:
- name: Checkout source
uses: actions/checkout@v6
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
docs:
- 'README.md'
- 'docs/**'
src:
- '**/*.go'
- '**/*.py'
lint-and-test:
needs: check-changes
if: ${{ needs.check-changes.outputs.docs == 'false' }}
if: ${{ needs.check-changes.outputs.src == 'true' }}
runs-on: ubuntu-latest
steps:
- name: Free Disk Space (Ubuntu)
Collaborator

Q: Can you briefly explain why this is needed now and was not needed before?

Member

The dependencies now use more disk space, so the job runs out of space. This was discussed with Greg at some point: we could increase capacity by upgrading the tier, but this approach was preferred.

It is temporary in any case.

Collaborator

Since this is the first step, does it free disk space used by packages that are already installed (and hopefully unused) before anything from inference-scheduler actually runs?
It is run in the hope that afterwards there will be sufficient disk space on the worker to run the build.
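
One rough way to check the point made in this thread is to print the free space on the runner before and after the cleanup step. The following is only a sketch: the `free_kib` helper and the 1 GiB threshold are hypothetical and do not come from the workflow.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the free space of a mount point in KiB.
free_kib() {
  # -P selects the POSIX one-line-per-filesystem format; column 4 is "Available".
  df -Pk "${1:-/}" | awk 'NR==2 {print $4}'
}

# Hypothetical threshold; the real requirement is not stated in the thread.
required_kib=$((1024 * 1024))  # 1 GiB

available_kib=$(free_kib /)
if [ "$available_kib" -lt "$required_kib" ]; then
  echo "only ${available_kib} KiB free on /, the build may run out of space" >&2
else
  echo "${available_kib} KiB free on /"
fi
```

Running it once before and once after the cleanup would show how much space the step actually recovers.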

uses: jlumbroso/free-disk-space@main
with:
tool-cache: false

- name: Checkout source
uses: actions/checkout@v6

@@ -43,9 +48,6 @@ jobs:
go-version: "${{ env.GO_VERSION }}"
cache-dependency-path: ./go.sum

- name: Install dependencies
run: sudo make install-dependencies

- name: Configure CGO for Python
run: |
PYTHON_INCLUDE=$(python3 -c "import sysconfig; print(sysconfig.get_path('include'))")
@@ -57,13 +59,16 @@ jobs:
- name: Set PKG_CONFIG_PATH
run: echo "PKG_CONFIG_PATH=/usr/lib/pkgconfig" >> $GITHUB_ENV

- name: go mod tidy
run: go mod tidy
- name: Install dependencies
run: |
go mod tidy
sudo -E env "PATH=$PATH" make install-dependencies
Collaborator

Q: Is there any limitation on running sudo -E env "PATH=$PATH" make install-dependencies install-python-deps as a single statement?

sudo -E env "PATH=$PATH" make install-python-deps

- name: Run lint checks
uses: golangci/golangci-lint-action@v9
with:
version: 'v2.1.6'
version: "v2.1.6"
args: "--config=./.golangci.yml"
env:
CGO_ENABLED: ${{ env.CGO_ENABLED }}
@@ -74,10 +79,8 @@ jobs:

- name: Run make build
shell: bash
run: |
make build
run: make build

- name: Run make test
shell: bash
run: |
make test
run: make test
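
As the diff above reads, the gate on `lint-and-test` changes from "no `docs` path changed" to "some `src` path changed". A minimal sketch of the new gate follows. It approximates the two `src` rules with shell `case` patterns, so the action's own glob matching may differ at the edges, and the file names are made up for illustration.

```shell
#!/usr/bin/env bash

# Returns 0 if any argument looks like a Go or Python source file.
# Approximates the '**/*.go' and '**/*.py' filter rules.
src_changed() {
  for f in "$@"; do
    case "$f" in
      *.go|*.py) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical change sets.
if src_changed README.md pkg/example/scorer.go; then
  echo "docs + Go change: lint-and-test runs"
fi
if ! src_changed Makefile Dockerfile.epp; then
  echo "Makefile/Dockerfile-only change: lint-and-test is skipped"
fi
```

Under the old `docs == 'false'` condition the first change set would have skipped the job and the second would have run it, so the two conditions are not simply inverses of each other.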
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,6 +11,8 @@
main
bin/

*debug_bin*

# Test binary, built with `go test -c`
*.test

108 changes: 47 additions & 61 deletions Dockerfile.epp
@@ -1,13 +1,42 @@
## Minimal runtime Dockerfile (microdnf-only, no torch, wrapper in site-packages)
# Build Stage: using Go 1.24 image
FROM quay.io/projectquay/golang:1.24 AS builder
# Go dependencies stage: download go modules and extract kv-cache
FROM quay.io/projectquay/golang:1.24 AS go-deps

WORKDIR /workspace

# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum

# Copy the go source
COPY cmd/ cmd/
COPY pkg/ pkg/

RUN go mod download

# Copy Python wrapper and requirements from llm-d-kv-cache dependency
# Extract version dynamically and copy to a known location
RUN KV_CACHE_PKG=$(go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache) && \
mkdir -p /workspace/kv-cache && \
cp -r $KV_CACHE_PKG/* /workspace/kv-cache && \
chmod +x /workspace/kv-cache/pkg/preprocessing/chat_completions/setup.sh

FROM python:3.12-slim AS python-builder

RUN apt-get update && apt-get install -y --no-install-recommends build-essential

COPY --from=go-deps /workspace/kv-cache /workspace/kv-cache
WORKDIR /workspace/kv-cache
# llm-d-kv-cache's Makefile. not llm-d-inference-scheduler's
RUN KV_CACHE_PKG=/workspace/kv-cache make install-python-deps

# Go build stage
FROM quay.io/projectquay/golang:1.24 AS go-builder

ARG TARGETOS
ARG TARGETARCH
ARG PYTHON_VERSION=3.12

ENV PYTHON=python${PYTHON_VERSION}
ENV PYTHONPATH=/usr/lib64/${PYTHON}/site-packages:/usr/lib/${PYTHON}/site-packages

# Install build tools
# The builder is based on UBI8, so we need epel-release-8.
@@ -16,60 +45,30 @@ RUN dnf install -y 'https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.
dnf install -y gcc-c++ libstdc++ libstdc++-devel clang zeromq-devel pkgconfig ${PYTHON}-devel ${PYTHON}-pip git && \
dnf clean all

COPY --from=go-deps /workspace /workspace
COPY --from=go-deps /go/pkg/mod /go/pkg/mod

WORKDIR /workspace

# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
COPY Makefile* ./

# Copy the go source
COPY cmd/ cmd/
COPY pkg/ pkg/
COPY --from=python-builder /workspace/kv-cache/pkg/preprocessing/chat_completions /workspace/kv-cache/pkg/preprocessing/chat_completions
RUN make setup-venv
COPY --from=python-builder /workspace/kv-cache/build/venv/lib/python3.12/site-packages /workspace/build/venv/lib/python3.12/site-packages

RUN go mod download
ENV PYTHONPATH=/workspace/kv-cache/pkg/preprocessing/chat_completions:/workspace/build/venv/lib/python3.12/site-packages
RUN python3.12 -c "import tokenizer_wrapper" # verify tokenizer_wrapper is correctly installed

# Copy Python wrapper and requirements from llm-d-kv-cache-manager dependency
# Extract version dynamically and copy to a known location
# We need to keep llm-d-kv-cache-manager as go module path is kept the old name
RUN KVCACHE_MANAGER_VERSION=$(go list -m -f '{{.Version}}' github.com/llm-d/llm-d-kv-cache-manager) && \
mkdir -p /workspace/kv-cache && \
cp /go/pkg/mod/github.com/llm-d/llm-d-kv-cache-manager@${KVCACHE_MANAGER_VERSION}/pkg/preprocessing/chat_completions/render_jinja_template_wrapper.py \
/workspace/kv-cache/render_jinja_template_wrapper.py && \
cp /go/pkg/mod/github.com/llm-d/llm-d-kv-cache-manager@${KVCACHE_MANAGER_VERSION}/pkg/preprocessing/chat_completions/requirements.txt \
/workspace/kv-cache/requirements.txt

# HuggingFace tokenizer bindings (static lib)
RUN mkdir -p lib
# Ensure that the RELEASE_VERSION matches the one used in the imported llm-d-kv-cache-manager version
ARG RELEASE_VERSION=v1.22.1
RUN curl -L https://github.com/daulet/tokenizers/releases/download/${RELEASE_VERSION}/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
RUN ranlib lib/*.a

# Build
# the GOARCH has not a default value to allow the binary be built according to the host where the command
# was called. For example, if we call make image-build in a local env which has the Apple Silicon M1 SO
# the docker BUILDPLATFORM arg will be linux/arm64 when for Apple x86 it will be linux/amd64. Therefore,
# by leaving it empty we can ensure that the container and binary shipped on it will have the same platform.
ENV CGO_ENABLED=1
ENV GOOS=${TARGETOS:-linux}
ENV GOARCH=${TARGETARCH}


ARG COMMIT_SHA=unknown
ARG BUILD_REF
RUN CGO_CFLAGS="$(${PYTHON}-config --cflags) -I/workspace/lib" && \
CGO_LDFLAGS="$(${PYTHON}-config --ldflags --embed) -L/workspace/lib -ltokenizers -ldl -lm" && \
export CGO_CFLAGS CGO_LDFLAGS && \
go build -a -o bin/epp -ldflags="-extldflags '-L$(pwd)/lib' -X sigs.k8s.io/gateway-api-inference-extension/version.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/version.BuildRef=${BUILD_REF}" cmd/epp/main.go
RUN TOKENIZER_VERSION=${RELEASE_VERSION} make build-epp

# Runtime stage
# Use ubi9 as a minimal base image to package the manager binary
# Refer to https://catalog.redhat.com/software/containers/ubi9/ubi-minimal/615bd9b4075b022acc111bf5 for more details
FROM registry.access.redhat.com/ubi9/ubi-minimal:9.7
ARG PYTHON_VERSION=3.12
WORKDIR /
COPY --from=builder /workspace/bin/epp /app/epp
COPY --from=go-builder /workspace/bin/epp /app/epp

USER root

@@ -87,24 +86,11 @@ RUN curl -L -o /tmp/epel-release.rpm https://dl.fedoraproject.org/pub/epel/epel-
ln -sf /usr/bin/${PYTHON} /usr/bin/python3 && \
ln -sf /usr/bin/${PYTHON} /usr/bin/python

# Copy Python kv-cache package and site-packages from the python-builder stage
COPY --from=python-builder /workspace/kv-cache /workspace/kv-cache
ENV PYTHONPATH=/workspace/kv-cache/pkg/preprocessing/chat_completions:/workspace/kv-cache/build/venv/lib/python3.12/site-packages
RUN ${PYTHON} -c "import tokenizer_wrapper" # verify tokenizer_wrapper is correctly installed

# Install wrapper as a module in site-packages
RUN mkdir -p /usr/local/lib/${PYTHON}/site-packages/
COPY --from=builder /workspace/kv-cache/render_jinja_template_wrapper.py /usr/local/lib/${PYTHON}/site-packages/

# Python deps (no cache, single target) – filter out torch
ENV PIP_NO_CACHE_DIR=1 PIP_DISABLE_PIP_VERSION_CHECK=1
COPY --from=builder /workspace/kv-cache/requirements.txt /tmp/requirements.txt
RUN sed '/^torch\b/d' /tmp/requirements.txt > /tmp/requirements.notorch.txt && \
${PYTHON} -m pip install --no-cache-dir --upgrade pip setuptools wheel && \
${PYTHON} -m pip install --no-cache-dir --target /usr/local/lib/${PYTHON}/site-packages -r /tmp/requirements.notorch.txt && \
${PYTHON} -m pip install --no-cache-dir --target /usr/local/lib/${PYTHON}/site-packages PyYAML && \
rm /tmp/requirements.txt /tmp/requirements.notorch.txt && \
rm -rf /root/.cache/pip

# Python env
ENV PYTHONPATH="/usr/local/lib/${PYTHON}/site-packages:/usr/lib/${PYTHON}/site-packages"
ENV PATH=/usr/bin:/usr/local/bin:$PATH
ENV HF_HOME="/tmp/.cache"

USER 65532:65532
4 changes: 2 additions & 2 deletions Makefile
@@ -157,7 +157,7 @@ test-unit: test-unit-epp test-unit-sidecar ## Run unit tests
.PHONY: test-unit-%
test-unit-%: download-tokenizer install-python-deps check-dependencies ## Run unit tests
@printf "\033[33;1m==== Running Unit Tests ====\033[0m\n"
@KV_CACHE_PKG=$$(go list -m -f '{{.Dir}}/pkg/preprocessing/chat_completions' github.com/llm-d/llm-d-kv-cache-manager 2>/dev/null || echo ""); \
@KV_CACHE_PKG=$$(go list -m -f '{{.Dir}}/pkg/preprocessing/chat_completions' github.com/llm-d/llm-d-kv-cache 2>/dev/null || echo ""); \
PYTHONPATH="$$KV_CACHE_PKG:$(VENV_DIR)/lib/python$(PYTHON_VERSION)/site-packages" \
CGO_CFLAGS=${$*_CGO_CFLAGS} CGO_LDFLAGS=${$*_CGO_LDFLAGS} go test $($*_LDFLAGS) -v $$($($*_TEST_FILES) | tr '\n' ' ')

@@ -169,7 +169,7 @@ test-filter: download-tokenizer install-python-deps check-dependencies ## Run fi
fi
@TEST_TYPE="$(if $(TYPE),$(TYPE),epp)"; \
printf "\033[33;1m==== Running Filtered Tests (pattern: $(PATTERN), type: $$TEST_TYPE) ====\033[0m\n"; \
KV_CACHE_PKG=$$(go list -m -f '{{.Dir}}/pkg/preprocessing/chat_completions' github.com/llm-d/llm-d-kv-cache-manager 2>/dev/null || echo ""); \
KV_CACHE_PKG=$$(go list -m -f '{{.Dir}}/pkg/preprocessing/chat_completions' github.com/llm-d/llm-d-kv-cache 2>/dev/null || echo ""); \
if [ "$$TEST_TYPE" = "epp" ]; then \
PYTHONPATH="$$KV_CACHE_PKG:$(VENV_DIR)/lib/python$(PYTHON_VERSION)/site-packages" \
CGO_CFLAGS=$(epp_CGO_CFLAGS) CGO_LDFLAGS=$(epp_CGO_LDFLAGS) \
58 changes: 44 additions & 14 deletions Makefile.tools.mk
@@ -22,7 +22,7 @@ TYPOS_VERSION ?= v1.34.0
## Python Configuration
PYTHON_VERSION ?= 3.12
# Extract RELEASE_VERSION from Dockerfile
TOKENIZER_VERSION := $(shell grep '^ARG RELEASE_VERSION=' Dockerfile.epp | cut -d'=' -f2)
TOKENIZER_VERSION ?= $(shell grep '^ARG RELEASE_VERSION=' Dockerfile.epp | cut -d'=' -f2)

# Python executable for creating venv
PYTHON_EXE := $(shell command -v python$(PYTHON_VERSION) || command -v python3)
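
The `$(shell ...)` call in this hunk pulls the tokenizer release out of `Dockerfile.epp` with `grep` and `cut`. The sketch below runs the same pipeline on inline sample input so the result can be inspected. The `extract_release_version` helper and the sample lines are illustrative only.

```shell
#!/usr/bin/env bash

# Same pipeline as the Makefile's $(shell ...) call, reading from stdin
# instead of Dockerfile.epp so it can be tried on sample input.
extract_release_version() {
  grep '^ARG RELEASE_VERSION=' | cut -d'=' -f2
}

# Sample input standing in for the relevant lines of Dockerfile.epp.
printf 'ARG TARGETOS\nARG RELEASE_VERSION=v1.22.1\n' | extract_release_version
# prints v1.22.1
```

Because the assignment is now `?=` rather than `:=`, a `TOKENIZER_VERSION` already present in the environment takes precedence over the extracted value, which is what `RUN TOKENIZER_VERSION=${RELEASE_VERSION} make build-epp` in Dockerfile.epp appears to rely on.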
@@ -151,33 +151,63 @@ $(TOKENIZER_LIB): | $(LOCALLIB)
@ranlib $(LOCALLIB)/*.a
@echo "Tokenizer bindings downloaded successfully."


.PHONY: install-python-deps
install-python-deps: ## Sets up Python virtual environment and installs dependencies
@printf "\033[33;1m==== Setting up Python virtual environment in $(VENV_DIR) ====\033[0m\n"
.PHONY: detect-python
detect-python: ## Detects Python and prints the configuration.
@printf "\033[33;1m==== Python Configuration ====\033[0m\n"
@if [ -z "$(PYTHON_EXE)" ]; then \
echo "ERROR: Python 3 not found in PATH."; \
exit 1; \
fi
@# Verify the version of the found python executable using its exit code
@if ! $(PYTHON_EXE) -c "import sys; sys.exit(0 if sys.version_info[:2] == ($(shell echo $(PYTHON_VERSION) | cut -d. -f1), $(shell echo $(PYTHON_VERSION) | cut -d. -f2)) else 1)"; then \
echo "ERROR: Found Python at '$(PYTHON_EXE)' but it is not version $(PYTHON_VERSION)."; \
echo "Please ensure 'python$(PYTHON_VERSION)' or a compatible 'python3' is in your PATH."; \
exit 1; \
fi
@echo "Python executable: $(PYTHON_EXE) ($$($(PYTHON_EXE) --version))"
@echo "Python CFLAGS: $(PYTHON_CFLAGS)"
@echo "Python LDFLAGS: $(PYTHON_LDFLAGS)"
@if [ -z "$(PYTHON_CFLAGS)" ]; then \
echo "ERROR: Python development headers not found. See installation instructions above."; \
exit 1; \
fi
@printf "\033[33;1m==============================\033[0m\n"

.PHONY: setup-venv
setup-venv: detect-python ## Sets up the Python virtual environment.
@printf "\033[33;1m==== Setting up Python virtual environment in $(VENV_DIR) ====\033[0m\n"
@if [ ! -f "$(VENV_BIN)/pip" ]; then \
echo "Creating virtual environment..."; \
$(PYTHON_EXE) -m venv $(VENV_DIR) || { \
echo "ERROR: Failed to create virtual environment."; \
echo "Your Python installation may be missing the 'venv' module."; \
echo "Try: 'sudo apt install python$(PYTHON_VERSION)-venv' or 'sudo dnf install python$(PYTHON_VERSION)-devel'"; \
exit 1; \
}; \
fi
@echo "Upgrading pip and installing dependencies..."
@$(VENV_BIN)/pip install --upgrade pip --quiet
@KV_CACHE_PKG=$$(go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache-manager 2>/dev/null); \
if [ -n "$$KV_CACHE_PKG" ] && [ -f "$$KV_CACHE_PKG/pkg/preprocessing/chat_completions/requirements.txt" ]; then \
echo "Installing Python dependencies from kv-cache-manager..."; \
$(VENV_BIN)/pip install --quiet -r "$$KV_CACHE_PKG/pkg/preprocessing/chat_completions/requirements.txt"; \
@echo "Upgrading pip..."
@$(VENV_BIN)/pip install --upgrade pip
@echo "Python virtual environment setup complete."

.PHONY: install-python-deps
install-python-deps: setup-venv ## installs dependencies.
@printf "\033[33;1m==== Setting up Python virtual environment in $(VENV_DIR) ====\033[0m\n"
@echo "install vllm..."
@KV_CACHE_PKG=$${KV_CACHE_PKG:-$$(go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache 2>/dev/null)}; \
if [ -n "$$KV_CACHE_PKG" ] && [ -f "$$KV_CACHE_PKG/pkg/preprocessing/chat_completions/setup.sh" ]; then \
echo "Running kv-cache setup script..."; \
cp "$$KV_CACHE_PKG/pkg/preprocessing/chat_completions/setup.sh" build/kv-cache-setup.sh; \
chmod +x build/kv-cache-setup.sh; \
cd build && PATH=$(VENV_BIN):$$PATH ./kv-cache-setup.sh && cd ..; \
else \
echo "WARNING: Could not find kv-cache-manager requirements.txt, installing minimal deps..."; \
$(VENV_BIN)/pip install --quiet 'transformers>=4.53.0' 'jinja2>=2.11'; \
echo "ERROR: kv-cache package not found or setup script missing."; \
exit 1; \
fi
@echo "✅ Python dependencies installed in venv"
@echo "Verifying vllm installation..."
@$(VENV_BIN)/python -c "import vllm; print('✅ vllm version ' + vllm.__version__ + ' installed.')" || { \
echo "ERROR: vllm library not properly installed in venv."; \
exit 1; \
}

.PHONY: check-tools
check-tools: check-go check-ginkgo check-golangci-lint check-kustomize check-envsubst check-container-tool check-kubectl check-buildah check-typos ## Check that all required tools are installed
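
The new `install-python-deps` recipe only calls `go list` when `KV_CACHE_PKG` is not already set, which is presumably what lets the `python-builder` stage in Dockerfile.epp run it without a Go toolchain. Below is a minimal sketch of that fallback and of the `setup.sh` guard. Here `lookup_module_dir` is a hypothetical stand-in for the `go list` call and returns a placeholder path.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for:
#   go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache
lookup_module_dir() {
  echo "/tmp/hypothetical-module-dir"
}

# ${VAR:-word} expands to VAR when it is set and non-empty, otherwise to word,
# so the lookup result is only used when the caller did not provide KV_CACHE_PKG.
resolve_kv_cache_pkg() {
  echo "${KV_CACHE_PKG:-$(lookup_module_dir)}"
}

# Mirrors the recipe's guard: a non-empty directory that contains setup.sh.
has_setup_script() {
  [ -n "$1" ] && [ -f "$1/pkg/preprocessing/chat_completions/setup.sh" ]
}

pkg=$(KV_CACHE_PKG=/workspace/kv-cache resolve_kv_cache_pkg)
echo "resolved: $pkg"
if has_setup_script "$pkg"; then
  echo "would run setup.sh"
else
  echo "ERROR: kv-cache package not found or setup script missing."
fi
```

Unlike the previous recipe, which fell back to installing a minimal set of packages, the new one treats a missing package or script as a hard error.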
6 changes: 3 additions & 3 deletions deploy/config/epp-precise-prefix-cache-config.yaml
@@ -7,10 +7,10 @@ plugins:
- type: decode-filter
- type: precise-prefix-cache-scorer
parameters:
tokenProcessorConfig:
blockSize: 64 # must match vLLM block size
hashSeed: "42" # must match vLLM PYTHONHASHSEED env var
indexerConfig:
tokenProcessorConfig:
blockSize: 64 # must match vLLM block size
hashSeed: "42" # must match vLLM PYTHONHASHSEED env var
kvBlockIndexConfig:
enableMetrics: true # enable kv-block index metrics (prometheus)
- type: kv-cache-utilization-scorer
7 changes: 4 additions & 3 deletions deploy/config/sim-epp-kvcache-config.yaml
@@ -6,15 +6,16 @@ plugins:
- type: prefix-cache-scorer
parameters:
mode: cache_tracking
tokenProcessorConfig:
blockSize: 16 # must match vLLM block size if not default (16)
hashSeed: "42" # must match PYTHONHASHSEED in vLLM pods
kvEventsConfig:
zmqEndpoint: tcp://0.0.0.0:5557
indexerConfig:
prefixStoreConfig:
blockSize: 16
tokenProcessorConfig:
blockSize: 16 # must match vLLM block size if not default (16)
hashSeed: "42" # must match PYTHONHASHSEED in vLLM pods
tokenizersPoolConfig:
modelName: <model-name> # specify the model name to use for tokenizer loading
hf:
tokenizersCacheDir: "/cache/tokenizers"
kvBlockIndexConfig: