-
Notifications
You must be signed in to change notification settings - Fork 691
fix: Fix KVBM Guide #2539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix KVBM Guide #2539
Conversation
Signed-off-by: jthomson04 <[email protected]>
WalkthroughDocumentation updated to shift KV-BM usage from a dedicated kvbm framework to vLLM with KV transfer configuration. Build and run commands now target the vLLM framework with an enable flag. Environment variable-based KV manager enablement was removed. The serve command now includes an explicit kv-transfer-config using DynamoConnector. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant C as Container
participant V as vLLM Server
participant D as DynamoConnector
participant K as KV Backend
U->>C: ./container/build.sh --framework vllm --enable-kvbm
U->>C: ./container/run.sh --framework vllm -it --mount-workspace --use-nixl-gds
U->>V: vllm serve --kv-transfer-config {kv_connector:DynamoConnector, kv_role:kv_both, ...} model
Note right of V: Env var DYN_KVBM_MANAGER no longer used
U->>V: HTTP request (inference)
V->>D: Initialize KV transfer via connector
D->>K: Read/Write KV as per kv_role=kv_both
V-->>U: Inference response
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/guides/run_kvbm_in_vllm.md (3)
20-20: Fix typos and improve clarity in the intro sentence.Minor wording/grammar issues.
Apply this diff:
-This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in vLLM. +This guide explains how to leverage KVBM (KV Block Manager) to manage the KV cache and perform KV offloading in vLLM.
50-60: Fix typos in the example prompt (user-facing).There are a couple of misspellings in the example JSON payload.
Apply this diff (only the content string is changed; JSON remains valid):
- "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden." + "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map suggesting that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost family, or another hidden clue?"
28-31: Update etcd compose path and service name in documentationThe compose file has moved and now defines the service as
etcd-server. Please update the snippet indocs/guides/run_kvbm_in_vllm.mdaccordingly:• File: docs/guides/run_kvbm_in_vllm.md (lines 28–31)
Replace:# start up etcd for KVBM leader/worker registration and discovery docker compose -f deploy/metrics/docker-compose.yml up -dWith:
# start up etcd for KVBM leader/worker registration and discovery docker compose -f deploy/docker-compose.yml up -d etcd-server
🧹 Nitpick comments (3)
docs/guides/run_kvbm_in_vllm.md (3)
32-37: Note on removed KVBM manager env: add a brief migration note.Since
DYN_KVBM_MANAGER=kvbmis no longer needed, consider adding a one-liner explaining that KVBM is now activated via vLLM’s--kv-transfer-configinstead of an env var.Example addition after Line 37:
+Note: Prior versions used `export DYN_KVBM_MANAGER=kvbm`. This is no longer required—KVBM is enabled via the `--kv-transfer-config` flag in `vllm serve`.
38-45: Clarify units and default behavior for KVBM environment variablesVerified that
DYN_KVBM_CPU_CACHE_GBandDYN_KVBM_DISK_CACHE_GBare indeed read by the vLLM integration (compute_num_blocksinleader.rs). To improve clarity, add a note in the guide:Location:
docs/guides/run_kvbm_in_vllm.md, after line 45docs/guides/run_kvbm_in_vllm.md @@ Line 45 +Note: Values are in gigabytes (GB). If unset, KVBM falls back to its built-in defaults. Adjust CPU/disk budgets based on your workload and node capacity.
47-47: Confirmed connector module path; optional explicit portVerified that
dynamo.llm.vllm_integration.connectorexists and exportsDynamoConnector. The JSON quoting is correct for bash. For clarity, you may optionally specify the port in thevllm servecommand:• Location:
docs/guides/run_kvbm_in_vllm.md:47Suggested diff:
-vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both","kv_connector_module_path":"dynamo.llm.vllm_integration.connector"}' deepseek-ai/DeepSeek-R1-Distill-Llama-8B +vllm serve --port 8000 --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both","kv_connector_module_path":"dynamo.llm.vllm_integration.connector"}' deepseek-ai/DeepSeek-R1-Distill-Llama-8BLeave as-is if you prefer using the default port.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
docs/guides/run_kvbm_in_vllm.md(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
docs/guides/run_kvbm_in_vllm.md (1)
33-33: Confirmed:build.shsupports--framework vllmand--enable-kvbmBoth flags are explicitly handled in
container/build.sh, so the documentation snippet is accurate:
container/build.shdefines FRAMEWORKS=(["VLLM"]=…) and uppercases input (lines 52, 305–311), and the help text lists “vllm” as a valid framework (line 370).- The
--enable-kvbmflag is parsed in the case block (lines 278–280) and shown in usage output (line 386).Docs can remain as-is.
Signed-off-by: jthomson04 <[email protected]> Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: jthomson04 <[email protected]>
Summary by CodeRabbit