Add kimi-k2.5 nvfp4 GB200 vllm-disagg configs for 8k1k #234

Open
kyleliang-nv wants to merge 2 commits into main from kylliang/kimik2p5_nvfp4_8k1k_gb200
@kyleliang-nv (Collaborator) commented Apr 1, 2026

Summary by CodeRabbit

  • New Features
    • Added multiple vLLM disaggregated serving recipes for GB200 GPU deployments with varied prefill/decode node topologies.
    • Recipes enable FP4 model precision, Dynamo frontend routing, and vLLM backend optimizations (FlashInfer MoE, expert parallel).
    • Per-phase KV caching (fp8), async decode scheduling, CUDA graph capture tuning, and sa-bench benchmark presets for high-concurrency testing.

@coderabbitai bot (Contributor) commented Apr 1, 2026

📝 Walkthrough

Adds four new disaggregated vLLM deployment recipes for Kimi-K2.5 targeting GB200 GPUs, each defining distinct prefill/decode node/worker splits, vLLM backend/frontend settings, KV-transfer and FP8/FP4 options, CUDA graph capture and async decode parameters, and sa-bench benchmark blocks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Disaggregated vLLM recipes: recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml, recipes/vllm/kimi-k2.5/disagg-gb200-3p1d.yaml, recipes/vllm/kimi-k2.5/disagg-gb200-5p1d.yaml, recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml | Adds four new vLLM deployment recipes. Each file defines container/model/precision, Dynamo frontend setup, disaggregated prefill vs decode node/worker topology, per-phase env vars, detailed vllm_config (NixlConnector KV transfer, fp8 KV cache dtype, FlashInfer MLA, parallelism, CUDA graph capture/async decode), and sa-bench benchmark configs. |
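
The recipe filenames appear to encode the prefill/decode split. The following is a minimal Python sketch, assuming that `<N>p<M>d` means N prefill and M decode workers (the PR does not state this explicitly) and treating any trailing tag such as `tep4` or `dep16` as an opaque suffix. `parse_topology` is a hypothetical helper for illustration, not part of the repository.

```python
import re
from pathlib import PurePosixPath

# Assumption: "<N>p<M>d" in the filename means N prefill / M decode workers.
# Any trailing tag (e.g. "tep4", "dep16") is kept as an uninterpreted suffix.
TOPOLOGY = re.compile(r"-(\d+)p(\d+)d(?:-([a-z]+\d+))?$")


def parse_topology(recipe_path):
    """Extract the prefill/decode counts encoded in a recipe filename."""
    stem = PurePosixPath(recipe_path).stem
    match = TOPOLOGY.search(stem)
    if match is None:
        raise ValueError(f"no <N>p<M>d tag found in {stem!r}")
    prefill, decode, suffix = match.groups()
    return {"prefill": int(prefill), "decode": int(decode), "suffix": suffix}


RECIPES = [
    "recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml",
    "recipes/vllm/kimi-k2.5/disagg-gb200-3p1d.yaml",
    "recipes/vllm/kimi-k2.5/disagg-gb200-5p1d.yaml",
    "recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml",
]

if __name__ == "__main__":
    for path in RECIPES:
        print(path, parse_topology(path))
```

Listing the four recipes this way makes it easier to see at a glance how the splits differ, without opening each YAML file.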

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Frontend as Dynamo Frontend
    participant Prefill as Prefill Nodes (vLLM)
    participant Decode as Decode Nodes (vLLM)
    participant Store as Model Artifact Store

    rect rgba(200,230,255,0.5)
    Client->>Frontend: send request
    end

    rect rgba(200,255,200,0.5)
    Frontend->>Prefill: route prefill work (KV transfer)
    Prefill->>Store: load/model shard & KV cache
    Prefill-->>Frontend: prefill responses / KV state
    end

    rect rgba(255,230,200,0.5)
    Frontend->>Decode: route decode tasks (use KV cache)
    Decode->>Store: fetch model shards if needed
    Decode-->>Frontend: decoded tokens/response
    end

    Frontend->>Client: return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • ishandhanani
  • trevor-m

Poem

🐰 I hopped through YAML fields today,
Split prefill, decode, in bright array,
GB200 humming under moonlight,
Kimi-K2.5 ready to write,
Hooray for batches, capture, and play! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding new vLLM disaggregated deployment recipes for the Kimi-K2.5 model with NVFP4 precision on GB200 GPUs, exactly matching the four new YAML recipe files added. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml`:
- Line 1: The recipe name string
"coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep" is missing the trailing
"4" and should match the filename; update the name field in the YAML to
"coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep4" so sweep/result IDs
align (edit the name value in the top-level YAML entry).

In `@recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml`:
- Line 12: The YAMLs (e.g., recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml)
reference setup_script: install-deps.sh but that script is missing; either add a
new executable script named install-deps.sh at the repository root (or the
expected scripts/ location) containing the dependency installation steps used by
your launcher, or update the setup_script value in all four affected YAMLs to
point to an existing script name/path in the repo (ensure the referenced script
is executable and contains the required install commands).
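
The missing-script finding can be checked locally before a run. The following is a minimal Python sketch, assuming the launcher resolves `setup_script` relative to a small set of candidate directories; the directory list and the helper name `find_setup_script` are hypothetical, since the PR does not say where the launcher looks.

```python
import os
from pathlib import Path


def find_setup_script(script_name, search_dirs):
    """Return the first existing, executable match for script_name, else None.

    search_dirs is an assumption: the real launcher's lookup path is not
    described in this PR.
    """
    for directory in search_dirs:
        candidate = Path(directory) / script_name
        # The review asks for a script that both exists and is executable.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical candidate locations: repository root and a scripts/ folder.
    found = find_setup_script("install-deps.sh", [".", "scripts"])
    print(found if found is not None else "install-deps.sh not found or not executable")
```

Running a check like this over all four recipes would flag the same gap the review describes, whichever of the two suggested fixes is chosen.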
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b486cd50-24fc-4aee-91a3-7071c0ccb647

📥 Commits

Reviewing files that changed from the base of the PR and between f0a303b and 81280f7.

📒 Files selected for processing (4)
  • recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-3p1d.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-5p1d.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml

@@ -0,0 +1,98 @@
name: "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep"

⚠️ Potential issue | 🟡 Minor

Recipe name looks truncated (-tep vs -tep4).

This creates an avoidable mismatch with the filename and can confuse sweep/result identification.

Suggested fix
-name: "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep"
+name: "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep4"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml` at line 1, the recipe
name string "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep" is missing
the trailing "4" and should match the filename; update the name field in the
YAML to "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep4" so sweep/result
IDs align (edit the name value in the top-level YAML entry).
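
A name/filename mismatch like this one can be caught mechanically. The following is a minimal Python sketch, assuming the convention is that the recipe `name` ends with the filename stem; that convention is inferred from this single review comment, and `name_matches_filename` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def name_matches_filename(recipe_name, recipe_path):
    """Return True when the recipe name ends with the YAML filename's stem.

    Assumption: names follow "<prefix>-<filename stem>", as the review
    comment implies for the 1p4d-tep4 recipe.
    """
    stem = PurePosixPath(recipe_path).stem  # e.g. "disagg-gb200-1p4d-tep4"
    return recipe_name.endswith(stem)


if __name__ == "__main__":
    path = "recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml"
    truncated = "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep"
    corrected = "coreai_devtech_all-sa.kimi-vllm-disagg-gb200-1p4d-tep4"
    print("truncated name matches:", name_matches_filename(truncated, path))
    print("corrected name matches:", name_matches_filename(corrected, path))
```

The truncated name fails the check and the corrected one passes, which matches the suggested fix.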

Comment thread recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml Outdated
@coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
recipes/vllm/kimi-k2.5/disagg-gb200-3p1d.yaml (1)

32-45: Consider reducing repeated YAML with anchors/aliases.
prefill_environment/decode_environment and several vLLM keys are duplicated; anchors would make edits safer.

♻️ Optional YAML dedup pattern
+  _common_environment: &common_environment
+    VLLM_USE_FLASHINFER_MOE_FP4: "1"
+    VLLM_USE_NCCL_SYMM_MEM: "1"
+    NCCL_CUMEM_ENABLE: "1"
+    NCCL_MNNVL_ENABLE: "1"
+    NCCL_NVLS_ENABLE: "1"
+
-  prefill_environment:
-    VLLM_USE_FLASHINFER_MOE_FP4: "1"
-    VLLM_USE_NCCL_SYMM_MEM: "1"
-    NCCL_CUMEM_ENABLE: "1"
-    NCCL_MNNVL_ENABLE: "1"
-    NCCL_NVLS_ENABLE: "1"
+  prefill_environment: *common_environment

-  decode_environment:
-    VLLM_USE_FLASHINFER_MOE_FP4: "1"
-    VLLM_USE_NCCL_SYMM_MEM: "1"
-    NCCL_CUMEM_ENABLE: "1"
-    NCCL_MNNVL_ENABLE: "1"
-    NCCL_NVLS_ENABLE: "1"
+  decode_environment: *common_environment

Also applies to: 48-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/vllm/kimi-k2.5/disagg-gb200-3p1d.yaml` around lines 32-45, the YAML
repeats the same environment entries under prefill_environment and
decode_environment (and again in the later block); refactor by extracting the
shared map into a YAML anchor (e.g., &vllm_env) containing the
VLLM_USE_FLASHINFER_MOE_FP4, VLLM_USE_NCCL_SYMM_MEM, NCCL_CUMEM_ENABLE,
NCCL_MNNVL_ENABLE, NCCL_NVLS_ENABLE keys and then reference it with aliases
(*vllm_env) under prefill_environment and decode_environment (and the
corresponding later section) so edits to the vllm env only need to be done once.
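
Candidates for anchors can be found by looking for sibling keys whose values are identical mappings. The following is a minimal Python sketch, assuming the recipe section has already been parsed into a plain `dict`; the environment values are copied from the diff above, while `duplicated_blocks` and the `section` variable are hypothetical names.

```python
import json
from collections import defaultdict


def duplicated_blocks(config):
    """Group sibling keys whose mapping values are identical."""
    groups = defaultdict(list)
    for key, value in config.items():
        if isinstance(value, dict):
            # A canonical JSON dump makes equal mappings compare equal as strings.
            groups[json.dumps(value, sort_keys=True)].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]


# Environment entries as they appear in the diff for disagg-gb200-3p1d.yaml.
shared_env = {
    "VLLM_USE_FLASHINFER_MOE_FP4": "1",
    "VLLM_USE_NCCL_SYMM_MEM": "1",
    "NCCL_CUMEM_ENABLE": "1",
    "NCCL_MNNVL_ENABLE": "1",
    "NCCL_NVLS_ENABLE": "1",
}
section = {
    "prefill_environment": dict(shared_env),
    "decode_environment": dict(shared_env),
}

if __name__ == "__main__":
    print(duplicated_blocks(section))
```

Each group returned is a set of keys that could share one anchor; whether the launcher's schema tolerates an extra anchor-holding key is a separate question this sketch does not answer.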

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a40cd91f-9f0c-42e9-a416-7ebe7328f147

📥 Commits

Reviewing files that changed from the base of the PR and between 81280f7 and c1a5a87.

📒 Files selected for processing (4)
  • recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-3p1d.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-5p1d.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml
✅ Files skipped from review due to trivial changes (1)
  • recipes/vllm/kimi-k2.5/disagg-gb200-1p4d-tep4.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • recipes/vllm/kimi-k2.5/disagg-gb200-6p1d-dep16.yaml
  • recipes/vllm/kimi-k2.5/disagg-gb200-5p1d.yaml
