
Add Kimi K2.5 disagg STP and MTP recipes for GB200 NVfp4 (ISL8K_OSL1K…) #24

Merged

nlevin-ui merged 6 commits into NVIDIA:sa-submission-q2-2026 from yeswanthk-26:yeswanthk/disagg-kimi2.5-gb200-Nvfp4 on Apr 10, 2026

Conversation

@yeswanthk-26

Add optimized disaggregated inference recipes for the Kimi K2.5 model with NVfp4 precision on GB200 GPUs. These include both STP and MTP configurations for the ISL8K_OSL1K and ISL1K_OSL1K workloads, covering concurrency points from 5 to 2253, with Eagle speculative decoding for the MTP variants.

… and ISL1K_OSL1K)

@codecov-commenter

codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 18.18182% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (sa-submission-q2-2026@8294e64). Learn more about missing BASE report.

Files with missing lines   | Patch %  | Lines
src/srtctl/core/config.py  | 18.18%   | 9 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                    @@
##             sa-submission-q2-2026      #24   +/-   ##
========================================================
  Coverage                         ?   58.54%           
========================================================
  Files                            ?       47           
  Lines                            ?     4048           
  Branches                         ?        0           
========================================================
  Hits                             ?     2370           
  Misses                           ?     1678           
  Partials                         ?        0           

☔ View full report in Codecov by Sentry.

@yeswanthk-26 force-pushed the yeswanthk/disagg-kimi2.5-gb200-Nvfp4 branch from e788dcf to 61f73c2 on April 9, 2026 at 20:26
…and env cleanup

- Update container to tensorrtllm-runtime-1.1.0-dev.2.sqsh
- Point model path to shared /mnt/lustre01/models/kimi-k2.5-nvfp4
- Update Eagle model mount path for MTP configs
- Remove HF_HOME (defaults to ~/.cache/huggingface)
- Fix concurrency separator from space to 'x' for sa-bench compatibility
- Enable multiple frontends for ctx1dep4_gen1dep32_batch64
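The concurrency-separator fix above can be illustrated with a minimal recipe fragment. The `concurrency` key appears in the diff context further down; the exact sa-bench parsing rules and the value list here are assumptions for illustration (5 and 2253 come from the PR description, 666 from the diff):

```
# before (rejected by sa-bench): concurrency: "5 666 2253"
# after, with 'x' as the separator:
concurrency: "5x666x2253"
```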
@yeswanthk-26 force-pushed the yeswanthk/disagg-kimi2.5-gb200-Nvfp4 branch 2 times, most recently from b355149 to 493b0ee, on April 9, 2026 at 20:31
# concurrency: 666

model:
path: "/mnt/lustre01/models/kimi-k2.5-nvfp4"
Collaborator

The paths and containers should not be cluster specific. The srtslurm.yaml should map the generic names to the specific cluster names.

path: "kimi-k2.5-nvfp4"
container: "vllm/vllm-openai:v0.18.0-cu130"
as seen here: https://github.com/NVIDIA/srt-slurm/blob/sa-submission-q2-2026/recipes/vllm/kimi-k2.5/1k1k/disagg-gb200-1p1d-dep4-dep16.yaml#L4C1-L5C46
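The mapping the reviewer asks for might look roughly like the following srtslurm.yaml fragment. This is a hedged sketch, not the actual file: the `model_paths`/`containers` key names come from the commit messages in this PR, the kimi-k2.5-nvfp4 host path from the diff context above, and the Eagle path is a hypothetical placeholder:

```
# Sketch of a cluster-side srtslurm.yaml mapping generic recipe names
# to cluster-specific locations (exact schema may differ):
model_paths:
  kimi-k2.5-nvfp4: /mnt/lustre01/models/kimi-k2.5-nvfp4
  kimi-k2.5-eagle3: /mnt/lustre01/models/kimi-k2.5-eagle3   # hypothetical path
containers:
  tensorrtllm-runtime: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2
```

With such a mapping in place, recipes can refer to `path: "kimi-k2.5-nvfp4"` and stay portable across clusters.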

Author

Addressed.

Replace cluster-specific paths with generic alias names that are resolved
via srtslurm.yaml model_paths and containers mappings, per upstream convention.

Add model_paths alias resolution for extra_mount host paths in config.py,
enabling MTP recipes to use the generic name "kimi-k2.5-eagle3" instead of a
cluster-specific path for the Eagle speculative decoding model.

Per review feedback, update model paths to HuggingFace format
(nvidia/Kimi-K2.5-NVFP4) and the container to the full NVCR registry path
(nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2) so the recipes
are portable and work without pre-built sqsh files.
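The alias resolution added to config.py for extra_mount host paths might look roughly like this sketch. The function name, signature, and mapping values are assumptions for illustration, not the actual srtctl code; the behavior mirrors what the commit message describes (known aliases resolve to cluster paths, anything else passes through as a literal path):

```python
def resolve_model_path(name: str, model_paths: dict[str, str]) -> str:
    """Resolve a generic model alias via the srtslurm.yaml model_paths mapping.

    If `name` is a known alias, return the mapped cluster-specific host path;
    otherwise treat it as a literal path and return it unchanged.
    """
    return model_paths.get(name, name)


# Hypothetical mapping, mirroring what a cluster's srtslurm.yaml might declare.
model_paths = {
    "kimi-k2.5-nvfp4": "/mnt/lustre01/models/kimi-k2.5-nvfp4",
    "kimi-k2.5-eagle3": "/mnt/lustre01/models/kimi-k2.5-eagle3",
}

# Generic alias from an MTP recipe resolves to the cluster path:
print(resolve_model_path("kimi-k2.5-eagle3", model_paths))
# An unrecognized literal path passes through untouched:
print(resolve_model_path("/abs/path/other-model", model_paths))
```

Pass-through for unknown names keeps existing recipes with absolute paths working while letting new recipes use portable aliases.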
@yeswanthk-26 force-pushed the yeswanthk/disagg-kimi2.5-gb200-Nvfp4 branch from 502d746 to 670289c on April 10, 2026 at 05:34
@nlevin-ui merged commit 94903bd into NVIDIA:sa-submission-q2-2026 on Apr 10, 2026
5 checks passed
3 participants