Add Kimi K2.5 disagg STP and MTP recipes for GB200 NVfp4 (ISL8K_OSL1K… by yeswanthk-26 · Pull Request #24 · NVIDIA/srt-slurm

yeswanthk-26 · 2026-04-09T20:16:35Z

Add optimized disaggregated inference recipes for the Kimi K2.5 model with NVfp4 precision on GB200 GPUs. Includes both STP and MTP configurations for the ISL8K_OSL1K and ISL1K_OSL1K workloads, covering concurrency points from 5 to 2253, with Eagle speculative decoding for the MTP variants.

codecov-commenter · 2026-04-09T20:17:27Z

Codecov Report

❌ Patch coverage is 18.18182% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (sa-submission-q2-2026@8294e64). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/srtctl/core/config.py	18.18%	9 Missing ⚠️

Additional details and impacted files

@@                   Coverage Diff                    @@
##             sa-submission-q2-2026      #24   +/-   ##
========================================================
  Coverage                         ?   58.54%           
========================================================
  Files                            ?       47           
  Lines                            ?     4048           
  Branches                         ?        0           
========================================================
  Hits                             ?     2370           
  Misses                           ?     1678           
  Partials                         ?        0
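The figures in the report can be cross-checked with a little arithmetic. The sketch below only uses numbers quoted in the report; the helper name `infer_patch_lines` is made up for illustration and is not part of Codecov or srtctl. It infers how many changed lines the patch contains from the reported percentage and missing-line count, and recomputes the project coverage from hits and total lines.

```python
import math

# Figures copied from the Codecov report above.
PATCH_PERCENT = 18.18182
PATCH_MISSING = 9
HITS, MISSES, LINES = 2370, 1678, 4048


def infer_patch_lines(percent: float, missing: int, limit: int = 1000) -> tuple[int, int]:
    """Return (covered, total) for the smallest patch matching the reported percent."""
    for covered in range(limit):
        total = covered + missing
        if round(100 * covered / total, 5) == percent:
            return covered, total
    raise ValueError("no patch size matches the reported percentage")


covered, total = infer_patch_lines(PATCH_PERCENT, PATCH_MISSING)
project_percent = 100 * HITS / LINES

print(covered, total)                            # 2 11
print(HITS + MISSES == LINES)                    # True
print(math.floor(project_percent * 100) / 100)   # 58.54
```

Under these numbers, 2 of the 11 changed lines in src/srtctl/core/config.py are covered. The project ratio is about 58.547%, so the reported 58.54% appears to be truncated rather than rounded.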

…and env cleanup

- Update container to tensorrtllm-runtime-1.1.0-dev.2.sqsh
- Point model path to shared /mnt/lustre01/models/kimi-k2.5-nvfp4
- Update Eagle model mount path for MTP configs
- Remove HF_HOME (defaults to ~/.cache/huggingface)
- Fix concurrency separator from space to 'x' for sa-bench compatibility
- Enable multiple frontends for ctx1dep4_gen1dep32_batch64
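The separator fix in this commit can be illustrated with a small sketch. It assumes the concurrency points are written as a single string of integers; the helper `normalize_concurrencies` is hypothetical and is not taken from srtctl or sa-bench.

```python
import re


def normalize_concurrencies(value: str) -> str:
    """Join concurrency points with 'x', accepting whitespace or 'x' as input separators."""
    points = [p for p in re.split(r"[\sx]+", value.strip()) if p]
    if not points or not all(p.isdigit() for p in points):
        raise ValueError(f"invalid concurrency list: {value!r}")
    return "x".join(points)


print(normalize_concurrencies("5 666 2253"))   # 5x666x2253
print(normalize_concurrencies("5x666x2253"))   # 5x666x2253 (already normalized)
```

Accepting both separators on input keeps the helper idempotent, so it can be applied to recipes that were already fixed.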

nlevin-ui · 2026-04-10T05:01:03Z

+# concurrency: 666
+
+model:
+  path: "/mnt/lustre01/models/kimi-k2.5-nvfp4"


The paths and containers should not be cluster-specific. The srtslurm.yaml should map the generic names to the cluster-specific ones, for example:

path: "kimi-k2.5-nvfp4"
container: "vllm/vllm-openai:v0.18.0-cu130"
as seen here: https://github.com/NVIDIA/srt-slurm/blob/sa-submission-q2-2026/recipes/vllm/kimi-k2.5/1k1k/disagg-gb200-1p1d-dep4-dep16.yaml#L4C1-L5C46
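The mapping the reviewer describes can be sketched as a plain dictionary lookup. This is an illustration only, not the srtctl implementation: the `model_paths` and `containers` keys come from the commit messages in this PR, while the recipe layout, the `trtllm-runtime` alias, the `/containers/...` location, and the `resolve_recipe` helper are assumptions.

```python
# Stand-in for one cluster's parsed srtslurm.yaml (values are illustrative).
CLUSTER_CONFIG = {
    "model_paths": {"kimi-k2.5-nvfp4": "/mnt/lustre01/models/kimi-k2.5-nvfp4"},
    "containers": {"trtllm-runtime": "/containers/tensorrtllm-runtime-1.1.0-dev.2.sqsh"},
}


def resolve_recipe(recipe: dict, cluster: dict) -> dict:
    """Return a copy of the recipe with generic names replaced by cluster values.

    Names without a mapping are passed through unchanged (an assumption).
    """
    model = dict(recipe["model"])
    model["path"] = cluster.get("model_paths", {}).get(model["path"], model["path"])
    model["container"] = cluster.get("containers", {}).get(
        model["container"], model["container"]
    )
    return {**recipe, "model": model}


recipe = {"model": {"path": "kimi-k2.5-nvfp4", "container": "trtllm-runtime"}}
print(resolve_recipe(recipe, CLUSTER_CONFIG)["model"])
```

The recipe itself stays identical across clusters; only the per-cluster mapping changes.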

Per review feedback, update model paths to HuggingFace format (nvidia/Kimi-K2.5-NVFP4) and container to full NVCR registry path (nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2) so recipes are portable and work without pre-built sqsh files.
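As a rough illustration of the portability concern, the heuristic below flags values that only make sense on one cluster. The rule (absolute filesystem paths and `.sqsh` files count as cluster-specific) and the function name `is_cluster_specific` are assumptions for this sketch, not checks that srtctl is known to perform.

```python
def is_cluster_specific(value: str) -> bool:
    """Heuristic: absolute paths and pre-built .sqsh images are tied to one cluster."""
    return value.startswith("/") or value.endswith(".sqsh")


candidates = [
    "/mnt/lustre01/models/kimi-k2.5-nvfp4",
    "tensorrtllm-runtime-1.1.0-dev.2.sqsh",
    "nvidia/Kimi-K2.5-NVFP4",
    "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2",
]
for value in candidates:
    label = "cluster-specific" if is_cluster_specific(value) else "portable"
    print(f"{value}: {label}")
```

With this rule, the first two values (used before the review) are flagged and the last two (used after it) are not.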

yeswanthk-26 force-pushed the yeswanthk/disagg-kimi2.5-gb200-Nvfp4 branch from e788dcf to 61f73c2 on April 9, 2026 20:26

yeswanthk-26 force-pushed the yeswanthk/disagg-kimi2.5-gb200-Nvfp4 branch 2 times, most recently from b355149 to 493b0ee, on April 9, 2026 20:31

nlevin-ui reviewed Apr 10, 2026

yeswanthk-26 added 2 commits April 10, 2026 05:12

Use generic model path and container aliases for cluster portability

51aaee2

Replace cluster-specific paths with generic alias names that are resolved via srtslurm.yaml model_paths and containers mappings, as per upstream convention.

Add extra_mount alias resolution and use generic Eagle model path

0c65453

Add model_paths alias resolution for extra_mount host paths in config.py, enabling MTP recipes to use generic name "kimi-k2.5-eagle3" instead of cluster-specific path for the Eagle speculative decoding model.
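A minimal sketch of what alias resolution for mount host paths could look like follows. It is not the code added to config.py: the `host:container` mount string format, the `/models/eagle` container path, the cluster-side Eagle path, and the `resolve_extra_mounts` helper are all assumptions made for illustration.

```python
def resolve_extra_mounts(mounts: list[str], model_paths: dict[str, str]) -> list[str]:
    """Replace the host side of each 'host:container' mount if it is a known alias."""
    resolved = []
    for mount in mounts:
        # Split on the first ':' only, so the container side is kept verbatim.
        host, sep, rest = mount.partition(":")
        host = model_paths.get(host, host)
        resolved.append(f"{host}{sep}{rest}")
    return resolved


# Hypothetical per-cluster mapping; the real location depends on the cluster.
model_paths = {"kimi-k2.5-eagle3": "/mnt/lustre01/models/kimi-k2.5-eagle3"}

print(resolve_extra_mounts(["kimi-k2.5-eagle3:/models/eagle"], model_paths))
print(resolve_extra_mounts(["/scratch/cache:/cache"], model_paths))
```

Host paths that are not aliases, such as the absolute path in the second call, pass through unchanged.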

nlevin-ui approved these changes Apr 10, 2026

yeswanthk-26 force-pushed the yeswanthk/disagg-kimi2.5-gb200-Nvfp4 branch from 502d746 to 670289c on April 10, 2026 05:34

Merge branch 'sa-submission-q2-2026' into yeswanthk/disagg-kimi2.5-gb…

d16dcec

…200-Nvfp4

nlevin-ui merged commit 94903bd into NVIDIA:sa-submission-q2-2026 Apr 10, 2026
5 checks passed

nlevin-ui merged 6 commits into NVIDIA:sa-submission-q2-2026 from yeswanthk-26:yeswanthk/disagg-kimi2.5-gb200-Nvfp4

3 participants
