Consolidate B200 recipes: merge 40 per-variant STP/MTP files into 4 combined files by weireweire · Pull Request #206 · ishandhanani/srt-slurm

weireweire · 2026-03-04T03:31:36Z

Summary

Merged 40 individual per-variant YAML files (under b200-fp8/1k1k/stp|mtp/, b200-fp8/8k1k/stp|mtp/, b200-fp4/1k1k/stp|mtp/, b200-fp4/8k1k/stp|mtp/) into 4 combined recipe files, one per precision × ISL combination (FP8 or FP4, 1k1k or 8k1k)
Each combined file uses a shared base plus zip_override_* / override_* blocks to express STP and MTP variants with minimal duplication
Added inline section comments (# Model configuration, # Disaggregation mode, # Memory and token limits, # Parallelism, # Attention, # MoE, # Other flags) matching the originals
All 40 variants verified equivalent to originals via diff script before deletion
Deleted the old per-variant subdirectories
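
The base plus zip_override_* pattern described above can be illustrated with a small sketch. This is not the srtctl implementation: `deep_merge`, `pick`, `zip_width` and `expand_zip` are hypothetical helper names, and the sketch assumes that every list inside a zip block holds one value per variant while scalars are shared by all variants. The sample values are taken from the `zip_override_stp_lowlat` block of recipes/b200-fp8/8k1k.yaml in this PR.

```python
import copy


def deep_merge(base, patch):
    """Return a new dict in which patch values recursively override base values."""
    out = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def pick(node, i):
    """Select slot i from every list leaf; scalar leaves are kept as they are."""
    if isinstance(node, dict):
        return {k: pick(v, i) for k, v in node.items()}
    if isinstance(node, list):
        return node[i]
    return node


def zip_width(node):
    """Number of variants in a zip block, or None if it contains no lists."""
    if isinstance(node, list):
        return len(node)
    if isinstance(node, dict):
        widths = {zip_width(v) for v in node.values()} - {None}
        if len(widths) > 1:
            raise ValueError(f"zip lists have different lengths: {sorted(widths)}")
        return widths.pop() if widths else None
    return None


def expand_zip(base, block):
    """Expand one zip block into a list of fully merged variant configs."""
    n = zip_width(block) or 1
    return [deep_merge(base, pick(block, i)) for i in range(n)]


# Trimmed-down base and zip block, using values from b200-fp8/8k1k.yaml.
base = {
    "name": "b200-fp8-stp-8k1k",
    "resources": {"prefill_nodes": 1, "decode_nodes": 1, "decode_workers": 1},
}
block = {
    "name": [
        "b200-fp8-stp-low-latency-tep8-1p-1d",
        "b200-fp8-stp-low-latency-tep8-1p-4d",
        "b200-fp8-stp-low-latency-tep8-1p-6d",
    ],
    "resources": {"decode_nodes": [1, 4, 6], "decode_workers": [1, 4, 6]},
    "benchmark": {"concurrencies": ["4x32x64", "64", "32"]},
}
variants = expand_zip(base, block)
```

Under these assumptions the second variant carries decode_nodes: 4, decode_workers: 4 and concurrencies "64", while prefill_nodes is inherited from the base. A real implementation would also need a way to pass a genuine list value through a zip block, which this sketch does not handle.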

Test plan

make check passes (336 tests, lint clean)
Diff script confirms all 40 variants are semantically equivalent to original files
srtctl dry-run -f recipes/b200-fp8/1k1k.yaml previews expected configs
srtctl dry-run -f recipes/b200-fp4/8k1k.yaml previews expected configs
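
The diff script itself is not part of this page. A minimal sketch of such an equivalence check, assuming both the original file and the expanded variant have already been loaded into plain dicts and that only the top-level name field is allowed to differ, could look like this (`deep_diff` and `config_diff` are hypothetical names):

```python
MISSING = object()  # marker for "key not present on this side"


def deep_diff(a, b, path=""):
    """Yield (path, left, right) for every position where a and b differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b), key=str):
            sub = f"{path}.{key}" if path else str(key)
            if key not in a:
                yield (sub, MISSING, b[key])
            elif key not in b:
                yield (sub, a[key], MISSING)
            else:
                yield from deep_diff(a[key], b[key], sub)
    elif a != b:
        yield (path, a, b)


def config_diff(original, expanded, ignore=("name",)):
    """List the differences between two configs, skipping ignored top-level keys."""
    left = {k: v for k, v in original.items() if k not in ignore}
    right = {k: v for k, v in expanded.items() if k not in ignore}
    return list(deep_diff(left, right))


original = {"name": "old", "resources": {"decode_nodes": 4}, "benchmark": {"isl": 8192}}
same = {"name": "new", "resources": {"decode_nodes": 4}, "benchmark": {"isl": 8192}}
changed = {"name": "new", "resources": {"decode_nodes": 6}, "benchmark": {"isl": 8192}}

no_differences = config_diff(original, same)
one_difference = config_diff(original, changed)
```

An empty result means the two configs are equivalent apart from the ignored keys; any entry names the dotted path that diverged.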

Summary by CodeRabbit

Release Notes

New Features
- Added B200-FP4 and B200-FP8 inference configurations for 1k1k and 8k1k model deployments
- Introduced multiple deployment variants: low-latency and max-throughput profiles for both standard and speculative token prediction modes
- Enabled speculative decoding (EAGLE) support in multi-token prediction configurations
Chores
- Consolidated individual configuration files into unified variant-based deployment templates

coderabbitai · 2026-03-04T03:31:55Z

Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a48fb719-efcb-49f3-a040-3874181d5da8

📥 Commits

Reviewing files that changed from the base of the PR and between ac84b66 and 42af38e.

📒 Files selected for processing (45)

recipes/b200-fp4/1k1k.yaml
recipes/b200-fp4/1k1k/mtp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp4/1k1k/mtp/low-latency-dep4-1p-tep8-6d.yaml
recipes/b200-fp4/1k1k/mtp/max-tpt-dep4-1p-dep8-1d.yaml
recipes/b200-fp4/1k1k/mtp/max-tpt-dep4-1p-dep8-2d.yaml
recipes/b200-fp4/1k1k/stp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp4/1k1k/stp/low-latency-dep4-1p-tep8-6d.yaml
recipes/b200-fp4/1k1k/stp/max-tpt-dep4-1p-dep8-1d.yaml
recipes/b200-fp4/1k1k/stp/max-tpt-dep4-1p-dep8-2d.yaml
recipes/b200-fp4/8k1k.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-dep4-1p-tep8-1d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-dep4-2p-tep8-5d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-tp4-1p-tp8-1d.yaml
recipes/b200-fp4/8k1k/mtp/max-tpt-dep4-4p-dep8-1d.yaml
recipes/b200-fp4/8k1k/mtp/max-tpt-dep4-7p-dep8-2d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-dep4-1p-tep8-1d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-dep4-2p-tep8-5d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-tp4-1p-tp8-1d.yaml
recipes/b200-fp4/8k1k/stp/max-tpt-dep4-7p-dep8-2d.yaml
recipes/b200-fp8/1k1k.yaml
recipes/b200-fp8/1k1k/mtp/low-latency-tep8-1p1d.yaml
recipes/b200-fp8/1k1k/mtp/low-latency-tep8-1p3d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-1p1d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-1p2d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-1p5d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-2p5d.yaml
recipes/b200-fp8/1k1k/stp/low-latency-tep8-1p1d.yaml
recipes/b200-fp8/1k1k/stp/low-latency-tep8-1p3d.yaml
recipes/b200-fp8/1k1k/stp/max-tpt-dep8-1p5d.yaml
recipes/b200-fp8/1k1k/stp/max-tpt-dep8-2p5d.yaml
recipes/b200-fp8/8k1k.yaml
recipes/b200-fp8/8k1k/mtp/low-latency-tep8-1p1d.yaml
recipes/b200-fp8/8k1k/mtp/low-latency-tep8-1p4d.yaml
recipes/b200-fp8/8k1k/mtp/low-latency-tep8-1p6d.yaml
recipes/b200-fp8/8k1k/mtp/max-tpt-dep8-1p1d.yaml
recipes/b200-fp8/8k1k/mtp/max-tpt-dep8-1p2d.yaml
recipes/b200-fp8/8k1k/mtp/max-tpt-dep8-2p1d.yaml
recipes/b200-fp8/8k1k/stp/low-latency-tep8-1p1d.yaml
recipes/b200-fp8/8k1k/stp/low-latency-tep8-1p4d.yaml
recipes/b200-fp8/8k1k/stp/low-latency-tep8-1p6d.yaml
recipes/b200-fp8/8k1k/stp/max-tpt-dep8-1p1d.yaml
recipes/b200-fp8/8k1k/stp/max-tpt-dep8-2p1d.yaml
tests/test_override.py

📝 Walkthrough

Walkthrough

This PR consolidates B200-FP4 and B200-FP8 deployment recipes by centralizing multiple variant configurations into unified YAML files with override keys. Four comprehensive recipe files are added (1k1k and 8k1k for each precision), replacing numerous individual variant files covering STP/MTP inference modes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| B200-FP4 Consolidated Recipes: `recipes/b200-fp4/1k1k.yaml`, `recipes/b200-fp4/8k1k.yaml` | Adds two comprehensive YAML configurations with base settings and multiple variant overrides (zip_override_stp_lowlat, zip_override_mtp_lowlat, zip_override_stp_maxtpt, zip_override_mtp_maxtpt, etc.) for STP/MTP inference modes, disaggregation patterns, and throughput/latency profiles. Centralizes model, resources, backend, sglang_config, health_check, and benchmark definitions. |
| B200-FP4 Individual Variants (Deletions): `recipes/b200-fp4/1k1k/{mtp,stp}/*.yaml`, `recipes/b200-fp4/8k1k/{mtp,stp}/*.yaml` | Removes individual variant configuration files (low-latency and max-tpt profiles with various node/worker/decode scales). Content consolidated into parent 1k1k.yaml and 8k1k.yaml files. 19 files deleted. |
| B200-FP8 Consolidated Recipes: `recipes/b200-fp8/1k1k.yaml`, `recipes/b200-fp8/8k1k.yaml` | Adds two comprehensive FP8 YAML configurations with base settings and variant overrides (zip_override_stp_lowlat, zip_override_mtp_lowlat, zip_override_stp_maxtpt, zip_override_mtp_maxtpt, etc.) mirroring the FP4 structure with FP8-specific tuning and resource allocations. |
| B200-FP8 Individual Variants (Deletions): `recipes/b200-fp8/1k1k/{mtp,stp}/*.yaml`, `recipes/b200-fp8/8k1k/{mtp,stp}/*.yaml` | Removes individual FP8 variant configuration files across low-latency and max-tpt profiles. Content consolidated into parent 1k1k.yaml and 8k1k.yaml files. 21 files deleted. |
| Test Updates: `tests/test_override.py` | Updates comment text in the test file to clarify override auto-naming behavior (minor documentation change). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Update MTP recipe with multiple draft steps #78 — Updates MTP speculative decoding settings in deployment recipes, including SGLANG_ENABLE_SPEC_V2 and draft token configurations that align with variant definitions in this PR.
Update gb200 recipes #130 — Modifies deployment recipe YAMLs with SGLANG/backend keys, disaggregation-transfer-backend, and model container updates that share pattern similarities with this consolidation.

Suggested reviewers

ishandhanani
gracehonv

Poem

🐰 Four recipes now dance in unified grace,
Where variants fold in a consolidated space,
From scattered configs to YAML's neat fold,
B200 spins faster, both FP4 and bold—
Override keys orchestrate latency's race! 🚀

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately summarizes the main objective: consolidating 40 individual per-variant recipe files into 4 combined files using override blocks. |


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/b200-fp4/8k1k.yaml`:
- Line 10: The usage comment claiming "all 12 variants" is incorrect; there are
11 non-base variants in this recipe. Update the inline comment on the top-line
usage (the "srtctl apply -f recipes/b200-fp4/8k1k.yaml # all 12 variants"
string) to the correct count (e.g., "all 11 variants"), or alternatively
add/remove variant entries so the number matches; verify by counting the variant
definitions in this file (the entries that define variant names) and make the
comment consistent with the actual variants.

In `@recipes/b200-fp8/8k1k.yaml`:
- Around line 17-253: CI schema validation is failing because this recipe uses a
base + override pattern (the top-level "base" block and override groups like
"zip_override_stp_lowlat", "zip_override_mtp_lowlat", "zip_override_stp_maxtpt",
"zip_override_mtp_maxtpt") but the validator treats the file as a single
concrete config; update the validator to detect files that define a "base" plus
"override" or "zip_override_*" keys and validate by expanding the base with each
override variant (or validating the base and each merged variant) rather than
validating the raw file shape, ensuring required fields in the expanded configs
are present and unknown-field errors are reported against the merged variants
instead of the override wrapper.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 920738ea-191a-49e8-8d0f-438a31784b84

📥 Commits

Reviewing files that changed from the base of the PR and between 3053690 and ac84b66.

💤 Files with no reviewable changes (40)

recipes/b200-fp8/8k1k/stp/low-latency-tep8-1p4d.yaml
recipes/b200-fp8/1k1k/mtp/low-latency-tep8-1p3d.yaml
recipes/b200-fp4/1k1k/stp/max-tpt-dep4-1p-dep8-2d.yaml
recipes/b200-fp4/1k1k/mtp/low-latency-dep4-1p-tep8-6d.yaml
recipes/b200-fp8/8k1k/stp/low-latency-tep8-1p6d.yaml
recipes/b200-fp8/8k1k/mtp/max-tpt-dep8-1p2d.yaml
recipes/b200-fp8/8k1k/mtp/max-tpt-dep8-2p1d.yaml
recipes/b200-fp4/1k1k/mtp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp4/1k1k/stp/low-latency-dep4-1p-tep8-6d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-2p5d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-dep4-1p-tep8-1d.yaml
recipes/b200-fp8/8k1k/stp/low-latency-tep8-1p1d.yaml
recipes/b200-fp8/1k1k/mtp/low-latency-tep8-1p1d.yaml
recipes/b200-fp8/8k1k/mtp/low-latency-tep8-1p6d.yaml
recipes/b200-fp8/1k1k/stp/max-tpt-dep8-2p5d.yaml
recipes/b200-fp4/1k1k/mtp/max-tpt-dep4-1p-dep8-2d.yaml
recipes/b200-fp4/1k1k/stp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp8/1k1k/stp/max-tpt-dep8-1p5d.yaml
recipes/b200-fp4/1k1k/mtp/max-tpt-dep4-1p-dep8-1d.yaml
recipes/b200-fp8/1k1k/stp/low-latency-tep8-1p1d.yaml
recipes/b200-fp4/8k1k/mtp/max-tpt-dep4-4p-dep8-1d.yaml
recipes/b200-fp8/8k1k/mtp/low-latency-tep8-1p1d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-dep4-1p-tep8-1d.yaml
recipes/b200-fp8/8k1k/mtp/max-tpt-dep8-1p1d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-1p2d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-tp4-1p-tp8-1d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-dep4-1p-tep8-5d.yaml
recipes/b200-fp8/8k1k/stp/max-tpt-dep8-2p1d.yaml
recipes/b200-fp4/1k1k/stp/max-tpt-dep4-1p-dep8-1d.yaml
recipes/b200-fp4/8k1k/stp/max-tpt-dep4-7p-dep8-2d.yaml
recipes/b200-fp8/1k1k/stp/low-latency-tep8-1p3d.yaml
recipes/b200-fp8/8k1k/mtp/low-latency-tep8-1p4d.yaml
recipes/b200-fp4/8k1k/mtp/low-latency-dep4-2p-tep8-5d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-tp4-1p-tp8-1d.yaml
recipes/b200-fp8/8k1k/stp/max-tpt-dep8-1p1d.yaml
recipes/b200-fp4/8k1k/stp/low-latency-dep4-2p-tep8-5d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-1p5d.yaml
recipes/b200-fp4/8k1k/mtp/max-tpt-dep4-7p-dep8-2d.yaml
recipes/b200-fp8/1k1k/mtp/max-tpt-dep8-1p1d.yaml

coderabbitai · 2026-03-04T03:38:01Z

+#   override_mtp_maxtpt_4p1d:  MTP-only 4p1d, no frontends, env-var FP4 backend
+#
+# Usage:
+#   srtctl apply  -f recipes/b200-fp4/8k1k.yaml                              # all 12 variants


⚠️ Potential issue | 🟡 Minor

Usage comment variant count appears off by one.

The file currently defines 11 non-base variants, not 12.

Suggested correction

-# srtctl apply -f recipes/b200-fp4/8k1k.yaml # all 12 variants
+# srtctl apply -f recipes/b200-fp4/8k1k.yaml # all 11 variants
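
A count like this can be checked mechanically rather than by eye. The sketch below is only an illustration: `count_variants` is a hypothetical helper, and it assumes that each override_* block yields one variant and each zip_override_* block yields one variant per entry in its name list. The contents of b200-fp4/8k1k.yaml are not shown on this page, so the sample data uses the name lists from the b200-fp8/8k1k.yaml diff in this PR.

```python
def count_variants(recipe):
    """Count non-base variants defined by override_* and zip_override_* blocks."""
    total = 0
    for key, block in recipe.items():
        if key.startswith("zip_override_"):
            # Assumption: every zip block lists one name per variant.
            total += len(block["name"])
        elif key.startswith("override_"):
            total += 1
    return total


# Name lists copied from recipes/b200-fp8/8k1k.yaml; other keys omitted.
fp8_8k1k = {
    "base": {"name": "b200-fp8-stp-8k1k"},
    "zip_override_stp_lowlat": {"name": [
        "b200-fp8-stp-low-latency-tep8-1p-1d",
        "b200-fp8-stp-low-latency-tep8-1p-4d",
        "b200-fp8-stp-low-latency-tep8-1p-6d",
    ]},
    "zip_override_mtp_lowlat": {"name": [
        "b200-fp8-mtp-low-latency-tep8-1p-1d",
        "b200-fp8-mtp-low-latency-tep8-1p-4d",
        "b200-fp8-mtp-low-latency-tep8-1p-6d",
    ]},
    "zip_override_stp_maxtpt": {"name": [
        "b200-fp8-stp-max-tpt-dep8-1p-1d",
        "b200-fp8-stp-max-tpt-dep8-2p-1d",
    ]},
    "zip_override_mtp_maxtpt": {"name": [
        "b200-fp8-mtp-max-tpt-dep8-1p-1d",
        "b200-fp8-mtp-max-tpt-dep8-1p-2d",
        "b200-fp8-mtp-max-tpt-dep8-2p-1d",
    ]},
}
fp8_total = count_variants(fp8_8k1k)  # 3 + 3 + 2 + 3
```

For the FP8 8k1k recipe this gives 3 + 3 + 2 + 3 = 11 variants, which matches the 11 original per-variant files under recipes/b200-fp8/8k1k/.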


coderabbitai · 2026-03-04T03:38:01Z

+base:
+  name: "b200-fp8-stp-8k1k"
+
+  model:
+    path: "dsr1-fp8"
+    container: "dynamo-sglang"
+    precision: "fp8"
+
+  resources:
+    gpu_type: "b200"
+    prefill_nodes: 1
+    prefill_workers: 1
+    decode_nodes: 1
+    decode_workers: 1
+    gpus_per_node: 8
+
+  backend:
+    prefill_environment:
+      TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
+      PYTHONUNBUFFERED: "1"
+      DYN_SKIP_SGLANG_LOG_FORMATTING: "1"
+      SGLANG_ENABLE_JIT_DEEPGEMM: "false"
+      SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
+      SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
+      SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
+      SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+      SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+      SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
+      MC_FORCE_MNNVL: "1"
+      NCCL_MNNVL_ENABLE: "1"
+      NCCL_CUMEM_ENABLE: "1"
+      DYN_REQUEST_PLANE: nats
+    decode_environment:
+      TORCH_DISTRIBUTED_DEFAULT_TIMEOUT: "1800"
+      PYTHONUNBUFFERED: "1"
+      DYN_SKIP_SGLANG_LOG_FORMATTING: "1"
+      SGLANG_ENABLE_JIT_DEEPGEMM: "false"
+      SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE: "100000"
+      SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
+      SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
+      SGLANG_DECODE_BOOTSTRAP_TIMEOUT: "1000"
+      SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
+      SGLANG_USE_MESSAGE_QUEUE_BROADCASTER: "0"
+      SGLANG_DISABLE_TP_MEMORY_INBALANCE_CHECK: "1"
+      MC_FORCE_MNNVL: "1"
+      NCCL_MNNVL_ENABLE: "1"
+      NCCL_CUMEM_ENABLE: "1"
+      DYN_REQUEST_PLANE: nats
+    sglang_config:
+      prefill:
+        # Model configuration
+        served-model-name: "deepseek-ai/DeepSeek-R1"
+        trust-remote-code: true
+        quantization: "fp8"
+
+        # Disaggregation mode
+        disaggregation-mode: "prefill"
+        disaggregation-transfer-backend: nixl
+
+        # Memory and token limits
+        mem-fraction-static: 0.85
+        max-prefill-tokens: 32768
+        chunked-prefill-size: 32768
+        context-length: 9600
+        max-running-requests: 512
+        disable-cuda-graph: true
+
+        # Parallelism
+        tensor-parallel-size: 8
+        data-parallel-size: 1
+        expert-parallel-size: 8
+
+        # Attention
+        attention-backend: "trtllm_mla"
+        kv-cache-dtype: "fp8_e4m3"
+
+        # MoE
+        moe-runner-backend: "flashinfer_trtllm"
+        # moe-dense-tp-size: 1
+
+        # Other flags
+        stream-interval: 30
+        watchdog-timeout: 1000000
+        enable-flashinfer-allreduce-fusion: true
+        disable-radix-cache: true
+
+      decode:
+        # Model configuration
+        served-model-name: "deepseek-ai/DeepSeek-R1"
+        trust-remote-code: true
+        quantization: "fp8"
+
+        # Disaggregation mode
+        disaggregation-mode: "decode"
+        disaggregation-transfer-backend: nixl
+
+        # Memory and token limits
+        mem-fraction-static: 0.85
+        max-prefill-tokens: 32768
+        chunked-prefill-size: 32768
+        context-length: 9600
+        max-running-requests: 512
+        cuda-graph-max-bs: 512
+
+        # Parallelism
+        tensor-parallel-size: 8
+        data-parallel-size: 1
+        expert-parallel-size: 8
+
+        # Attention
+        attention-backend: "trtllm_mla"
+        kv-cache-dtype: "fp8_e4m3"
+
+        # MoE
+        moe-runner-backend: "flashinfer_trtllm"
+        # moe-dense-tp-size: 1
+
+        # Other flags
+        stream-interval: 30
+        watchdog-timeout: 1000000
+        enable-flashinfer-allreduce-fusion: true
+        disable-radix-cache: true
+        # disable-chunked-prefix-cache: true
+
+  health_check:
+    max_attempts: 360
+    interval_seconds: 10
+
+  benchmark:
+    type: "sa-bench"
+    isl: 8192
+    osl: 1024
+    req_rate: "inf"
+
+
+# STP low-latency: tep8 decode (DP=1), scale sweep 1p1d/1p4d/1p6d
+zip_override_stp_lowlat:
+  name:
+    - "b200-fp8-stp-low-latency-tep8-1p-1d"
+    - "b200-fp8-stp-low-latency-tep8-1p-4d"
+    - "b200-fp8-stp-low-latency-tep8-1p-6d"
+  resources:
+    decode_nodes: [1, 4, 6]
+    decode_workers: [1, 4, 6]
+  benchmark:
+    concurrencies: ["4x32x64", "64", "32"]
+
+
+# MTP low-latency: same scales as STP, adds EAGLE speculative decoding
+zip_override_mtp_lowlat:
+  name:
+    - "b200-fp8-mtp-low-latency-tep8-1p-1d"
+    - "b200-fp8-mtp-low-latency-tep8-1p-4d"
+    - "b200-fp8-mtp-low-latency-tep8-1p-6d"
+  resources:
+    decode_nodes: [1, 4, 6]
+    decode_workers: [1, 4, 6]
+  backend:
+    prefill_environment:
+      SGLANG_ENABLE_SPEC_V2: "1"
+    decode_environment:
+      SGLANG_ENABLE_SPEC_V2: "1"
+    sglang_config:
+      prefill:
+        moe-dense-tp-size: 1
+      decode:
+        speculative-algorithm: "EAGLE"
+        speculative-num-steps: 2
+        speculative-eagle-topk: 1
+        speculative-num-draft-tokens: 3
+  benchmark:
+    concurrencies: ["16x32x64", "8x256", "4x8x16x256"]
+
+
+# STP max-throughput: dep8 decode (DP=8), scale sweep 1p1d and 2p1d
+zip_override_stp_maxtpt:
+  name:
+    - "b200-fp8-stp-max-tpt-dep8-1p-1d"
+    - "b200-fp8-stp-max-tpt-dep8-2p-1d"
+  resources:
+    prefill_nodes: [1, 2]
+    prefill_workers: [1, 2]
+    decode_nodes: [1, 1]
+    decode_workers: [1, 1]
+  backend:
+    sglang_config:
+      prefill:
+        data-parallel-size: 8
+        enable-dp-attention: true
+        enable-dp-lm-head: true
+        moe-dense-tp-size: 1
+        max-running-requests: 1024
+      decode:
+        data-parallel-size: 8
+        enable-dp-attention: true
+        enable-dp-lm-head: true
+        moe-dense-tp-size: 1
+        max-running-requests: 1024
+        cuda-graph-max-bs: 1024
+  benchmark:
+    concurrencies: ["128", "256"]
+
+
+# MTP max-throughput: dep8 decode, scale sweep 1p1d/1p2d/2p1d, adds EAGLE speculative decoding
+# Note: max-running-requests stays at 512 for MTP (unlike STP which raises to 1024)
+zip_override_mtp_maxtpt:
+  name:
+    - "b200-fp8-mtp-max-tpt-dep8-1p-1d"
+    - "b200-fp8-mtp-max-tpt-dep8-1p-2d"
+    - "b200-fp8-mtp-max-tpt-dep8-2p-1d"
+  resources:
+    prefill_nodes: [1, 1, 2]
+    prefill_workers: [1, 1, 2]
+    decode_nodes: [1, 2, 1]
+    decode_workers: [1, 2, 1]
+  backend:
+    prefill_environment:
+      SGLANG_ENABLE_SPEC_V2: "1"
+    decode_environment:
+      SGLANG_ENABLE_SPEC_V2: "1"
+    sglang_config:
+      prefill:
+        data-parallel-size: 8
+        enable-dp-attention: true
+        enable-dp-lm-head: true
+        moe-dense-tp-size: 1
+      decode:
+        data-parallel-size: 8
+        enable-dp-attention: true
+        enable-dp-lm-head: true
+        moe-dense-tp-size: 1
+        speculative-algorithm: "EAGLE"
+        speculative-num-steps: 2
+        speculative-eagle-topk: 1
+        speculative-num-draft-tokens: 3
+  benchmark:
+    concurrencies: ["256", "128x256x512x1024", "128x512"]


⚠️ Potential issue | 🔴 Critical

Override-format recipe is currently blocked by CI schema validation.

This file uses base + zip_override_*/override_*, but CI is validating it as a single concrete config, which causes the reported missing required fields and unknown fields. Please make recipe validation override-aware (detect override configs and validate expanded variants) before merge.

I can help draft the validator-side patch if you want.
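
One possible shape for an override-aware check is sketched below. It is not the project's validator: the `REQUIRED` field list, the function names, and the `naive_expand` stand-in (which only handles plain override_* blocks with a shallow merge) are all assumptions made for illustration. The point is only the control flow: detect the base plus override layout, expand it, and validate each merged variant instead of the raw file.

```python
REQUIRED = ("name", "model", "resources")  # hypothetical required top-level fields


def is_override_recipe(raw):
    """True if the file uses a base block plus override_* / zip_override_* blocks."""
    has_override = any(k.startswith(("override_", "zip_override_")) for k in raw)
    return "base" in raw and has_override


def validate_concrete(cfg, label):
    """Return error strings for one fully expanded config."""
    return [f"{label}: missing required field '{f}'" for f in REQUIRED if f not in cfg]


def naive_expand(raw):
    """Stand-in expander: shallow-merges each override_* block onto the base."""
    for key, block in raw.items():
        if key.startswith("override_"):
            yield key, {**raw["base"], **block}


def validate_recipe(raw, expand=naive_expand):
    """Validate a recipe, expanding override-format files first."""
    if not is_override_recipe(raw):
        return validate_concrete(raw, "recipe")
    errors = []
    for label, cfg in expand(raw):
        errors.extend(validate_concrete(cfg, label))
    return errors


raw = {
    "base": {"name": "b200-fp8-stp-8k1k", "model": {}, "resources": {}},
    "override_example": {"benchmark": {"concurrencies": "64"}},
}
errors_when_expanded = validate_recipe(raw)
errors_when_treated_as_concrete = validate_concrete(raw, "recipe")
```

Validating the raw file as a single concrete config reports every required field as missing, which is the failure mode described above; validating the expanded variant reports none.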


Reduce 40 individual recipe files to 8 override files (one per precision × isl × stp/mtp combination). Each file uses zip_override_scale to sweep all node-count variants, eliminating per-variant YAML duplication. FP4 8k1k files additionally use override_tp4 to cover the TP4 prefill mode alongside the default dep4 variants.

Before: b200-fp8 (21 files) + b200-fp4 (19 files) = 40 files
After: 8 override files covering all same variants

- recipes/b200-fp8/1k1k-stp.yaml (4 variants: 1p1d/1p3d low-lat, 1p5d/2p5d max-tpt)
- recipes/b200-fp8/1k1k-mtp.yaml (6 variants)
- recipes/b200-fp8/8k1k-stp.yaml (5 variants: 1p1d/1p4d/1p6d low-lat, 1p1d/2p1d max-tpt)
- recipes/b200-fp8/8k1k-mtp.yaml (6 variants)
- recipes/b200-fp4/1k1k-stp.yaml (4 variants: 1p5d/1p6d low-lat, 1p1d/1p2d max-tpt)
- recipes/b200-fp4/1k1k-mtp.yaml (4 variants)
- recipes/b200-fp4/8k1k-stp.yaml (5 dep4 variants + override_tp4)
- recipes/b200-fp4/8k1k-mtp.yaml (5 dep4 variants + override_tp4)

…ions

Recipe fixes:
- Move num_additional_frontends from resources: to frontend: in FP4 8k1k files (was causing schema validation Unknown field error)
- Fix override_maxtpt_4p1d: use frontend: null to drop frontend config (original file has no frontend section)
- Fix override_tp4: remove erroneous fp4-gemm-backend: null (original tp4 file keeps flashinfer_trtllm backend), add decode expert-parallel-size: 1
- Separate low-lat and max-tpt into distinct zip_override_ groups so each carries appropriate sglang_config overrides (DP=8, moe-dense-tp-size, etc.)
- FP4 1k1k MTP max-tpt: add per-variant mem-fraction-static list [0.75, 0.85]
- FP8 MTP max-tpt: keep max-running-requests=512 (STP raises to 1024, MTP does not)
- FP8 1k1k MTP: add override_maxtpt_1p2d special case with spec-steps=1, draft-tokens=2

Core fix:
- generate_override_configs: respect explicit name: field in override_* dicts instead of always auto-generating {base_name}_{suffix}; add test coverage
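
Two behaviors mentioned in this commit message, the explicit name: field taking precedence over the auto-generated {base_name}_{suffix}, and null dropping a key from the merged config, can be sketched as follows. This is an illustration under assumptions, not the code of generate_override_configs: `variant_name` and `merge_with_null` are hypothetical names, and the suffix is assumed to be the override key with its override_ prefix removed.

```python
def variant_name(base_name, override_key, block):
    """Prefer an explicit name in the override block, else derive one from the key."""
    explicit = block.get("name")
    if explicit is not None:
        return explicit
    suffix = override_key.removeprefix("override_")  # Python 3.9+
    return f"{base_name}_{suffix}"


def merge_with_null(base, patch):
    """Recursive merge in which a None (YAML null) value removes the key."""
    out = dict(base)
    for key, value in patch.items():
        if value is None:
            out.pop(key, None)  # no-op when the key is absent from the base
        elif isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge_with_null(out[key], value)
        else:
            out[key] = value
    return out


auto = variant_name("b200-fp4-8k1k", "override_tp4", {})
explicit = variant_name("b200-fp4-8k1k", "override_tp4", {"name": "custom-name"})

base = {"frontend": {"num_additional_frontends": 1}, "resources": {"decode_nodes": 1}}
without_frontend = merge_with_null(base, {"frontend": None})
unchanged = merge_with_null({"resources": {"decode_nodes": 1}}, {"frontend": None})
```

The sample base values here are placeholders chosen for the example; only the frontend: null behavior is taken from the commit message.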

Consolidate 8 separate *-stp.yaml / *-mtp.yaml files into 4 combined files (b200-fp8/1k1k.yaml, b200-fp8/8k1k.yaml, b200-fp4/1k1k.yaml, b200-fp4/8k1k.yaml). Override key names include stp/mtp labels (zip_override_stp_lowlat, zip_override_mtp_maxtpt, etc.) enabling wildcard selectors:

  srtctl apply -f recipes/b200-fp8/1k1k.yaml:*stp*   # all STP variants
  srtctl apply -f recipes/b200-fp8/1k1k.yaml:*mtp*   # all MTP variants

FP4 8k1k 7p2d uses the null mechanism to combine STP and MTP into one zip_override_maxtpt_7p2d section: null in STP slots is a no-op (keys absent from base); values in MTP slots add SGLANG_ENABLE_SPEC_V2 and speculative settings on top of the same resources/sglang config.
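
The wildcard selectors in the commit message above suggest shell-style matching against override key names. A minimal sketch of that idea, assuming the selector is everything after the first colon and that it is matched against the override keys (how srtctl actually parses and matches it is not shown on this page), could use the standard library's fnmatch module; `split_selector` and `select_overrides` are hypothetical names:

```python
from fnmatch import fnmatchcase


def split_selector(arg):
    """Split 'path.yaml:pattern' into (path, pattern); no pattern means match all."""
    path, sep, pattern = arg.partition(":")
    return path, (pattern if sep else "*")


def select_overrides(keys, pattern):
    """Return the override keys that match a shell-style wildcard pattern."""
    return [k for k in keys if fnmatchcase(k, pattern)]


keys = [
    "zip_override_stp_lowlat",
    "zip_override_mtp_lowlat",
    "zip_override_stp_maxtpt",
    "zip_override_mtp_maxtpt",
]
path, pattern = split_selector("recipes/b200-fp8/1k1k.yaml:*stp*")
stp_keys = select_overrides(keys, pattern)
```

With the key names used in the combined recipes, `*stp*` selects the two STP groups and `*mtp*` the two MTP groups, which is why the stp/mtp label was added to each key.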

…om old files

- Replace zip_override_maxtpt_7p2d (null-mechanism combined) with explicit override_stp_maxtpt_7p2d and override_mtp_maxtpt_7p2d in b200-fp4/8k1k.yaml
- Verified all 4 combined files produce configs identical to the original individual stp/mtp files (compared with Python deep-diff, excluding name field)
- Add scale-sweep notes and backend notes from old files as section comments

Adds parameter grouping comments (# Model configuration, # Disaggregation mode, # Memory and token limits, # Parallelism, # Attention, # MoE, # Other flags) to the base sglang_config blocks in all four combined recipe files, matching the originals in the per-variant subdirectories. Also preserves commented-out hints (# moe-dense-tp-size: 1, # disable-chunked-prefix-cache: true) from the original files. All 40 variants verified equivalent to originals via diff script.

The 40 individual stp/mtp YAML files under b200-fp8/1k1k/, b200-fp8/8k1k/, b200-fp4/1k1k/, and b200-fp4/8k1k/ are now consolidated into 4 combined recipe files (one per precision×isl). All variants verified equivalent via diff script before deletion.

coderabbitai bot reviewed Mar 4, 2026

weireweire changed the title ~~Consolidate B200 recipes: merge per-variant STP/MTP files into 4 combined files~~ Consolidate B200 recipes: merge 40 per-variant STP/MTP files into 4 combined files Mar 4, 2026

weiliangl added 7 commits March 4, 2026 06:03

fix: correct variant count comment in b200-fp4/8k1k.yaml (11 not 12)

42af38e

weireweire force-pushed the simplify/b200-recipes branch from ac84b66 to 42af38e Compare March 4, 2026 06:04

weireweire merged commit 37f8ca2 into main Mar 4, 2026
5 checks passed
