Add GB200 FP8 1k/8k configs by kyleliang-nv · Pull Request #115 · ishandhanani/srt-slurm

kyleliang-nv · 2026-01-28T21:51:42Z

Summary by CodeRabbit

New Features
- Added three deployment profiles for gb200-fp8 (low-latency, maximum-throughput, mid-curve) to choose optimized inference behavior.
- Each profile includes tuned runtime and memory/performance controls, separate prefill vs decode optimizations, and built-in benchmarking settings to evaluate latency and throughput.

coderabbitai · 2026-01-28T21:52:06Z

📝 Walkthrough

Adds three new deployment YAMLs for gb200-fp8 1k/8k: low-latency, max-tpt, and mid-curve. Each file configures Dynamo frontend topology, model/container/precision, resource allocations, and detailed SGLang backend settings for separate prefill and decode modes plus benchmark parameters.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GB200-FP8 1K/8K Configuration Files: `recipes/gb200-fp8/1k8k/low-latency.yaml`, `recipes/gb200-fp8/1k8k/max-tpt.yaml`, `recipes/gb200-fp8/1k8k/mid-curve.yaml` | Added three complete deployment configs. Each defines Dynamo frontend (multi-frontend support), model metadata (path, container, precision), resource allocations (gpu_type, prefill/decode nodes/workers, gpus_per_node), and extensive SGLang backend configs split for prefill vs decode (served-model-name, trust-remote-code, kv-cache-dtype, attention/backend, quantization, moa/runner, disaggregation-mode, tensor/data/expert parallelism, mem-fraction, CUDA-graph and DeepEP/MOE tuning). Includes benchmark blocks (concurrency, req_rate, sa-bench) and numerous performance/timeout/cache settings requiring validation across prefill/decode and resource params. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

ishandhanani

Poem

🐇 A rabbit hops through YAML fields bright,
Low-latency, max-tpt, mid-curve take flight,
Prefill and decode in careful array,
SGLang whispers tweaks through night and day,
Hooray for configs tuned just right! 🥕✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding three new YAML configuration files (low-latency.yaml, max-tpt.yaml, mid-curve.yaml) for GB200 FP8 1k/8k deployment specifications. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In `@recipes/gb200-fp8/1k8k/low-latency.yaml`:
- Line 9: The inline comment for the YAML key num_additional_frontends is
truncated; update the comment to a complete sentence clarifying the meaning
(e.g., complete the fragment "# Additional routers (total = 1 + t" to something
like "# Additional routers (total = 1 + num_additional_frontends)" or a
similarly clear description) so anyone reading the key understands how the total
router count is computed; locate the num_additional_frontends entry and replace
the truncated comment with the full explanatory text.
- Line 29: Update the SGLANG_DG_CACHE_DIR value to use the absolute path to
match the other recipes and avoid working-directory dependent behavior: locate
the SGLANG_DG_CACHE_DIR entry in this file
(recipes/gb200-fp8/1k8k/low-latency.yaml) and change the value from
"configs/dg-0.5.5.post2" to "/configs/dg-0.5.5.post2" so it is consistent with
mid-curve.yaml and max-tpt.yaml.
- Line 47: Update the SGLANG_DG_CACHE_DIR value under decode_environment to the
same corrected path that decode_environment uses in the other config files, so
the cache lookup is consistent across all three recipes.

In `@recipes/gb200-fp8/1k8k/max-tpt.yaml`:
- Line 12: The inline comment for num_additional_frontends is truncated; update
the comment to a complete explanatory sentence such as "Additional routers
(total = 1 + num_additional_frontends)" next to the num_additional_frontends key
so the intent is clear.

In `@recipes/gb200-fp8/1k8k/mid-curve.yaml`:
- Line 12: The inline comment for the YAML key num_additional_frontends is
truncated; update the comment for num_additional_frontends to complete the
explanatory text (e.g., "Additional routers (total = 1 +
num_additional_frontends)") so it clearly states how the total routers is
computed and what the value represents.
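
The SGLANG_DG_CACHE_DIR comments above come down to one rule: the value should be an absolute path so it does not depend on the working directory. A minimal sketch of a check for that rule follows. The helper name `find_relative_cache_dirs`, the regex, and the inline YAML excerpts are all hypothetical illustrations, not part of this repository; only the key name and the two path strings come from the review comments.

```python
import posixpath
import re

# Matches lines such as:  SGLANG_DG_CACHE_DIR: "/configs/dg-0.5.5.post2"
# (assumed layout; the real recipe files may format this entry differently)
CACHE_DIR_RE = re.compile(
    r'^\s*SGLANG_DG_CACHE_DIR:\s*["\']?([^"\'#\s]+)', re.MULTILINE
)


def find_relative_cache_dirs(recipes):
    """Return (filename, value) pairs whose SGLANG_DG_CACHE_DIR is not absolute."""
    problems = []
    for name, text in recipes.items():
        for value in CACHE_DIR_RE.findall(text):
            if not posixpath.isabs(value):
                problems.append((name, value))
    return problems


# Hypothetical excerpts standing in for the three recipe files.
recipes = {
    "low-latency.yaml": 'decode_environment:\n  SGLANG_DG_CACHE_DIR: "configs/dg-0.5.5.post2"\n',
    "max-tpt.yaml": 'decode_environment:\n  SGLANG_DG_CACHE_DIR: "/configs/dg-0.5.5.post2"\n',
    "mid-curve.yaml": 'decode_environment:\n  SGLANG_DG_CACHE_DIR: "/configs/dg-0.5.5.post2"\n',
}

for name, value in find_relative_cache_dirs(recipes):
    print(f"{name}: SGLANG_DG_CACHE_DIR is relative: {value}")
```

A plain-text scan is used here instead of a YAML parser so the sketch needs only the standard library; a real check would more likely load each file with a YAML library and inspect both the prefill and decode environment blocks.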

Add GB200 FP8 1k/8k configs

9dd5260

kyleliang-nv requested a review from ishandhanani January 28, 2026 21:51

coderabbitai bot reviewed Jan 28, 2026

kyleliang-nv added 2 commits January 28, 2026 21:05

Update DG_CACHE

8e425af

Remove comment

d6ea721

kyleliang-nv wants to merge 3 commits into main from kylliang/gb200-fp8-1k8k
