Fix context-length and separate long running benchmark by kyleliang-nv · Pull Request #97 · ishandhanani/srt-slurm

kyleliang-nv · 2026-01-24T00:50:39Z

Summary by CodeRabbit

New Features
- Added Dynamo frontend support (v0.7.0) with multiple frontend configuration options
- Introduced new H200 GPU deployment configuration templates with various batch sizes and optimization profiles
Improvements
- Updated model container versions to latest stable releases
- Increased maximum context length support from 9200 to 10000 tokens in GB200 configurations
- Enhanced installation reliability with improved package management
Chores
- Updated .gitignore patterns for generated configurations and caches

coderabbitai · 2026-01-24T00:50:57Z

Caution: Review failed. The pull request is closed.

Walkthrough

This PR updates model serving configurations across GB200 and H200 GPU setups, moves the model containers to SGLang v0.5.5.post2/v0.5.8, adds Dynamo frontend support, raises `context-length` from 9200 to 10000 in the GB200 configs, and removes the deprecated backend options `disaggregation-transfer-backend` and `fp4-gemm-backend`. Minor code changes add `--break-system-packages` to the Dynamo pip install and pass an empty `container-remap-root` option to `srun` when starting nginx.
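The PR does not say why 9200 was too small. One plausible reading, assuming the `1k8k` recipe name means 1024 input tokens and 8192 output tokens (an assumption, not stated in the PR), is that a single request needs 9216 tokens, which exceeds 9200 but fits within 10000. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def required_context(input_tokens: int, output_tokens: int) -> int:
    """Tokens one request occupies: the prompt plus everything it generates."""
    return input_tokens + output_tokens


def fits(context_length: int, input_tokens: int, output_tokens: int) -> bool:
    """True when a request of this shape fits in the configured context length."""
    return required_context(input_tokens, output_tokens) <= context_length


# Assumed meaning of "1k8k": 1024 input tokens, 8192 output tokens.
ISL, OSL = 1024, 8192

if __name__ == "__main__":
    for limit in (9200, 10000):
        print(limit, fits(limit, ISL, OSL))
```

Under that assumption the old limit rejects a full-length request and the new one leaves some headroom.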

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GB200 FP4 Config Updates | `recipies/gb200-fp4/1k8k/low-latency.yaml`, `max-tpt.yaml`, `mid-curve.yaml` | Updated model container to v0.5.5.post2, added dynamo section (v0.7.0) with frontend configuration and multiple frontends, increased context-length to 10000, removed disaggregation-transfer-backend and fp4-gemm-backend entries, removed/cleaned up feature flag comments |
| New H200 Config Files (1k1k) | `recipies/h200/1k1k/bs128-agg-tp.yaml`, `bs256-1p6d-dep.yaml`, `bs256-1p6d-tp.yaml`, `low-latency-1p9d.yaml` | Added four new H200 deployment configurations with FP8 precision, detailed sglang_config for prefill/decode modes, disaggregation settings, and benchmark parameters |
| New H200 Config Files (8k1k) | `recipies/h200/8k1k/bs128-1p1d-dep.yaml`, `bs128-agg-tp.yaml`, `bs16-1p3d.yaml`, `bs4-1p7d.yaml`, `bs64-2p3d.yaml`, `bs8-1p6d.yaml` | Added six new H200 deployment configurations with varying batch sizes and parallelism strategies (1p1d, agg-tp, 1p3d, 1p7d, 2p3d, 1p6d) with FP8 precision and consistent deployment patterns |
| Dynamo/Install Updates | `src/srtctl/core/schema.py` | Added `--break-system-packages` flag to Dynamo pip install invocation; changed RUSTFLAGS export from double-quoted to single-quoted syntax for source builds |
| Frontend Infrastructure | `src/srtctl/cli/mixins/frontend_stage.py` | Added `container-remap-root` empty string to srun_options in _start_nginx call |
| Code Cleanup | `src/srtctl/backends/trtllm.py`, `src/srtctl/cli/mixins/worker_stage.py` | Added trailing comma to trtllm argument list; removed blank line in worker_stage environment templating |
| Gitignore Updates | `.gitignore` | Added ignore patterns: `configs/dg-`, `configs/flashinfer-cache/`, `outputs/` |
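The actual command strings in `schema.py` are not shown in this summary. As a rough sketch of the two install changes described in the table, the snippet below builds a pip command carrying `--break-system-packages` and a single-quoted `RUSTFLAGS` export. The function names, the package name `ai-dynamo`, and the flag value are all assumptions for illustration, not the repository's code.

```python
import shlex


def pip_install_command(package: str, version: str) -> str:
    """Build a pip install line that is allowed to modify a system-managed Python.

    --break-system-packages is a real pip flag; the package and version are
    supplied by the caller (hypothetical values below).
    """
    args = ["pip", "install", "--break-system-packages", f"{package}=={version}"]
    return shlex.join(args)


def export_rustflags(flags: str) -> str:
    """Emit an export line; shlex.quote wraps the value in single quotes when it
    contains shell-special characters, so the shell does not expand `$` or
    backticks inside it the way it would in double quotes."""
    return f"export RUSTFLAGS={shlex.quote(flags)}"


if __name__ == "__main__":
    # "ai-dynamo" and the flag value are illustrative assumptions.
    print(pip_install_command("ai-dynamo", "0.7.0"))
    print(export_rustflags("-C target-cpu=native"))
```

Single quotes keep the flag string literal when the export line is embedded in a generated shell script, which is one likely motivation for the quoting change.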

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

- basic updates to qwen recipie #80: Modifies Dynamo install commands and RUSTFLAGS export in schema.py using identical code patterns.
- sgl-router and docs #61: Introduces sglang-router and multi-frontend support (enable_multiple_frontends, num_additional_frontends) that align with this PR's dynamo frontend configuration additions.
- Add h200 config. Fix nginx/dynamo install issue. #82: Shares code-level changes to _start_nginx in frontend_stage.py and Dynamo install commands in schema.py.

Suggested reviewers

ishandhanani

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main changes: context-length fixes and separation of benchmark configurations into new files with different concurrency levels. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

- add h200 config
- use bf16 kvcache + tp
- fix the python and apt install permission issue.
- use 2x prompts warmup and 10x for test. add gitignore
- only keep best config
- use nixl to use in cuda13
- add 1k1k config
- revert print container-remap-root arg
- modify container name

Co-authored-by: weireweire <weiliangl@login-1>

kyleliang-nv · 2026-01-27T02:15:40Z

ugh...I think I messed up this PR. Going to start a clean one.

Fix context-length and separate long running benchmark

2433f2d

ishandhanani and others added 6 commits January 26, 2026 12:15

Merge branch 'main' into kylliang/separate_per_concu_config

54563fc

rebase

f83faf3

lint

39eac99

Update configs to use 0.5.5.post2

07ec9a0

Remove the separate max-tput config

93a1f61

kyleliang-nv closed this Jan 27, 2026

Fix context-length and separate long running benchmark #97
kyleliang-nv wants to merge 7 commits into main from kylliang/separate_per_concu_config
