Fix context-length and separate long-running benchmark #97

Closed

kyleliang-nv wants to merge 7 commits into main from kylliang/separate_per_concu_config

Conversation

kyleliang-nv (Collaborator) commented Jan 24, 2026

Summary by CodeRabbit

  • New Features

    • Added Dynamo frontend support (v0.7.0) with multiple frontend configuration options
    • Introduced new H200 GPU deployment configuration templates with various batch sizes and optimization profiles
  • Improvements

    • Updated model container versions (SGLang v0.5.5.post2 in the GB200 FP4 recipes)
    • Increased maximum context length support from 9200 to 10000 tokens in GB200 configurations
    • Made the Dynamo install more reliable by passing --break-system-packages to pip
  • Chores

    • Updated .gitignore patterns for generated configurations and caches
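
The context-length change is easiest to see as arithmetic: a request in a 1k8k benchmark needs room for its whole prompt plus every generated token. A minimal sketch, assuming "1k8k" means 1024 input tokens and 8192 output tokens (the benchmark's exact lengths are not stated in this PR): 1024 + 8192 = 9216, which exceeds the old limit of 9200 but fits within 10000. The helper names are hypothetical.

```python
# Hypothetical sketch: does a configured context length hold one full
# benchmark request? Assumes "1k8k" = 1024 input + 8192 output tokens;
# the real benchmark may use different or randomized lengths.


def required_context(isl: int, osl: int) -> int:
    """Tokens needed for one request: prompt plus generated output."""
    return isl + osl


def fits(context_length: int, isl: int, osl: int) -> bool:
    """True if a request of isl + osl tokens fits in context_length."""
    return required_context(isl, osl) <= context_length


ISL, OSL = 1024, 8192  # assumed meaning of the "1k8k" recipe directory

for ctx in (9200, 10000):
    need = required_context(ISL, OSL)
    print(f"context-length={ctx}: need {need}, fits={fits(ctx, ISL, OSL)}")
```

Under these assumptions the loop reports that 9200 is too small (9216 tokens needed) and 10000 leaves a small margin.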

coderabbitai bot (Contributor) commented Jan 24, 2026

Caution: Review failed. The pull request is closed.

📝 Walkthrough

This PR updates model serving configurations across GB200 and H200 GPU setups, moves to SGLang v0.5.5.post2/v0.5.8, adds Dynamo frontend support, raises the GB200 context length from 9200 to 10000 tokens, and removes the disaggregation-transfer-backend and fp4-gemm-backend options. Smaller code changes let the Dynamo pip install run against a system-managed Python environment and adjust how the nginx frontend is launched.
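
The H200 recipes listed under Changes encode their deployment shape in the filename. The sketch below decodes those names under assumed conventions: bsN is the batch size, XpYd means X prefill workers and Y decode workers, and agg marks aggregated (non-disaggregated) serving; other tokens such as tp or dep are kept as-is. RecipeShape and parse_recipe_name are hypothetical and not part of srtctl.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecipeShape:
    """Deployment shape decoded from a recipe filename (hypothetical)."""
    batch_size: Optional[int]   # from "bs<N>", if present
    prefill: Optional[int]      # X in "<X>p<Y>d" (assumed: prefill workers)
    decode: Optional[int]       # Y in "<X>p<Y>d" (assumed: decode workers)
    aggregated: bool            # True when the name contains "agg"
    other: Tuple[str, ...]      # remaining tokens, e.g. ("tp",) or ("dep",)


def parse_recipe_name(filename: str) -> RecipeShape:
    """Split a name like "bs256-1p6d-dep.yaml" on '-' and classify each token."""
    stem = filename[:-len(".yaml")] if filename.endswith(".yaml") else filename
    batch_size = prefill = decode = None
    aggregated = False
    other = []
    for token in stem.split("-"):
        if m := re.fullmatch(r"bs(\d+)", token):
            batch_size = int(m.group(1))
        elif m := re.fullmatch(r"(\d+)p(\d+)d", token):
            prefill, decode = int(m.group(1)), int(m.group(2))
        elif token == "agg":
            aggregated = True
        else:
            other.append(token)
    return RecipeShape(batch_size, prefill, decode, aggregated, tuple(other))


for name in ("bs256-1p6d-dep.yaml", "bs128-agg-tp.yaml", "low-latency-1p9d.yaml"):
    print(name, parse_recipe_name(name))
```

Splitting on '-' instead of matching one fixed pattern keeps names with a free-form prefix, such as low-latency-1p9d.yaml, parseable.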

Changes

Cohort / File(s) Summary
GB200 FP4 Config Updates
recipies/gb200-fp4/1k8k/low-latency.yaml, max-tpt.yaml, mid-curve.yaml
Updated the model container to v0.5.5.post2; added a dynamo section (v0.7.0) with frontend configuration and multiple frontends; increased context-length to 10000; removed the disaggregation-transfer-backend and fp4-gemm-backend entries; cleaned up feature-flag comments
New H200 Config Files (1k1k)
recipies/h200/1k1k/bs128-agg-tp.yaml, bs256-1p6d-dep.yaml, bs256-1p6d-tp.yaml, low-latency-1p9d.yaml
Added four new H200 deployment configurations with FP8 precision, detailed sglang_config for prefill/decode modes, disaggregation settings, and benchmark parameters
New H200 Config Files (8k1k)
recipies/h200/8k1k/bs128-1p1d-dep.yaml, bs128-agg-tp.yaml, bs16-1p3d.yaml, bs4-1p7d.yaml, bs64-2p3d.yaml, bs8-1p6d.yaml
Added six new H200 deployment configurations with varying batch sizes and parallelism strategies (1p1d, agg-tp, 1p3d, 1p7d, 2p3d, 1p6d) with FP8 precision and consistent deployment patterns
Dynamo/Install Updates
src/srtctl/core/schema.py
Added --break-system-packages flag to Dynamo pip install invocation; changed RUSTFLAGS export from double-quoted to single-quoted syntax for source builds
Frontend Infrastructure
src/srtctl/cli/mixins/frontend_stage.py
Added an empty-string container-remap-root entry to srun_options in the _start_nginx call
Code Cleanup
src/srtctl/backends/trtllm.py, src/srtctl/cli/mixins/worker_stage.py
Added trailing comma to trtllm argument list; removed blank line in worker_stage environment templating
Gitignore Updates
.gitignore
Added ignore patterns: configs/dg-*, configs/flashinfer-cache/, outputs/*
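
The schema.py change affects the shell commands used to install Dynamo. A minimal sketch of the idea, assuming the install is emitted as shell lines: --break-system-packages (available since pip 23.0.1) lets pip install into a Python marked as externally managed under PEP 668, and single quotes stop the shell from expanding $VARIABLES or backticks inside RUSTFLAGS. The function name, the ai-dynamo package pin, and the example RUSTFLAGS value are assumptions, not srtctl's actual code.

```python
import shlex
from typing import List, Optional

# Hypothetical sketch, not srtctl's schema.py. It builds the shell lines a
# launcher might run inside a container to install Dynamo.


def dynamo_install_lines(version: str, rustflags: Optional[str] = None) -> List[str]:
    """Return shell lines that install Dynamo with pip.

    version:   Dynamo release to pin, e.g. "0.7.0" (package name assumed).
    rustflags: optional RUSTFLAGS value for source builds (hypothetical).
    """
    lines = []
    if rustflags is not None:
        # shlex.quote wraps unsafe values in single quotes, so the shell
        # will not expand $VARS or backticks inside the value.
        lines.append(f"export RUSTFLAGS={shlex.quote(rustflags)}")
    # --break-system-packages lets pip install into an "externally managed"
    # system Python (PEP 668) instead of refusing with an error.
    lines.append(
        f"pip install --break-system-packages {shlex.quote('ai-dynamo==' + version)}"
    )
    return lines


for line in dynamo_install_lines("0.7.0", "-C link-arg=-Wl,-rpath,$ORIGIN"):
    print(line)
```

With this RUSTFLAGS value the export line comes out as export RUSTFLAGS='-C link-arg=-Wl,-rpath,$ORIGIN'. Wrapped in double quotes instead, the shell would substitute the environment variable ORIGIN (usually empty) and silently change the linker flag.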

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • ishandhanani

Poem

🐰 With configs now set to ten thousand and more,
And Dynamo's frontends spread wide at the door,
H200s dance forward in formations so neat,
While v0.5.5 keeps the inference sweet—
A recipe feast for the GPU fleet! 🚀

🚥 Pre-merge checks | ✅ 3 passed

  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately captures the main changes: context-length fixes and separation of benchmark configurations into new files with different concurrency levels.
  • Docstring coverage: ✅ Passed. No functions were found in the changed files to evaluate, so the check was skipped.

ishandhanani and others added 6 commits January 26, 2026 12:15
* add h200 config

* use bf16 kvcache + tp

* fix the python and apt install permission issues.

* use 2x prompts for warmup and 10x for the test. add .gitignore

* only keep best config

* use nixl so it works with cuda13

* add 1k1k config

* revert print container-remap-root arg

* modify container name

---------

Co-authored-by: weireweire <weiliangl@login-1>
kyleliang-nv (Collaborator, Author) commented:

ugh...I think I messed up this PR. Going to start a clean one.

Labels

None yet

Projects

None yet

3 participants