Fix context-length and separate long running benchmark #97
kyleliang-nv wants to merge 7 commits into main
Caution: Review failed. The pull request is closed.

Walkthrough

This PR updates model serving configurations across GB200 and H200 GPU setups, transitions to SGLang v0.5.5.post2/v0.5.8, adds Dynamo frontend support, increases context lengths from 9200 to 10000, and removes deprecated backend options. Minor code adjustments enable Dynamo installation with package compatibility and frontend infrastructure changes.
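For reviewers wondering why a context-length bump from 9200 to 10000 might matter, here is a minimal sketch of the kind of check involved. The 8192-token input and 1024-token output sizes are assumptions chosen for illustration (the PR page does not state the actual sequence lengths), and `fits_context` is a hypothetical helper, not something from this repo.

```python
def fits_context(input_len: int, output_len: int, context_length: int) -> bool:
    """Return True if the prompt plus generated tokens fit in the context window."""
    return input_len + output_len <= context_length


# Hypothetical 8k-in / 1k-out workload; these sizes are assumed, not taken from the PR.
ISL, OSL = 8192, 1024

for limit in (9200, 10000):
    status = "fits" if fits_context(ISL, OSL, limit) else "does not fit"
    print(f"context_length={limit}: {ISL}+{OSL} tokens {status}")
```

Under those assumed sizes the request needs 9216 tokens, which is just over the old limit and comfortably under the new one.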
Estimated code review effort: 2 (Simple), ~12 minutes
Pre-merge checks: 3 passed
* add h200 config
* use bf16 kvcache + tp
* fix the python and apt install permission issue.
* use 2x prompts warmup and 10x for test. add gitignore
* only keep best config
* use nixl to use in cuda13
* add 1k1k config
* revert print container-remap-root arg
* modify container name

---------

Co-authored-by: weireweire <weiliangl@login-1>
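The "2x prompts warmup and 10x for test" commit could be read roughly as below. This is only a sketch: the commit message does not say what the multipliers apply to, so scaling by benchmark concurrency is an assumption, and `prompt_counts` is a hypothetical helper rather than code from this PR.

```python
def prompt_counts(concurrency: int, warmup_factor: int = 2, test_factor: int = 10) -> tuple[int, int]:
    """Return (warmup_prompts, test_prompts) for one benchmark run.

    Assumption: both counts scale with the concurrency level.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return warmup_factor * concurrency, test_factor * concurrency


# Example: at a concurrency of 64, send 128 warmup prompts, then 640 measured prompts.
warmup, test = prompt_counts(64)
print(f"warmup={warmup} test={test}")
```

Tying both counts to one base value keeps short and long runs comparable, since every concurrency level gets the same relative amount of warmup.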
ugh...I think I messed up this PR. Going to start a clean one.