Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Walkthrough

Documentation removes sharded conversion instructions and updates example model paths and server args. Configs switch model paths from pre-sharded “tp16” variants to unsharded paths and replace “--load-format sharded_state” with “--ep-size 16” while preserving context lengths.
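The server-args change described in the walkthrough can be sketched as a small migration helper. This is only an illustration, not code from the PR: the function name `migrate_server_args` and the placeholder flag `--other-flag` are hypothetical, and only the two flags named in the walkthrough are taken from the source.

```python
import shlex


def migrate_server_args(server_args: str, ep_size: int = 16) -> str:
    """Hypothetical helper: drop '--load-format sharded_state' and
    ensure '--ep-size <ep_size>' is present. All other args are kept."""
    tokens = shlex.split(server_args)
    kept = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Two-token form: --load-format sharded_state
        if (
            tok == "--load-format"
            and i + 1 < len(tokens)
            and tokens[i + 1] == "sharded_state"
        ):
            i += 2
            continue
        # Single-token form: --load-format=sharded_state
        if tok == "--load-format=sharded_state":
            i += 1
            continue
        kept.append(tok)
        i += 1

    # Append --ep-size only if it is not already set.
    has_ep = any(t == "--ep-size" or t.startswith("--ep-size=") for t in kept)
    if not has_ep:
        kept += ["--ep-size", str(ep_size)]
    return shlex.join(kept)


if __name__ == "__main__":
    # '--other-flag value' stands in for whatever else a config passes.
    print(migrate_server_args("--load-format sharded_state --other-flag value"))
```

Running the helper twice on the same string leaves it unchanged after the first pass, which makes it safe to apply across a set of configs. Model paths are deliberately not rewritten here, since the actual path names are not given in this summary.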
Estimated code review effort: 2 (Simple), about 10 minutes
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>