Remove sharding docs by smahdavi4 · Pull Request #872 · NVIDIA-NeMo/Skills

smahdavi4 · 2025-09-30T22:39:27Z

Summary by CodeRabbit

Documentation
- Removed obsolete sharded model conversion section.
- Updated dataset guides to use the non-sharded DeepSeek-R1 path and revised server arguments to reflect the new setup.
Chores
- Updated solution generation configs to use the non-sharded DeepSeek-R1 path.
- Switched server arguments to an explicit parallelism setting, aligning configs with the updated runtime.
- Simplifies setup and removes reliance on sharded model artifacts.

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

coderabbitai · 2025-09-30T22:39:35Z

Walkthrough

Documentation removes sharded conversion instructions and updates example model paths and server args. Configs switch model paths from pre-sharded “tp16” variants to unsharded paths and replace “--load-format sharded_state” with “--ep-size 16” while preserving context lengths.

Changes

Cohort / File(s)	Summary
Docs: Open Math Reasoning `docs/releases/openmathreasoning/dataset.md`	Removed the “Model conversion” section describing sharded DeepSeek-R1 state conversion; remaining sections unchanged.
Docs: Open Reasoning `docs/releases/openreasoning/dataset.md`	Removed optional sharding block; updated example model path from `/workspace/DeepSeek-R1-0528-tp16` to `/workspace/DeepSeek-R1-0528`; changed `server_args` from `--load-format sharded_state --context-length {…}` to `--ep-size 16 --context-length {…}`.
Configs: Open Code Reasoning R1 `recipes/opencodereasoning/configs/solution_sdg/r1.yaml`	In `generate_solutions.stage_kwargs`: model path `/hf_models/DeepSeek-R1-tp16` → `/hf_models/DeepSeek-R1`; `server_args` `--load-format sharded_state --context-length 32768` → `--ep-size 16 --context-length 32768`.
Configs: Open Math Reasoning R1 `recipes/openmathreasoning/configs/solution_sdg/r1.yaml`	In `generate_solutions.stage_kwargs`: model path `/hf_models/DeepSeek-R1-tp16` → `/hf_models/DeepSeek-R1`; `server_args` `--load-format sharded_state --context-length 18384` → `--ep-size 16 --context-length 18384`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nudge my whiskers at config lines,
Hop from “tp16” to simpler signs.
No shards to juggle, neat and clean—
Just ep-size 16 in the scene.
Carrots compiled, docs aligned;
A tidy trail for buns behind. 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title concisely describes the primary change in the pull request, which is the removal of sharding-related documentation, making it clear to reviewers without extraneous detail.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch smahdavi/sharding-docs

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e342ee8 and 4ed7f6c.

📒 Files selected for processing (4)

docs/releases/openmathreasoning/dataset.md (0 hunks)
docs/releases/openreasoning/dataset.md (1 hunks)
recipes/opencodereasoning/configs/solution_sdg/r1.yaml (1 hunks)
recipes/openmathreasoning/configs/solution_sdg/r1.yaml (1 hunks)

💤 Files with no reviewable changes (1)

docs/releases/openmathreasoning/dataset.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: pre-commit
GitHub Check: unit-tests

🔇 Additional comments (3)

recipes/openmathreasoning/configs/solution_sdg/r1.yaml (1)

53-53: LGTM! Model path and server args updated correctly.

The changes correctly switch from pre-sharded checkpoint loading to unsharded model with explicit parallelism. The --ep-size 16 matches the total GPU count (8 GPUs × 2 nodes), and the context length is properly preserved.

Also applies to: 57-57

recipes/opencodereasoning/configs/solution_sdg/r1.yaml (1)

36-36: LGTM! Consistent model path and server args updates.

The changes follow the same pattern as the math config, correctly switching to unsharded model loading with --ep-size 16 (matching 8 GPUs × 2 nodes). The higher context length of 32768 is appropriate for code generation tasks.

Also applies to: 40-40

docs/releases/openreasoning/dataset.md (1)

69-69: LGTM – docs/releases/openreasoning/dataset.md is free of any -tp16 or sharded_state references.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Kipok

thanks!

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Remove sharding docs as expert parallel is fast enough

4ed7f6c

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

smahdavi4 requested a review from Kipok September 30, 2025 22:39

Kipok approved these changes Sep 30, 2025

View reviewed changes

Kipok merged commit a37231c into main Sep 30, 2025
6 checks passed

Kipok deleted the smahdavi/sharding-docs branch September 30, 2025 22:48

wasiahmad pushed a commit that referenced this pull request Oct 1, 2025

Remove sharding docs (#872)

bdf6b37

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 9, 2025

Remove sharding docs (NVIDIA-NeMo#872)

9f4c02e

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

SeanNaren pushed a commit that referenced this pull request Oct 9, 2025

Remove sharding docs (#872)

49ef017

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Remove sharding docs (#872)

6f26d7a

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove sharding docs#872

Remove sharding docs#872
Kipok merged 1 commit intomainfrom
smahdavi/sharding-docs

smahdavi4 commented Sep 30, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 30, 2025 •

edited

Loading

Uh oh!

Kipok left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

smahdavi4 commented Sep 30, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

Kipok left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

smahdavi4 commented Sep 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 30, 2025 •

edited

Loading