
Remove sharding docs #872

Merged
Kipok merged 1 commit into main from smahdavi/sharding-docs
Sep 30, 2025

Conversation

@smahdavi4 (Collaborator) commented Sep 30, 2025

Summary by CodeRabbit

  • Documentation

    • Removed obsolete sharded model conversion section.
    • Updated dataset guides to use the non-sharded DeepSeek-R1 path and revised server arguments to reflect the new setup.
  • Chores

    • Updated solution generation configs to use the non-sharded DeepSeek-R1 path.
    • Switched server arguments to an explicit parallelism setting, aligning configs with the updated runtime.
    • Simplified setup by removing reliance on sharded model artifacts.

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@smahdavi4 requested a review from Kipok on September 30, 2025 at 22:39
coderabbitai bot commented Sep 30, 2025

Walkthrough

The documentation changes remove the sharded-conversion instructions and update example model paths and server args. The configs switch model paths from pre-sharded "tp16" variants to unsharded paths and replace "--load-format sharded_state" with "--ep-size 16" while preserving context lengths.

Changes

Cohort / File(s): Summary

  • Docs: Open Math Reasoning (docs/releases/openmathreasoning/dataset.md)
    Removed the "Model conversion" section describing sharded DeepSeek-R1 state conversion; remaining sections unchanged.
  • Docs: Open Reasoning (docs/releases/openreasoning/dataset.md)
    Removed the optional sharding block; updated the example model path from /workspace/DeepSeek-R1-0528-tp16 to /workspace/DeepSeek-R1-0528; changed server_args from --load-format sharded_state --context-length {…} to --ep-size 16 --context-length {…}.
  • Configs: Open Code Reasoning R1 (recipes/opencodereasoning/configs/solution_sdg/r1.yaml)
    In generate_solutions.stage_kwargs: model path changed from /hf_models/DeepSeek-R1-tp16 to /hf_models/DeepSeek-R1; server_args changed from --load-format sharded_state --context-length 32768 to --ep-size 16 --context-length 32768.
  • Configs: Open Math Reasoning R1 (recipes/openmathreasoning/configs/solution_sdg/r1.yaml)
    In generate_solutions.stage_kwargs: model path changed from /hf_models/DeepSeek-R1-tp16 to /hf_models/DeepSeek-R1; server_args changed from --load-format sharded_state --context-length 18384 to --ep-size 16 --context-length 18384.
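The config edits summarized for the two r1.yaml files amount to a small YAML change. A hypothetical sketch for the open-code-reasoning config is below; the model path and server_args values are taken from the diff summary, but the surrounding key layout under generate_solutions.stage_kwargs is an assumption:

```yaml
# Before: pre-sharded checkpoint loaded via sharded_state (assumed key layout)
generate_solutions:
  stage_kwargs:
    model: /hf_models/DeepSeek-R1-tp16
    server_args: "--load-format sharded_state --context-length 32768"

# After: unsharded model, parallelism made explicit via --ep-size
generate_solutions:
  stage_kwargs:
    model: /hf_models/DeepSeek-R1
    server_args: "--ep-size 16 --context-length 32768"
```

The math-reasoning config follows the same pattern with --context-length 18384.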

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nudge my whiskers at config lines,
Hop from “tp16” to simpler signs.
No shards to juggle, neat and clean—
Just ep-size 16 in the scene.
Carrots compiled, docs aligned;
A tidy trail for buns behind. 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The title concisely describes the primary change in the pull request, the removal of sharding-related documentation, making it clear to reviewers without extraneous detail.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e342ee8 and 4ed7f6c.

📒 Files selected for processing (4)
  • docs/releases/openmathreasoning/dataset.md (0 hunks)
  • docs/releases/openreasoning/dataset.md (1 hunks)
  • recipes/opencodereasoning/configs/solution_sdg/r1.yaml (1 hunks)
  • recipes/openmathreasoning/configs/solution_sdg/r1.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/releases/openmathreasoning/dataset.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (3)
recipes/openmathreasoning/configs/solution_sdg/r1.yaml (1)

53-53: LGTM! Model path and server args updated correctly.

The changes correctly switch from pre-sharded checkpoint loading to unsharded model with explicit parallelism. The --ep-size 16 matches the total GPU count (8 GPUs × 2 nodes), and the context length is properly preserved.

Also applies to: 57-57
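The reviewer's parallelism arithmetic can be checked directly; the node and per-node GPU counts below are taken from the review comment (8 GPUs × 2 nodes), not from the configs themselves:

```python
# Expert-parallel size should span every GPU in the deployment
# (counts per the review comment: 2 nodes with 8 GPUs each).
gpus_per_node = 8
num_nodes = 2
ep_size = gpus_per_node * num_nodes
print(ep_size)  # → 16, matching --ep-size 16 in the updated configs
```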

recipes/opencodereasoning/configs/solution_sdg/r1.yaml (1)

36-36: LGTM! Consistent model path and server args updates.

The changes follow the same pattern as the math config, correctly switching to unsharded model loading with --ep-size 16 (matching 8 GPUs × 2 nodes). The higher context length of 32768 is appropriate for code generation tasks.

Also applies to: 40-40

docs/releases/openreasoning/dataset.md (1)

69-69: LGTM – docs/releases/openreasoning/dataset.md is free of any -tp16 or sharded_state references.



Kipok (Collaborator) left a comment:

thanks!

Kipok merged commit a37231c into main on Sep 30, 2025
6 checks passed
Kipok deleted the smahdavi/sharding-docs branch on September 30, 2025 at 22:48
wasiahmad pushed a commit that referenced this pull request Oct 1, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 9, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
SeanNaren pushed a commit that referenced this pull request Oct 9, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>

Labels: None yet

Projects: None yet

2 participants