📝 Walkthrough

A new documentation page for the Nemotron-Math-Proofs dataset is added, detailing training and evaluation results, dataset fields, and reproducible workflows. The navigation configuration is updated to include the new page.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/releases/nemotronmathproofs/index.md (1)
380-384: Minor code-block style inconsistency. Line 382 uses a fenced code block (triple backticks) in a section that could be indented instead. While this is a minor linting issue from markdownlint-cli2 (MD046), consider standardizing the code-block style for consistency with other documentation.
Consider applying this diff to use indented code block style instead:
````diff
-To summarize evaluation results:
-
-```bash
-ns summarize_results /workspace/evals/nemotron-nano-3-minif2f/eval-results/minif2f --cluster slurm
-```
+To summarize evaluation results:
+
+    ns summarize_results /workspace/evals/nemotron-nano-3-minif2f/eval-results/minif2f --cluster slurm
````

Alternatively, if you prefer fenced blocks, you can suppress this specific MD046 rule in your markdownlint configuration if it conflicts with your documentation style guidelines.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/releases/nemotronmathproofs/index.md (1 hunks)
- mkdocs.yml (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/releases/nemotronmathproofs/index.md
382-382: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (10)
mkdocs.yml (1)
94-94: LGTM! The navigation entry follows the established pattern and correctly references the new documentation page.
docs/releases/nemotronmathproofs/index.md (9)
1-3: Metadata and structure look good. The YAML front matter is properly formatted with the PR creation date, and the document structure follows Markdown best practices.
5-25: Dataset overview is comprehensive and well-presented. The gpt-oss-120b model is OpenAI's open-weight model designed for powerful reasoning, making it an appropriate choice for autoformalization. Goedel-Prover-V2-32B achieves state-of-the-art performance on MiniF2F at Pass@32, validating the model choices presented here. The statistics and dataset composition are clearly documented.
27-42: Training results tables are clearly presented. The comparative results effectively demonstrate the performance improvements achieved by training on this dataset. The formatting with separate tables for the SFT results and Nemotron-Nano-v3 results aids readability.
44-56: Dataset fields table is well-documented. Each field is clearly described with its type and purpose, including notes about optional/nullable fields. This information will help users understand the dataset structure and guide their data loading pipelines.
58-142: Autoformalization workflow section is detailed and reproducible. The paired CLI and Python examples provide clear instructions for users with different workflow preferences. The pipeline components are well-explained with concrete steps for each stage (initial formalization, compilation check, backtranslation, iterative refinement).
143-214: Theorem proving workflow is well-documented with clear configuration. The section effectively explains the proving strategy and provides practical examples. The configuration parameters are explained through their role in the pipeline (e.g., chain-of-thought removal, wrong turn deletion, structured error feedback).
215-273: Model training section provides practical guidance. Both CLI and Python variants are provided, making it accessible to users with different preferences. The training parameters are appropriate for fine-tuning on mathematical reasoning tasks.
274-385: Evaluation section is comprehensive with multiple scenarios. The documentation covers standard evaluation, self-correction mode, and result summarization. This gives users flexibility in how they evaluate their models based on their specific needs and computational budgets.
386-390: Known limitations are transparently documented. The section honestly acknowledges three key limitations with brief explanations. This transparency is appreciated and helps set user expectations appropriately.
| gpt-oss-20b | 43.03% | - |
| Qwen3-30B-A3B-Thinking | 16.80% | - |

## Dataset Fields
probably don't need this section, I think this can just stay in hf
These commands assume you have `/workspace` defined in your [cluster config](../../basics/cluster-configs.md).
Adjust paths and cluster settings according to your environment.

### Autoformalization
not sure if it's going to be better, but consider splitting into multiple pages and linking here
generate(
    ctx=wrap_arguments(
        "++inference.tokens_to_generate=120000 "
        "++prompt_config=lean4/deepseek-R1-autoformalization "
should we rename these prompts? Why R1 if we use it with gpt-oss? It's a bit confusing
++refinement=True \
++refinement_max_turns=8 \
++remove_cot=True \
++n_pass=1 \
should this be 4? Or mention how to get to pass@4 since that's what we are releasing
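As background for the pass@4 question: once the docs report n samples per problem with c successes, pass@4 is usually computed with the standard unbiased estimator 1 - C(n-c, k)/C(n, k) from the Codex paper. A minimal sketch of that formula (whether nemo-skills computes it exactly this way is an assumption):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n attempts (c of them correct) succeeds."""
    if n - c < k:
        # fewer than k failures exist, so any choice of k samples contains a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 8 proof attempts per problem, 2 of which compile
print(pass_at_k(8, 2, 4))
```

This would make it easy to show readers how a pass@4 number follows from, say, `n_pass=8` runs.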
server_gpus=8,
server_args="--max-model-len 40960",
num_random_seeds=1,
dependent_jobs=2,
Things like this we should typically not include, as they're very cluster specific. We can just mention that generation can take a while, so if users hit timeouts they can set this or num_chunks.
--expname=qwen3-8b-lean-sft \
--output_dir=/workspace/training/qwen3-8b-lean-sft \
--hf_model=Qwen/Qwen3-8B \
--training_data=/workspace/data/sft_data.jsonl \  # Processed output from theorem proving step
We should ideally add instructions with a simple command for how to construct this data using the release HF dataset. Probably just a few lines to download / lightly postprocess. See other releases for examples
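A rough sketch of what those few lines could look like. The dataset id (`nvidia/Nemotron-Math-Proofs`) and the field names (`formal_statement`, `formal_proof`) are assumptions here and should be checked against the actual dataset card on Hugging Face before this goes into the docs:

```python
import json

def to_sft_record(example: dict) -> dict:
    # Maps one dataset row to the {"input", "output"} JSONL shape used for SFT.
    # Field names are an assumption; verify against the released schema.
    return {
        "input": example["formal_statement"],
        "output": example["formal_proof"],
    }

# Downloading and writing the JSONL would look roughly like (not run here;
# the dataset id is an assumption):
#   from datasets import load_dataset
#   ds = load_dataset("nvidia/Nemotron-Math-Proofs", split="train")
#   with open("/workspace/data/sft_data.jsonl", "w") as f:
#       for ex in ds:
#           f.write(json.dumps(to_sft_record(ex)) + "\n")

example = {"formal_statement": "theorem t : 1 + 1 = 2", "formal_proof": "by norm_num"}
print(json.dumps(to_sft_record(example)))
```

See other releases for the exact conventions; the point is that a few lines of download plus light postprocessing should be enough.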
ideally also add a step to prepare the output of theorem proving into this format, but it's less important than how to use the existing dataset
Kipok left a comment:
Let's merge the current version and we can refine later if there are issues