Stepheng/nemotron math proofs docs#1111

Merged
stephencge merged 4 commits into main from stepheng/nemotron-math-proofs-docs
Dec 16, 2025

Conversation


@stephencge stephencge commented Dec 15, 2025

Summary by CodeRabbit

Release Notes

  • Documentation
    • Added comprehensive documentation for the Nemotron-Math-Proofs dataset, including training and evaluation results with detailed performance metrics.
    • Included step-by-step reproducible workflows and practical code examples for autoformalization, theorem proving, model training, and evaluation.
    • Updated site navigation to include the new documentation page with relevant dataset information and helpful resources.



coderabbitai bot commented Dec 15, 2025

📝 Walkthrough

A new documentation page for the Nemotron-Math-Proofs dataset is added, detailing training and evaluation results, dataset fields, and reproducible workflows. The navigation configuration is updated to include this new page.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| **Documentation Addition**<br>`docs/releases/nemotronmathproofs/index.md` | New documentation page describing the Nemotron-Math-Proofs dataset, including training/evaluation results, dataset fields, CLI and Python code snippets for autoformalization, theorem proving, model training, evaluation workflows, and known limitations. |
| **Navigation Update**<br>`mkdocs.yml` | Navigation entry added under Papers & Releases for Nemotron-Math-Proofs, linking to the new documentation page. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Documentation additions are straightforward to review
  • Navigation update is a simple entry addition with no complex configuration changes
  • No code logic, control flow, or dependencies to verify

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding documentation for the Nemotron-Math-Proofs dataset, which is the primary purpose of this pull request. |
| Docstring coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; skipping the check. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/releases/nemotronmathproofs/index.md (1)

380-384: Minor code-block style inconsistency.

Line 382 uses a fenced code block (triple backticks) in a section that could be indented instead. While this is a minor linting issue from markdownlint-cli2 (MD046), consider standardizing the code-block style for consistency with other documentation.

Consider applying this diff to use indented code-block style instead:

````diff
-To summarize evaluation results:
-
-```bash
-ns summarize_results /workspace/evals/nemotron-nano-3-minif2f/eval-results/minif2f --cluster slurm
-```
+To summarize evaluation results:
+
+    ns summarize_results /workspace/evals/nemotron-nano-3-minif2f/eval-results/minif2f --cluster slurm
````

Alternatively, if you prefer fenced blocks, you can suppress the MD046 rule in your markdownlint configuration if it conflicts with your documentation style guidelines.
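If the team prefers to keep fenced blocks project-wide, a sketch of the MD046 override follows. The exact config file name depends on how markdownlint-cli2 is wired into this repo's pre-commit setup, but the rule's `style` option is the standard knob:

```yaml
# .markdownlint.yaml (sketch): tell MD046 that fenced code blocks
# are the expected style, instead of flagging them.
MD046:
  style: fenced
```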

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c819ca4 and e88a940.

📒 Files selected for processing (2)
  • docs/releases/nemotronmathproofs/index.md (1 hunks)
  • mkdocs.yml (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/releases/nemotronmathproofs/index.md

382-382: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (10)
mkdocs.yml (1)

94-94: LGTM!

The navigation entry follows the established pattern and correctly references the new documentation page.

docs/releases/nemotronmathproofs/index.md (9)

1-3: Metadata and structure look good.

The YAML front matter is properly formatted with the PR creation date, and the document structure follows Markdown best practices.


5-25: Dataset overview is comprehensive and well-presented.

The gpt-oss-120b model is OpenAI's open-weight model designed for powerful reasoning, making it an appropriate choice for autoformalization. Goedel-Prover-V2-32B achieves state-of-the-art performance on MiniF2F at Pass@32, validating the model choices presented here. The statistics and dataset composition are clearly documented.


27-42: Training results tables are clearly presented.

The comparative results effectively demonstrate the performance improvements achieved by training on this dataset. The formatting with separate tables for the SFT results and Nemotron-Nano-v3 results aids readability.


44-56: Dataset fields table is well-documented.

Each field is clearly described with its type and purpose, including notes about optional/nullable fields. This information will help users understand the dataset structure and guide their data loading pipelines.


58-142: Autoformalization workflow section is detailed and reproducible.

The paired CLI and Python examples provide clear instructions for users with different workflow preferences. The pipeline components are well-explained with concrete steps for each stage (initial formalization, compilation check, backtranslation, iterative refinement).


143-214: Theorem proving workflow is well-documented with clear configuration.

The section effectively explains the proving strategy and provides practical examples. The configuration parameters are explained through their role in the pipeline (e.g., chain-of-thought removal, wrong turn deletion, structured error feedback).


215-273: Model training section provides practical guidance.

Both CLI and Python variants are provided, making it accessible to users with different preferences. The training parameters are appropriate for fine-tuning on mathematical reasoning tasks.


274-385: Evaluation section is comprehensive with multiple scenarios.

The documentation covers standard evaluation, self-correction mode, and result summarization. This gives users flexibility in how they evaluate their models based on their specific needs and computational budgets.


386-390: Known limitations are transparently documented.

The section honestly acknowledges three key limitations with brief explanations. This transparency is appreciated and helps set user expectations appropriately.

```markdown
| gpt-oss-20b | 43.03% | - |
| Qwen3-30B-A3B-Thinking | 16.80% | - |

## Dataset Fields
```

probably don't need this section, I think this can just stay in hf

```markdown
These commands assume you have `/workspace` defined in your [cluster config](../../basics/cluster-configs.md).
Adjust paths and cluster settings according to your environment.

### Autoformalization
```
not sure if it's going to be better, but consider splitting into multiple pages and linking here

```python
generate(
    ctx=wrap_arguments(
        "++inference.tokens_to_generate=120000 "
        "++prompt_config=lean4/deepseek-R1-autoformalization "
```
should we rename these prompts? Why r1 if we use with gpt-oss, it's a bit confusing

```bash
++refinement=True \
++refinement_max_turns=8 \
++remove_cot=True \
++n_pass=1 \
```
should this be 4? Or mention how to get to pass@4 since that's what we are releasing
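For context on what pass@4 (and the Pass@32 figure mentioned earlier) means numerically, here is a short illustrative sketch of the standard unbiased pass@k estimator; it is not part of the release tooling, just a reference for readers. `n` is the number of proof attempts generated and `c` the number that verify.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k attempts,
    drawn without replacement from n samples, is among the c correct ones."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 8 proof attempts of which 2 verify, pass@4:
print(round(pass_at_k(8, 2, 4), 4))  # → 0.7857
```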

```python
server_gpus=8,
server_args="--max-model-len 40960",
num_random_seeds=1,
dependent_jobs=2,
```
things like this we should typically not include as this is very cluster specific. Can just mention that it can take a while, so if users have timeouts, they can set this or num chunks

```bash
--expname=qwen3-8b-lean-sft \
--output_dir=/workspace/training/qwen3-8b-lean-sft \
--hf_model=Qwen/Qwen3-8B \
--training_data=/workspace/data/sft_data.jsonl \  # Processed output from theorem proving step
```
We should ideally add instructions with a simple command for how to construct this data using the release HF dataset. Probably just a few lines to download / lightly postprocess. See other releases for examples
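As a rough starting point for that instruction, the postprocessing step might look like the sketch below. The record field names (`problem`, `formal_proof`) and the output keys (`input`, `output`) are hypothetical placeholders; the real names should be taken from the released HF dataset card and the training pipeline's expected SFT format.

```python
import json

def to_sft_jsonl(records, out_path):
    """Write one {"input": ..., "output": ...} JSON object per line.
    Field names here are hypothetical; adapt to the dataset's schema."""
    with open(out_path, "w") as f:
        for rec in records:
            f.write(json.dumps({
                "input": rec["problem"],        # informal statement (placeholder key)
                "output": rec["formal_proof"],  # Lean 4 proof (placeholder key)
            }) + "\n")

# In practice the records would come from the released HF dataset, e.g.:
#   from datasets import load_dataset
#   records = load_dataset("<hf-dataset-name>", split="train")
sample = [{"problem": "Prove 1 + 1 = 2.",
           "formal_proof": "theorem t : 1 + 1 = 2 := rfl"}]
to_sft_jsonl(sample, "sft_data.jsonl")
```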


ideally also add a step to prepare the output of theorem proving into this format, but it's less important than how to use existing dataset

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
…PR comments

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
@stephencge stephencge force-pushed the stepheng/nemotron-math-proofs-docs branch from e88a940 to e58c8c3 Compare December 16, 2025 00:25
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
@stephencge stephencge enabled auto-merge (squash) December 16, 2025 00:39

@Kipok Kipok left a comment


Let's merge the current version and we can refine later if there are issues

@stephencge stephencge merged commit e393e48 into main Dec 16, 2025
5 checks passed
@stephencge stephencge deleted the stepheng/nemotron-math-proofs-docs branch December 16, 2025 02:10
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: dgitman <dgitman@nvidia.com>
