📝 Walkthrough

A new documentation page for the Nemotron-Math-Proofs dataset is added, detailing training and evaluation results, dataset fields, and reproducible workflows. The navigation configuration is updated to include the new page.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/releases/nemotronmathproofs/index.md (1)
380-384: Minor code-block style inconsistency. Line 382 uses a fenced code block (triple backticks) in a section that could be indented instead. While this is a minor linting issue from markdownlint-cli2 (MD046), consider standardizing the code-block style for consistency with other documentation.
Consider applying this diff to use indented code block style instead:
````diff
-To summarize evaluation results:
-
-```bash
-ns summarize_results /workspace/evals/nemotron-nano-3-minif2f/eval-results/minif2f --cluster slurm
-```
+To summarize evaluation results:
+
+    ns summarize_results /workspace/evals/nemotron-nano-3-minif2f/eval-results/minif2f --cluster slurm
````

Alternatively, if you prefer fenced blocks, you can suppress this specific MD046 rule in your markdownlint configuration if it conflicts with your documentation style guidelines.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/releases/nemotronmathproofs/index.md (1 hunks)
- mkdocs.yml (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/releases/nemotronmathproofs/index.md
382-382: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (10)
mkdocs.yml (1)
94-94: LGTM! The navigation entry follows the established pattern and correctly references the new documentation page.
docs/releases/nemotronmathproofs/index.md (9)
1-3: Metadata and structure look good. The YAML front matter is properly formatted with the PR creation date, and the document structure follows Markdown best practices.
5-25: Dataset overview is comprehensive and well-presented. The gpt-oss-120b model is OpenAI's open-weight model designed for powerful reasoning, making it an appropriate choice for autoformalization. Goedel-Prover-V2-32B achieves state-of-the-art performance on MiniF2F at Pass@32, validating the model choices presented here. The statistics and dataset composition are clearly documented.
27-42: Training results tables are clearly presented. The comparative results effectively demonstrate the performance improvements achieved by training on this dataset. The formatting with separate tables for the SFT results and Nemotron-Nano-v3 results aids readability.
44-56: Dataset fields table is well-documented. Each field is clearly described with its type and purpose, including notes about optional/nullable fields. This information will help users understand the dataset structure and guide their data loading pipelines.
58-142: Autoformalization workflow section is detailed and reproducible. The paired CLI and Python examples provide clear instructions for users with different workflow preferences. The pipeline components are well-explained with concrete steps for each stage (initial formalization, compilation check, backtranslation, iterative refinement).
143-214: Theorem proving workflow is well-documented with clear configuration. The section effectively explains the proving strategy and provides practical examples. The configuration parameters are explained through their role in the pipeline (e.g., chain-of-thought removal, wrong turn deletion, structured error feedback).
215-273: Model training section provides practical guidance. Both CLI and Python variants are provided, making it accessible to users with different preferences. The training parameters are appropriate for fine-tuning on mathematical reasoning tasks.
274-385: Evaluation section is comprehensive with multiple scenarios. The documentation covers standard evaluation, self-correction mode, and result summarization. This gives users flexibility in how they evaluate their models based on their specific needs and computational budgets.
386-390: Known limitations are transparently documented. The section honestly acknowledges three key limitations with brief explanations. This transparency is appreciated and helps set user expectations appropriately.
| gpt-oss-20b | 43.03% | - |
| Qwen3-30B-A3B-Thinking | 16.80% | - |

## Dataset Fields
probably don't need this section, I think this can just stay in hf
These commands assume you have `/workspace` defined in your [cluster config](../../basics/cluster-configs.md).
Adjust paths and cluster settings according to your environment.

### Autoformalization
not sure if it's going to be better, but consider splitting into multiple pages and linking here
generate(
    ctx=wrap_arguments(
        "++inference.tokens_to_generate=120000 "
        "++prompt_config=lean4/deepseek-R1-autoformalization "
should we rename these prompts? Why R1 if we use it with gpt-oss? It's a bit confusing
++refinement=True \
++refinement_max_turns=8 \
++remove_cot=True \
++n_pass=1 \
should this be 4? Or mention how to get to pass@4 since that's what we are releasing
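As background for the pass@4 question: once the docs report n samples per problem with c successes, pass@4 is usually computed with the standard unbiased estimator 1 - C(n-c, k)/C(n, k) from the Codex paper. A minimal sketch of that formula (whether nemo-skills computes it exactly this way is an assumption):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n attempts (c of them correct) succeeds."""
    if n - c < k:
        # fewer than k failures exist, so any choice of k samples contains a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 8 proof attempts per problem, 2 of which compile
print(pass_at_k(8, 2, 4))
```

This would make it easy to show readers how a pass@4 number follows from, say, `n_pass=8` runs.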
server_gpus=8,
server_args="--max-model-len 40960",
num_random_seeds=1,
dependent_jobs=2,
Things like this we should typically not include, as they're very cluster specific. We can just mention that generation can take a while, so if users hit timeouts they can set this or num_chunks.
--expname=qwen3-8b-lean-sft \
--output_dir=/workspace/training/qwen3-8b-lean-sft \
--hf_model=Qwen/Qwen3-8B \
--training_data=/workspace/data/sft_data.jsonl \  # Processed output from theorem proving step
We should ideally add instructions with a simple command for how to construct this data using the release HF dataset. Probably just a few lines to download / lightly postprocess. See other releases for examples
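A rough sketch of what those few lines could look like. The dataset id (`nvidia/Nemotron-Math-Proofs`) and the field names (`formal_statement`, `formal_proof`) are assumptions here and should be checked against the actual dataset card on Hugging Face before this goes into the docs:

```python
import json

def to_sft_record(example: dict) -> dict:
    # Maps one dataset row to the {"input", "output"} JSONL shape used for SFT.
    # Field names are an assumption; verify against the released schema.
    return {
        "input": example["formal_statement"],
        "output": example["formal_proof"],
    }

# Downloading and writing the JSONL would look roughly like (not run here;
# the dataset id is an assumption):
#   from datasets import load_dataset
#   ds = load_dataset("nvidia/Nemotron-Math-Proofs", split="train")
#   with open("/workspace/data/sft_data.jsonl", "w") as f:
#       for ex in ds:
#           f.write(json.dumps(to_sft_record(ex)) + "\n")

example = {"formal_statement": "theorem t : 1 + 1 = 2", "formal_proof": "by norm_num"}
print(json.dumps(to_sft_record(example)))
```

See other releases for the exact conventions; the point is that a few lines of download plus light postprocessing should be enough.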
ideally also add a step to prepare the output of theorem proving into this format, but it's less important than how to use the existing dataset
Kipok left a comment:
Let's merge the current version and we can refine later if there are issues