
FIX math verify handle leading zeros and int literals cases#1074

Merged
wedu-nvidia merged 2 commits into main from georgea/fix-math-verify-for-leading-zeros
Dec 4, 2025

Conversation


@gwarmstrong gwarmstrong commented Dec 4, 2025

Problem

The math grader incorrectly marked answers with leading zeros as wrong:

  • `predicted_answer="016"` vs `expected_answer="16"` → returned `False` (should be `True`)

This caused AIME problem #3 (id: aime25-2) to be incorrectly scored.

Root Cause

The literal comparison logic short-circuited before reaching math_verify's symbolic comparison, which already handles leading zeros correctly.

Solution

Delegate numeric comparisons to math_verify (which already handles leading zeros correctly) instead of short-circuiting with string comparison.
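A minimal sketch of the intended comparison order (hypothetical; the actual implementation lives in nemo_skills/evaluation/math_grader.py and delegates the final step to math_verify's symbolic comparison rather than the plain float fallback shown here):

```python
# Hypothetical sketch of the fixed flow: normalize first, then compare
# numerics as numbers instead of short-circuiting on string equality.

def math_equal_sketch(predicted: str, expected: str) -> bool:
    # Fast path: equal after trivial normalization.
    if predicted.strip() == expected.strip():
        return True
    # Numeric literals: compare as numbers so "016" matches "16"
    # instead of failing a character-by-character string comparison.
    try:
        return float(predicted) == float(expected)
    except ValueError:
        pass
    # The real implementation falls through to math_verify here;
    # a bare sketch has nothing more to try.
    return False
```

With this ordering, the leading-zero case from the bug report passes, since `float("016") == float("16")`.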

Summary by CodeRabbit

  • Bug Fixes

    • Improved mathematical answer verification with refined normalization and comparison logic.
    • Fixed handling of numeric answers containing leading zeros in evaluations.
  • Tests

    • Added test coverage for leading-zero number scenarios.


Signed-off-by: George Armstrong <georgea@nvidia.com>
@gwarmstrong gwarmstrong requested a review from i-vainn December 4, 2025 21:43

coderabbitai bot commented Dec 4, 2025

📝 Walkthrough


Refactored the `math_equal` function to improve comparison logic by introducing an early normalization check, adding explicit text-literal detection, and replacing the fallback path with symbolic comparison via math_verify. Added test cases for leading-zero handling. The `**kw` parameter was renamed to `**kwargs`.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Math grader logic refactoring (`nemo_skills/evaluation/math_grader.py`) | Restructured the `math_equal` comparison flow: removed the literal-pattern quick path; added an early exit for normalized equality; introduced dedicated text-literal handling; replaced the fallback with symbolic comparison via math_verify. Parameter `**kw` renamed to `**kwargs`. MCQ and modulo logic paths retained. |
| Test coverage (`tests/test_math_equal.py`) | Added two parameterized test cases to `test_correct_examples`: `("016", "16")` and `("007", "7")`, validating leading-zero normalization behavior. |
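The added regression cases would look roughly like the following (illustrative reconstruction; the real `tests/test_math_equal.py` feeds the pairs through the project's `math_equal` grader via `pytest.mark.parametrize`, and the stand-in grader here is only a placeholder):

```python
# Illustrative reconstruction of the new regression cases; the real test
# file drives these pairs through math_equal, in both argument orders.
LEADING_ZERO_CASES = [("016", "16"), ("007", "7")]

def check_leading_zero_cases(grader=lambda a, b: float(a) == float(b)):
    # Stand-in grader: plain numeric equality, which ignores leading zeros.
    results = []
    for predicted, expected in LEADING_ZERO_CASES:
        # Exercise both argument orders, mirroring the test's symmetry.
        results.append(grader(predicted, expected) and grader(expected, predicted))
    return results
```

Checking both argument orders matters because the grader is not guaranteed to treat predicted and expected answers symmetrically.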

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Focus areas:
    • Verify the early normalization path correctly handles all edge cases previously covered by the literal-pattern quick path
    • Confirm text-literal detection logic is distinct from and doesn't overlap with the normalized equality check
    • Review math_verify invocation to ensure LaTeX wrapping and normalization steps are applied consistently
    • Validate that MCQ and modulo handling paths remain functionally equivalent with adjusted control flow

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation / Resolution |
| --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. Run `@coderabbitai generate docstrings` to improve coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately reflects the main fix: handling leading zeros and integer literals in math_verify, which aligns with the core change of removing string-comparison shortcuts and delegating to math_verify. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_skills/evaluation/math_grader.py (1)

72-85: Clarify text‑literal branch behavior (comment vs. implementation)

The overall control flow here looks good:

  • Fast path on is_normalized_equal avoids unnecessary symbolic work.
  • The text_literal_pattern guard keeps plain text out of math_verify, which is what we want.

Minor clarity nit: the comment says “For TEXT literals (not numeric), use direct string comparison”, but the branch actually just returns False, relying on the earlier is_normalized_equal check to have already handled the equal‑string case.

To better align intent and implementation, consider one of:

  • Adjusting the comment to something like:
    “For TEXT literals (not numeric), if they weren’t already equal after normalization, short‑circuit to False instead of calling math_verify.”
  • Or, make the behavior explicit in code, e.g.:
```diff
-    if is_text_literal:
-        return False  # Already checked is_normalized_equal above
+    if is_text_literal:
+        # For pure text literals, equality is decided by normalized string match above.
+        # If we reach here, they differ, so avoid symbolic parsing.
+        return False
```

Functionally identical, but it makes the intended semantics obvious to the next reader.
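The semantics under discussion can be made concrete with a small self-contained sketch (hypothetical; the names and the numeric regex do not correspond to the actual `math_grader.py` code): if both sides are plain text rather than numeric, a failed normalized match is final and symbolic verification is never attempted.

```python
import re

# Hypothetical guard illustrating the reviewer's point, not the real code.
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")

def compare(predicted: str, expected: str) -> bool:
    pred, exp = predicted.strip().lower(), expected.strip().lower()
    if pred == exp:
        # Normalized equality, checked first (the "fast path").
        return True
    if not (_NUMERIC.match(pred) or _NUMERIC.match(exp)):
        # Differing pure-text answers: short-circuit to False rather than
        # paying for symbolic parsing that cannot succeed.
        return False
    # Numeric case: compare as numbers (the real code defers to math_verify).
    try:
        return float(pred) == float(exp)
    except ValueError:
        return False
```

Keeping text literals out of the symbolic path avoids both wasted parsing work and spurious matches on prose answers.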

📜 Review details

Configuration used: `.coderabbit.yaml`

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9115aef and 44a7fe8.

📒 Files selected for processing (2)
  • nemo_skills/evaluation/math_grader.py (1 hunks)
  • tests/test_math_equal.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (1)
tests/test_math_equal.py (1)

37-38: Nice addition of regression tests for leading‑zero answers

The new cases for "016" == "16" and "007" == "7" directly capture the original bug and are symmetric (both argument orders are exercised), which should keep this behavior from regressing.

@wedu-nvidia wedu-nvidia left a comment


Thanks

@wedu-nvidia wedu-nvidia merged commit 5cc1dcc into main Dec 4, 2025
5 checks passed
@wedu-nvidia wedu-nvidia deleted the georgea/fix-math-verify-for-leading-zeros branch December 4, 2025 22:22
Jorjeous pushed a commit that referenced this pull request Dec 11, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 12, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>