Skip to content

Conversation

hynky1999
Copy link
Collaborator

@hynky1999 hynky1999 commented Feb 5, 2025

I have made several changes to math-verify to improve recall on correct answers, this should sync it with lighteval.
There are no more changes planned for it right now, so should be stable from now on

@HuggingFaceDocBuilderDev
Copy link
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hynky1999 hynky1999 requested a review from clefourrier February 5, 2025 09:29
Copy link
Member

@NathanHB NathanHB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good ! Unglodly amount of regex logic lmao but guess there is no way out ^^'

try:
add_to_specifics_with_timeout(formatted_doc, extracted_predictions, extracted_golds)
except: # noqa: E722
except Exception: # noqa: E722
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why adding exception without using it ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some exceptions that don't inherit from Exception (e.g., KeyboardInterrupt). This shouldn't be caught, as we want the program to stop when the user sends it.

),
(r"$(3,1)$", r"${1,3}$", 1),
# Shouldn't work as it's ordered tuple vs set
# (r"$(3,1)$", r"${1,3}$", 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it commented ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first version of math-verify didn't make a distinction between tuples and finite sets. The new version makes a distinction between them. Now it's very hard to know what was meant to be a set and what wasn't, so we use gold, which can be control the behavior.

Because the gold here is now an ordered tuple and pred is a set with incorrect ordering the result is false. If the gold was either set {3,1} or the pred was {1,3} (we assume the user meant tuple) it would be true.

],
)
def test_math_numina_cases(gold, pred, expected):
assert compare_strings(gold, pred, match_types=["latex", "expr"]) == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they all expect 1, would it be interesting to add cases where it should fail?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some with 0, to check that I didn't just do return True hhh

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this was mostly to ensure that the new supported stuff works

@hynky1999 hynky1999 merged commit cb35bea into main Feb 5, 2025
4 checks passed
hynky1999 added a commit that referenced this pull request May 22, 2025
* update extraction match to reflect newest math-verify

* revert symbols, improve sets handling

* rm todo

* fmt + remove empty excepts + bump l2s

* fmt

* docstring
NathanHB pushed a commit that referenced this pull request Sep 19, 2025
* update extraction match to reflect newest math-verify

* revert symbols, improve sets handling

* rm todo

* fmt + remove empty excepts + bump l2s

* fmt

* docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants