lewtun commented May 15, 2025

Bug fix for: huggingface/lighteval#721

I will merge once I've sanity checked the evals from a few Qwen / Llama models.

Here are the eval scores with this PR:

| Model | AIME24 | GPQA-D | MATH-500 | LCB |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-7B | 51.3 | 52.4 | 93.4 | 37.2 |
| DeepSeek-R1-Distill-Llama-8B | 43.5 | 49.1 | 87.9 | 36.8 |

These scores are within the variance of the prior results.

lewtun requested a review from edbeeching May 15, 2025 12:33
lewtun merged commit ebd5913 into main May 16, 2025
1 check passed
lewtun deleted the bump-lighteval-v0 branch May 16, 2025 08:52