Bump LightEval #643

lewtun · 2025-05-15T12:33:34Z

I will merge once I've sanity checked the evals from a few Qwen / Llama models.

Here are the eval scores with this PR:

Model	AIME24	GPQA-D	MATH-500	LCB
DeepSeek-R1-Distill-Qwen-7B	51.3	52.4	93.4	37.2
DeepSeek-R1-Distill-Llama-8B	43.5	49.1	87.9	36.8

These scores are within the variance of the prior results.
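A sanity check like this can be scripted. The sketch below is a minimal illustration, not code from this PR: `out_of_tolerance` is a hypothetical helper, and the prior scores and per-benchmark tolerances are inputs you would supply from your own earlier runs (they are not reproduced here).

```python
# Scores reported in this PR, copied from the table above.
new_scores = {
    "DeepSeek-R1-Distill-Qwen-7B": {
        "AIME24": 51.3, "GPQA-D": 52.4, "MATH-500": 93.4, "LCB": 37.2,
    },
    "DeepSeek-R1-Distill-Llama-8B": {
        "AIME24": 43.5, "GPQA-D": 49.1, "MATH-500": 87.9, "LCB": 36.8,
    },
}


def out_of_tolerance(new, prior, tolerance):
    """Return (model, benchmark, delta) for each score that moved too far.

    Hypothetical helper (not part of LightEval).
    `new` and `prior` map model -> benchmark -> score.
    `tolerance` maps benchmark -> allowed absolute change in points.
    Pairs with no prior score or no tolerance entry are skipped.
    """
    flagged = []
    for model, scores in new.items():
        for bench, value in scores.items():
            baseline = prior.get(model, {}).get(bench)
            allowed = tolerance.get(bench)
            if baseline is None or allowed is None:
                continue  # nothing to compare against
            delta = round(value - baseline, 2)
            if abs(delta) > allowed:
                flagged.append((model, bench, delta))
    return flagged
```

Calling `out_of_tolerance(new_scores, prior_scores, tolerance)` with your own `prior_scores` and `tolerance` dictionaries returns an empty list when every compared score is within the allowed change.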

Bump LightEval

2adcd5d

lewtun requested a review from edbeeching May 15, 2025 12:33

edbeeching approved these changes May 15, 2025


lewtun merged commit ebd5913 into main May 16, 2025
1 check passed

lewtun deleted the bump-lighteval-v0 branch May 16, 2025 08:52
