adds aime24, 25 and math500 #586
Conversation
```python
stop_tokens = dataset[0].stop_sequence
max_new_tokens = dataset[0].generation_size  # could be None
max_new_tokens = (
```
By default, we would want the dataset to set the max length (or to cap it at max_tokens if asked by the user)
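A minimal sketch of that resolution order, assuming hypothetical `task_generation_size` and `user_max_tokens` values (neither name is from the PR):

```python
def resolve_max_new_tokens(task_generation_size, user_max_tokens):
    # The task's generation size is the default; a user-supplied max_tokens
    # can only cap it, never raise it above the task's budget.
    if task_generation_size is None:
        return user_max_tokens  # no task budget; fall back to the user value (may be None)
    if user_max_tokens is None:
        return task_generation_size
    return min(task_generation_size, user_max_tokens)
```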
It is set by the task by default and is overridden if the user sets it
We don't want to risk overriding it to above the task's generation size, imo
I think it's good to have sensible defaults in the task definition (e.g. 4096 tokens is good for most models), but give users the ability to override at runtime, e.g. if they want to evaluate a long CoT model that requires 32k tokens
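A hedged sketch of that scheme, with a hypothetical 4096-token task default and a runtime override (names are illustrative, not from the PR):

```python
DEFAULT_GENERATION_SIZE = 4096  # sensible default for most models

def effective_generation_size(task_default=DEFAULT_GENERATION_SIZE, user_override=None):
    # The user override wins when provided, e.g. 32_768 for a long-CoT model.
    return user_override if user_override is not None else task_default
```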
The issue is going to be task reproducibility
I'm OK with leaving it if a warning message is displayed
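A sketch of what that warning could look like, assuming hypothetical names; the point is to make any deviation from the task default visible in the logs:

```python
import logging

logger = logging.getLogger(__name__)

def resolve_with_warning(task_size, user_size):
    # Allow the user override, but warn whenever it changes the task's
    # generation size so reproducibility issues are easy to spot.
    if user_size is not None and task_size is not None and user_size != task_size:
        logger.warning(
            "Overriding task generation size %d with user-provided max_new_tokens %d; "
            "results may not match runs that used the task default.",
            task_size,
            user_size,
        )
    return user_size if user_size is not None else task_size
```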
I'm not sure why task reproducibility would be an issue here. Everything is clearly defined either in the task config or the model config. Some models (thinking models) require something like 8k context length to generate what should be a much smaller response.
Well, you know how much people share their configs in eval ^^
If people assume they can reproduce the results of a model on a task, but the model config actually overrode the task's generation length to let the model generate more, it will lead to a mismatch.
I would personally not expect my model config to override the maximum allowed length for a given task
Thanks a lot for porting these over - LGTM!
* commit
* Apply suggestions from code review
* commit
* add prompt to math 500
* add prompt to math 500