NathanHB (Member) commented May 23, 2025

Results for HuggingFaceTB/SmolLM2-1.7B-Instruct

uv run lighteval vllm "model_name=HuggingFaceTB/SmolLM2-1.7B-Instruct"  "lighteval|gsm_plus|0|0"   --use-chat-template
| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| all | | extractive_match | 0.213 | ± 0.0043 |
| lighteval:gsm_plus:0 | 0 | extractive_match | 0.213 | ± 0.0043 |
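
As a rough sanity check on the reported numbers, the sketch below relates the score to its standard error. It assumes `extractive_match` is a per-sample 0/1 score and that the reported Stderr is the standard error of the mean; neither is stated in this thread. The helper names are hypothetical and not part of lighteval.

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n independent 0/1 scores with mean p."""
    return math.sqrt(p * (1.0 - p) / n)


def implied_sample_count(p: float, stderr: float) -> float:
    """Invert the formula above to estimate how many samples were scored."""
    return p * (1.0 - p) / stderr**2


# Values copied from the results table above.
accuracy, stderr = 0.213, 0.0043

# Assumption: the binomial model holds; the estimate is only as precise as
# the rounding of the two reported values.
n_est = implied_sample_count(accuracy, stderr)
print(f"implied number of scored samples: about {n_est:.0f}")
```

Under these assumptions the implied count is on the order of nine thousand samples, which gives a quick way to check whether the number of evaluated rows matches what is expected after any filtering.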

NathanHB linked an issue May 23, 2025 that may be closed by this pull request
NathanHB self-assigned this May 23, 2025
NathanHB requested review from Copilot and lewtun May 23, 2025 10:10
Copilot AI (Contributor) left a comment

Pull Request Overview

Adds support for a new GSM-Plus task by registering its configuration and prompt handler.

  • Introduce a gsm_plus task in default_tasks.py
  • Implement gsm_plus prompt logic in default_prompts.py

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/lighteval/tasks/default_tasks.py | Register new gsm_plus task configuration |
| src/lighteval/tasks/default_prompts.py | Add prompt function for filtering and formatting |
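
To make "filtering and formatting" concrete, here is a minimal, self-contained sketch of what such a prompt function might look like. It is not the code from this PR. The `Doc` class is a simplified stand-in for lighteval's document type, and the column names (`question`, `answer`, `perturbation_type`), the filtered category, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Simplified stand-in for lighteval's document type (hypothetical)."""

    task_name: str
    query: str
    choices: list
    gold_index: int


def gsm_plus_prompt(line: dict, task_name: str = "gsm_plus") -> Optional[Doc]:
    """Turn one dataset row into a prompt, or return None to skip the row."""
    # Filtering (assumed rule): drop rows whose perturbation type is hard to
    # score with plain answer extraction.
    if line.get("perturbation_type") == "critical thinking":
        return None
    # Formatting (assumed template): question in, single gold answer out.
    return Doc(
        task_name=task_name,
        query=f"Question: {line['question']}\n\nAnswer:",
        choices=[str(line["answer"])],
        gold_index=0,
    )


row = {
    "question": "What is 2 + 3?",
    "answer": "5",
    "perturbation_type": "numerical substitution",
}
doc = gsm_plus_prompt(row)
```

Returning `None` for skipped rows keeps the filtering decision next to the formatting logic, so the task configuration only needs to reference one function.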
Comments suppressed due to low confidence (1)

src/lighteval/tasks/default_tasks.py:7963

  • Add tests to cover the new gsm_plus task configuration (e.g., prompt generation and evaluation flow) to ensure it behaves as expected.
gsm_plus = LightevalTaskConfig(

HuggingFaceDocBuilderDev (Collaborator)

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun (Member) left a comment

Really nice eval! Before merging, could you run 1-2 models from their table to see if we get similar results?

[Image: Screenshot 2025-05-23 at 12 12 43]

NathanHB merged commit 9619194 into main May 28, 2025
5 checks passed
NathanHB added a commit that referenced this pull request Sep 19, 2025
* commit

* Update src/lighteval/tasks/default_prompts.py

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
Successfully merging this pull request may close these issues.

[EVAL] GSM Plus

3 participants