Add maj@k metric #158

clefourrier · 2024-04-16T13:38:56Z

Still missing:

- merge greedy sampling with the basic greedy evals, to avoid running the generations several times: we can do the sampled and the normal generations in one step, then split on the number of samples each metric needs when we apply the metrics. This will divide the time taken by 2.
- update all the other launchers (inference endpoints, nanotron, etc.): they don't cover it XD
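
For readers new to the metric, here is a minimal sketch of maj@k and of the "generate once, split afterwards" idea from the first item. It is not the lighteval implementation: the names `maj_at_k` and `split_for_metrics`, the `normalize` argument, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def maj_at_k(
    samples: Sequence[str],
    gold: str,
    k: int,
    normalize: Callable[[str], str] = str.strip,  # hypothetical normalization step
) -> int:
    """Return 1 if the most frequent answer among the first k samples equals gold."""
    if k < 1 or k > len(samples):
        raise ValueError(f"k must be between 1 and {len(samples)}, got {k}")
    votes = Counter(normalize(s) for s in samples[:k])
    # On ties, Counter.most_common keeps the answer that was seen first.
    majority, _ = votes.most_common(1)[0]
    return int(majority == normalize(gold))


def split_for_metrics(samples: Sequence[str], needs: Dict[str, int]) -> Dict[str, List[str]]:
    """Generate once, then hand each metric only the number of samples it asks for."""
    return {name: list(samples[:n]) for name, n in needs.items()}


# One generation call produced four samples; each metric takes its own slice.
generations = ["18", "18 ", "20", "18"]
per_metric = split_for_metrics(generations, {"single_answer_metric": 1, "maj@4": 4})
score = maj_at_k(per_metric["maj@4"], gold="18", k=4)
```

Slicing the first `n` samples is only one possible splitting policy; whether the first sample should come from greedy decoding rather than sampling is a design decision this sketch does not settle.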

NathanHB

LGTM! Do we have a metric for gsm8k to compare with other implementations?

clefourrier · 2024-04-22T06:43:59Z

I'll check it - however, please don't merge yet, I need to propagate the changes to the other launchers + simplify greedy :)

NathanHB · 2024-04-22T09:40:38Z

Oh, my bad, I thought it was ready for review!

not ready

clefourrier · 2024-04-22T09:42:40Z

I hope I'll finish it today :)
It'll be a bit bigger because I'm simplifying part of the current system

clefourrier · 2024-04-22T13:10:27Z

Tests are failing because of the hub problems ^^"

NathanHB · 2024-04-26T10:49:59Z

Is this good to be merged? I saw on Slack that you did not get the same results for the Mistral models. (Do we even know how they ran their tests?)

clefourrier · 2024-04-29T07:58:32Z

I'll do more tests today - we could merge it now and adjust later if needed though.

NathanHB · 2024-04-30T10:05:58Z

Alright, I'm merging it so that we can merge the other PRs.

This PR needs #158 to be merged first. The main problem is that splits are no longer by size, but they are however now making sure that all batched generative evals are launched with similar evals (same sampling, same eos tokens) --------- Co-authored-by: Nathan Habib <[email protected]>
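
The commit message above describes batching generative evals by shared parameters rather than by size. A rough sketch of that grouping follows, assuming a hypothetical `GenRequest` type and a grouping key made of the sampling flag, the number of samples, and the stop sequences. None of these names come from lighteval.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class GenRequest:
    """Hypothetical generative request, for illustration only."""
    prompt: str
    num_samples: int
    do_sample: bool
    stop_sequences: Tuple[str, ...]


GroupKey = Tuple[bool, int, Tuple[str, ...]]


def group_by_generation_params(requests: Sequence[GenRequest]) -> Dict[GroupKey, List[GenRequest]]:
    """Put requests that share sampling settings and stop sequences in the same split."""
    groups: Dict[GroupKey, List[GenRequest]] = defaultdict(list)
    for req in requests:
        key = (req.do_sample, req.num_samples, req.stop_sequences)
        groups[key].append(req)
    return dict(groups)


requests = [
    GenRequest("q1", num_samples=1, do_sample=False, stop_sequences=("\n",)),
    GenRequest("q2", num_samples=4, do_sample=True, stop_sequences=("\n",)),
    GenRequest("q3", num_samples=1, do_sample=False, stop_sequences=("\n",)),
]
splits = group_by_generation_params(requests)
```

Each resulting split can be sent to the model with a single set of generation settings, at the cost of splits that are no longer equal in size.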

clefourrier added 3 commits April 16, 2024 13:36

init

b06e18e

wip

cfdc6ee

testing how to pad and gather with an added dimension for the num_samples

98c1c12

lewtun reviewed Apr 18, 2024

src/lighteval/metrics/metrics.py (resolved)

lewtun reviewed Apr 18, 2024

src/lighteval/tasks/tasks_table.jsonl (resolved)

clefourrier added 3 commits April 19, 2024 12:42

now working, need to check why the metric is not displayed

1549fda

seems to be working!

8adfc07

add maj at 4 for math with preprocessing

24d4692

NathanHB reviewed Apr 20, 2024

src/lighteval/metrics/metrics_sample.py (resolved)

NathanHB previously approved these changes Apr 21, 2024

Uses a homogeneized system for all greedy evaluations - we can do evals in one single step

3a1c1c4

clefourrier and others added 3 commits April 22, 2024 09:51

edit to prevent sampling for providing too many answers to some metrics

71aa2b8

added some doc

e36d6e0

Merge branch 'main' into add_maj_at_k

e1ea3e2

clefourrier commented Apr 22, 2024

src/lighteval/models/abstract_model.py (resolved)

neither nanotron nor endpoints models cover sampling atm

4ef4c99

clefourrier requested a review from NathanHB April 22, 2024 13:09

clefourrier force-pushed the add_maj_at_k branch from e45f395 to 4ef4c99 April 22, 2024 13:39

add readme

26c7868

NathanHB reviewed Apr 22, 2024

src/lighteval/metrics/__init__.py (outdated, resolved)

NathanHB reviewed Apr 22, 2024

src/lighteval/metrics/__init__.py (resolved)

NathanHB reviewed Apr 22, 2024

src/lighteval/metrics/metrics_sample.py (outdated, resolved)

NathanHB approved these changes Apr 22, 2024

Merge branch 'main' into add_maj_at_k

e35a6f4

clefourrier mentioned this pull request Apr 23, 2024

Data split depending on eval params #169

Merged

clefourrier and others added 2 commits April 23, 2024 11:20

Update src/lighteval/metrics/metrics_sample.py

39cf676

Co-authored-by: Nathan Habib <[email protected]>

added review change

5a6988e

clefourrier mentioned this pull request Apr 23, 2024

Remove TGI models #167

Closed

Merge branch 'main' into add_maj_at_k

1431283

NathanHB merged commit 0a455c4 into main Apr 30, 2024

clefourrier mentioned this pull request May 3, 2024

Enable majority voting for GSM8k / MATH #62

Closed

hynky1999 pushed a commit that referenced this pull request May 22, 2025

Add maj@k metric (#158)

740c884

Co-authored-by: Nathan Habib <[email protected]> * added review change --------- Co-authored-by: Nathan Habib <[email protected]>

NathanHB added a commit that referenced this pull request Sep 19, 2025

Add maj@k metric (#158)

7216036

Co-authored-by: Nathan Habib <[email protected]> * added review change --------- Co-authored-by: Nathan Habib <[email protected]>
