@NathanHB NathanHB commented Sep 30, 2024

What this PR does:

  • Adds the MixEval task
    • adds a MixEval judge as a sample metric using the new LLMJudge metric
  • Refactors the judge metric
    • makes it easier to define judges for custom tasks
  • Batches the model results per task and then per metric type, so that each metric is computed in one batch (this changes nothing for tasks other than LLM-as-judge, which is now much faster)
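
The batching described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Sample` tuple, `group_by_task_and_metric`, `compute_in_batches`, and `fake_judge` are hypothetical names, and the judge is a stand-in function rather than a real LLM call. The idea is that responses are grouped by task and then by metric type, so an expensive metric such as an LLM judge is invoked once per group instead of once per sample.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record: (task_name, metric_category, model_response).
Sample = Tuple[str, str, str]
BatchMetric = Callable[[List[str]], List[float]]


def group_by_task_and_metric(samples: List[Sample]) -> Dict[str, Dict[str, List[str]]]:
    """Group responses first by task, then by metric category."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for task, category, response in samples:
        grouped[task][category].append(response)
    return grouped


def compute_in_batches(
    samples: List[Sample], batch_metrics: Dict[str, BatchMetric]
) -> Dict[Tuple[str, str], List[float]]:
    """Call each metric once per (task, category) group instead of once per sample."""
    scores: Dict[Tuple[str, str], List[float]] = {}
    for task, by_category in group_by_task_and_metric(samples).items():
        for category, responses in by_category.items():
            scores[(task, category)] = batch_metrics[category](responses)
    return scores


def fake_judge(responses: List[str]) -> List[float]:
    """Stand-in for a single batched judge request (assumption, not a real judge)."""
    return [1.0 if "4" in response else 0.0 for response in responses]


def exact_match_yes(responses: List[str]) -> List[float]:
    """Cheap metric: batching changes nothing here except the call shape."""
    return [1.0 if response == "yes" else 0.0 for response in responses]


samples = [
    ("mixeval", "llm_judge", "2 + 2 is 4"),
    ("mixeval", "llm_judge", "2 + 2 is 5"),
    ("other_task", "exact_match", "yes"),
]
batch_metrics = {"llm_judge": fake_judge, "exact_match": exact_match_yes}
scores = compute_in_batches(samples, batch_metrics)
```

With a real judge, the gain would come from sending each group as one batched request; for cheap metrics the results are the same as computing them sample by sample.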

@NathanHB NathanHB requested review from clefourrier and lewtun October 3, 2024 14:27
@clefourrier clefourrier left a comment

LGTM, but you need some changes. Super nice overall refactor!

@NathanHB NathanHB merged commit 0483bef into main Oct 7, 2024
2 checks passed
@NathanHB NathanHB mentioned this pull request Oct 10, 2024
hynky1999 pushed a commit that referenced this pull request May 22, 2025
What this PR does:
- [x] Adds the MixEval task
    - adds a MixEval judge as a sample metric using the new LLMJudge metric
- [x] Refactors the judge metric
    - makes it easier to define judges for custom tasks
- [x] Batches the model results per task and then per metric type, so that each metric is computed in one batch (this changes nothing for tasks other than LLM-as-judge, which is now much faster)

---------

Co-authored-by: Clémentine Fourrier <[email protected]>
NathanHB added a commit that referenced this pull request Sep 19, 2025