Made litellm judge backend more robust. #485

JoelNiklaus · 2025-01-04T10:26:06Z

The metric computation fails when a result is None. This PR therefore retries the call and, if the result is still None, returns an error ModelResponse.
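A minimal sketch of the retry-then-error-response idea described above. This is not the code from the PR: `JudgeResponse`, `ERROR_TEXT`, `call_with_retries`, and the backoff parameters are hypothetical names chosen for illustration, and `call_judge` stands in for whatever function actually queries the judge through litellm.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical placeholder text for a failed judge call (an assumption, not from the PR).
ERROR_TEXT = "<judge call failed>"


@dataclass
class JudgeResponse:
    """Hypothetical stand-in for lighteval's ModelResponse."""
    text: str


def call_with_retries(
    call_judge: Callable[[str], Optional[str]],
    prompt: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> JudgeResponse:
    """Retry until the judge returns a non-None result, else return an error response."""
    for attempt in range(max_retries):
        try:
            result = call_judge(prompt)
        except Exception:
            result = None  # treat an API error like a missing result
        if result is not None:
            return JudgeResponse(text=result)
        if attempt < max_retries - 1:
            time.sleep(base_delay * (2 ** attempt))  # simple exponential backoff
    # All attempts returned None: hand back an error response instead of None,
    # so the downstream metric computation always receives an object.
    return JudgeResponse(text=ERROR_TEXT)
```

Returning a response object in every case means the metric code never has to handle None, at the cost of needing some way to tell error responses apart from real ones.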

clefourrier

LGTM

Side note: We should add an error flag to the ModelResponse if we also want to use it to report these kinds of problems. Otherwise, if we implement caching, we won't re-run these calls even though their results are incorrect.
Can you also add a failed attribute to the ModelResponse, False by default, and True here?

You'll be able to use it in #480
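To illustrate the caching concern raised in this comment, here is a hedged sketch of how a `failed` flag could keep error responses out of a cache. `CachedResponse` and `get_or_compute` are hypothetical names, not lighteval's actual classes or caching code; only the `failed` attribute with a default of False comes from the review request.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedResponse:
    """Hypothetical stand-in for ModelResponse, not lighteval's actual class."""
    text: str
    failed: bool = False  # False by default, set to True on an error response


def get_or_compute(
    cache: Dict[str, CachedResponse],
    key: str,
    compute: Callable[[str], CachedResponse],
) -> CachedResponse:
    """Return a cached response, re-running the call if the previous one failed."""
    cached = cache.get(key)
    if cached is not None and not cached.failed:
        return cached
    response = compute(key)
    if not response.failed:
        cache[key] = response  # only successful responses are stored
    return response
```

With this shape, a failed call is simply retried on the next lookup instead of being served from the cache as if it were a valid judgment.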

src/lighteval/metrics/llm_as_judge.py

clefourrier

Thanks!

HuggingFaceDocBuilderDev · 2025-01-07T07:39:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This reverts commit fdb12f4.

* Made litellm judge backend more robust. * Added failed flag to ModelResponse. --------- Co-authored-by: Clémentine Fourrier <[email protected]>

Made litellm judge backend more robust.

e770039

clefourrier approved these changes Jan 6, 2025

src/lighteval/metrics/llm_as_judge.py (outdated, resolved)

clefourrier and others added 3 commits January 6, 2025 09:20

Merge branch 'main' into fix-litellm-judge

a00af73

Added failed flag to ModelResponse.

01803d3

Merge branch 'main' into fix-litellm-judge

a700d2b

clefourrier approved these changes Jan 7, 2025

clefourrier merged commit fdb12f4 into huggingface:main Jan 7, 2025
3 checks passed

JoelNiklaus added a commit to JoelNiklaus/lighteval that referenced this pull request Jan 7, 2025

Revert "Made litellm judge backend more robust. (huggingface#485)"

7b7001b

This reverts commit fdb12f4.

JoelNiklaus mentioned this pull request Jan 7, 2025

Hotfix for litellm judge #490

Merged

hynky1999 pushed a commit that referenced this pull request May 22, 2025

Made litellm judge backend more robust. (#485)

917487c

* Made litellm judge backend more robust. * Added failed flag to ModelResponse. --------- Co-authored-by: Clémentine Fourrier <[email protected]>

NathanHB pushed a commit that referenced this pull request Sep 19, 2025

Made litellm judge backend more robust. (#485)

ef1dd09

* Made litellm judge backend more robust. * Added failed flag to ModelResponse. --------- Co-authored-by: Clémentine Fourrier <[email protected]>
