Results are not in the same order as input test cases #1034

Open
joaopcm1996 opened this issue Sep 25, 2024 · 1 comment

joaopcm1996 commented Sep 25, 2024

Evaluation results are returned out of order, which makes them a pain to use in a downstream process, or when I'm evaluating two models' predictions at once: the two models' predictions and scores get mixed up.

Steps to reproduce the behavior:

```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Three test cases with numbered inputs and contexts, so the order is easy to check.
cases = []
for idx in range(3):
    cases.append(
        LLMTestCase(
            input=f"input{idx}",
            actual_output=f"The context says speaker {idx} is a bad person",
            retrieval_context=[f"context{idx}"],
        )
    )

metric = FaithfulnessMetric(model="gpt-4o", include_reason=True)

results = evaluate(cases, [metric], print_results=False)

# If order were preserved, result.retrieval_context would always equal
# cases[idx].retrieval_context; in practice the two printed lines can differ.
for idx, result in enumerate(results):
    print(result.input)
    print(result.actual_output)
    print(result.retrieval_context)
    print(cases[idx].retrieval_context)
    print(result.metrics_data[0].verbose_logs)
    print("\n\n")
```
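One plausible cause, which is an assumption here and not confirmed against deepeval's source, is that test cases are evaluated concurrently and results are collected in completion order. The standalone sketch below uses only `asyncio` to show the difference: collecting with `asyncio.as_completed` yields results in the order tasks finish, while `asyncio.gather` returns them in the order the awaitables were passed in. The `fake_evaluate` coroutine and its delays are hypothetical stand-ins for per-case metric calls.

```python
import asyncio


async def fake_evaluate(idx: int, delay: float) -> str:
    # Hypothetical stand-in for one metric call; the delay simulates LLM latency.
    await asyncio.sleep(delay)
    return f"result{idx}"


async def completion_order(delays: list[float]) -> list[str]:
    # Results are appended as each task finishes, so order follows latency.
    tasks = [asyncio.create_task(fake_evaluate(i, d)) for i, d in enumerate(delays)]
    results = []
    for fut in asyncio.as_completed(tasks):
        results.append(await fut)
    return results


async def input_order(delays: list[float]) -> list[str]:
    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(fake_evaluate(i, d) for i, d in enumerate(delays)))


if __name__ == "__main__":
    delays = [0.3, 0.1, 0.2]  # case 0 is the slowest
    print(asyncio.run(completion_order(delays)))  # ['result1', 'result2', 'result0']
    print(asyncio.run(input_order(delays)))       # ['result0', 'result1', 'result2']
```

If this is what happens inside the library, the fix would be to collect results positionally (as `gather` does) rather than by completion.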

Expected behavior
The results list should be in the same order as the cases list.
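Until the order is guaranteed, one workaround is to realign the results with the input cases using a key that survives evaluation. The repro above prints `input` and `actual_output` for each result, so the sketch below keys on that pair, which also keeps two models' predictions apart when they share the same inputs. `Case`, `Result`, and `realign` are hypothetical stand-ins, not deepeval classes, and the sketch assumes every (input, actual_output) pair is unique, raising `ValueError` otherwise.

```python
from dataclasses import dataclass


@dataclass
class Case:  # hypothetical stand-in for deepeval's LLMTestCase
    input: str
    actual_output: str


@dataclass
class Result:  # hypothetical stand-in for one evaluation result
    input: str
    actual_output: str
    score: float


def realign(cases: list[Case], results: list[Result]) -> list[Result]:
    """Return results reordered so that results[i] corresponds to cases[i]."""
    by_key: dict[tuple[str, str], Result] = {}
    for r in results:
        key = (r.input, r.actual_output)
        if key in by_key:
            raise ValueError(f"duplicate key {key!r}; cannot realign unambiguously")
        by_key[key] = r
    # A KeyError here means some case has no matching result.
    return [by_key[(c.input, c.actual_output)] for c in cases]


if __name__ == "__main__":
    cases = [Case("q", "model A says x"), Case("q", "model B says y")]
    # Results arrive in a different order from the cases.
    results = [Result("q", "model B says y", 0.2), Result("q", "model A says x", 0.9)]
    print([r.score for r in realign(cases, results)])  # [0.9, 0.2]
```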

@raresaxpo

I encountered the same issue. I am trying to match the metadata of the test cases in the evaluation dataset with the test results. Without order being preserved, this is impossible.
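For the metadata case, a possible workaround is to look metadata up by key instead of by position. In the sketch below, `Case`, `Result`, `default_key`, and `attach_metadata` are all hypothetical stand-ins, not deepeval APIs. It assumes the key (here input plus actual output) is unique per test case, and the `key` parameter lets you swap in another attribute that round-trips through evaluation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Case:  # hypothetical stand-in for a dataset test case
    input: str
    actual_output: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Result:  # hypothetical stand-in for one evaluation result
    input: str
    actual_output: str
    score: float


def default_key(item: Any) -> tuple[str, str]:
    # Works for both Case and Result because both expose these attributes.
    return (item.input, item.actual_output)


def attach_metadata(
    cases: list[Case],
    results: list[Result],
    key: Callable[[Any], tuple] = default_key,
) -> list[dict[str, Any]]:
    """Pair each result with its source case's metadata, regardless of order."""
    meta = {key(c): c.metadata for c in cases}
    if len(meta) != len(cases):
        raise ValueError("test case keys are not unique")
    return [{**meta[key(r)], "score": r.score} for r in results]


if __name__ == "__main__":
    cases = [Case("q1", "a1", {"source": "doc1"}), Case("q2", "a2", {"source": "doc2"})]
    results = [Result("q2", "a2", 0.4), Result("q1", "a1", 1.0)]
    print(attach_metadata(cases, results))
```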
