Evaluation results are returned out of order, which makes it a pain to use them in a downstream process or to evaluate two models' predictions at once: the two models' predictions and scores get mixed up.
Steps to reproduce the behavior:
```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

cases = []
for idx in range(3):
    cases.append(LLMTestCase(
        input=f"input{idx}",
        actual_output=f'The context says speaker {idx} is a bad person',
        retrieval_context=[f"context{idx}"])
    )

metric = FaithfulnessMetric(model='gpt-4o', include_reason=True)
results = evaluate(cases, [metric], print_results=False)

for idx, result in enumerate(results):
    print(result.input)
    print(result.actual_output)
    print(result.retrieval_context)       # context attached to this result
    print(cases[idx].retrieval_context)   # context of the case at the same index -- does not match
    print(result.metrics_data[0].verbose_logs)
    print('\n\n')
```
Expected behavior
The results list should be in the same order as the cases list.
I encountered the same issue. I am trying to match the metadata of the test cases in the evaluation dataset with the test results. Without the order being preserved, this is impossible.
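In the meantime, here is a possible workaround sketch, assuming each test case's `input` is unique (as it is in the reproduction above): re-key the results by `input` and look them up in the original case order. It relies only on the attributes the reproduction already reads and does not fix the underlying ordering.

```python
# Workaround sketch (assumes every LLMTestCase has a unique `input`).
# `cases` and `results` come from the reproduction above.
by_input = {result.input: result for result in results}
ordered_results = [by_input[case.input] for case in cases]

for case, result in zip(cases, ordered_results):
    # After re-alignment, each result's retrieval_context matches its case.
    assert result.retrieval_context == case.retrieval_context
    print(result.metrics_data[0].verbose_logs)
```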