Evaluation results are returned out of order, which makes it a pain to use them in a downstream process or to evaluate two models' predictions at once: the two models' predictions and scores get mixed up.
Steps to reproduce the behavior:
```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

cases = []
for idx in range(3):
    cases.append(LLMTestCase(
        input=f"input{idx}",
        actual_output=f'The context says speaker {idx} is a bad person',
        retrieval_context=[f"context{idx}"])
    )

metric = FaithfulnessMetric(model='gpt-4o', include_reason=True)
results = evaluate(cases, [metric], print_results=False)

for idx, result in enumerate(results):
    print(result.input)
    print(result.actual_output)
    print(result.retrieval_context)       # context attached to this result
    print(cases[idx].retrieval_context)   # context of the case at the same index -- does not match
    print(result.metrics_data[0].verbose_logs)
    print('\n\n')
```
Expected behavior
The results list should be in the same order as the cases list.
I encountered the same issue. I am trying to match the metadata of the test cases in the evaluation dataset with the test results. Without the order being preserved, this is impossible.
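In the meantime, here is a possible workaround sketch, assuming each test case's `input` is unique (as it is in the reproduction above): re-key the results by `input` and look them up in the original case order. It relies only on the attributes the reproduction already reads and does not fix the underlying ordering.

```python
# Workaround sketch (assumes every LLMTestCase has a unique `input`).
# `cases` and `results` come from the reproduction above.
by_input = {result.input: result for result in results}
ordered_results = [by_input[case.input] for case in cases]

for case, result in zip(cases, ordered_results):
    # After re-alignment, each result's retrieval_context matches its case.
    assert result.retrieval_context == case.retrieval_context
    print(result.metrics_data[0].verbose_logs)
```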