[FT] Support batch metric computation for SampleLevelMetrics #404

@JoelNiklaus

Description

Issue encountered

SampleLevelMetrics are always computed with a batch size of 1. This is a serious bottleneck for computationally expensive metrics that involve LLM inference: without batching, evaluation takes far too long. CorpusLevelMetrics are not a viable workaround either, because we want per-sample scores, both for statistics and for selecting samples for human evaluation afterwards.
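
For illustration, a minimal sketch of the bottleneck; `llm_judge_score` and `score_samples` are hypothetical stand-ins, not lighteval code:

```python
import time

def llm_judge_score(prediction: str, reference: str) -> float:
    """Stand-in for an expensive LLM-as-a-judge call (hypothetical)."""
    time.sleep(1.0)  # simulate one full inference round-trip per sample
    return float(prediction.strip() == reference.strip())

def score_samples(predictions: list[str], references: list[str]) -> list[float]:
    # Current behaviour: batch size 1 means one model call per sample,
    # so N samples cost N sequential round-trips.
    return [llm_judge_score(p, r) for p, r in zip(predictions, references)]
```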

Solution/Feature

In `metrics/utils/__init__.py`, `apply_generative_metric` needs to support batches. We can keep the default batch size of 1, but we should expose a `metric_batch_size` argument at the top level of the evaluation (see the sketch below).
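
A rough sketch of what this could look like: `apply_generative_metric` and `metric_batch_size` are the names from this issue, while `batched`, `sample_fn`, and the standalone signature are hypothetical simplifications of the real code:

```python
from itertools import islice
from typing import Callable, Iterator, Sequence

def batched(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield successive chunks of at most batch_size items."""
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield chunk

def apply_generative_metric(
    sample_fn: Callable[[list[str], list[str]], list[float]],
    predictions: Sequence[str],
    references: Sequence[str],
    metric_batch_size: int = 1,  # default of 1 preserves current behaviour
) -> list[float]:
    # Assumes sample_fn can score a whole chunk at once, e.g. via one
    # batched LLM forward pass instead of one call per sample.
    scores: list[float] = []
    for pred_chunk, ref_chunk in zip(
        batched(predictions, metric_batch_size),
        batched(references, metric_batch_size),
    ):
        scores.extend(sample_fn(pred_chunk, ref_chunk))
    return scores
```

Defaulting `metric_batch_size` to 1 keeps existing metrics working unchanged; only metrics that benefit from batching (e.g. LLM-judge metrics) would opt in with a larger value.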

Possible alternatives

Currently, I don't see an alternative.
