[FT] Support batch metric computation for SampleLevelMetrics #404

## Issue encountered
SampleLevelMetrics are always computed with a batch size of 1. This is a serious bottleneck for computationally expensive metrics that involve LLM inference: without batching, evaluation takes far too long. CorpusLevelMetrics are not a real solution either, because we need the metric at the sample level, both for statistics and for selecting samples for human evaluation afterwards.

## Solution/Feature
In metrics.utils.__init__.py, apply_generative_metric needs to support batches. The default batch size can stay at 1, but a metric_batch_size argument should be exposed at the top level of the evaluation.
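
To make the request concrete, here is a rough sketch of what batched sample-level computation could look like. It is not the existing code: `compute_sample_level_metric`, `exact_match_batch`, and the batched callable signature are hypothetical names made up for illustration, and only `metric_batch_size` comes from the proposal above. The idea is that the metric receives slices of predictions and references, while one score per sample is still returned in the original order.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: a metric that scores a whole batch in one call
# and returns one score per sample, in the same order as its inputs.
BatchedMetricFn = Callable[[Sequence[str], Sequence[str]], List[float]]


def compute_sample_level_metric(
    predictions: Sequence[str],
    references: Sequence[str],
    batched_metric_fn: BatchedMetricFn,
    metric_batch_size: int = 1,
) -> List[float]:
    """Score every sample, calling the metric on slices of `metric_batch_size`."""
    if metric_batch_size < 1:
        raise ValueError("metric_batch_size must be >= 1")
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    scores: List[float] = []
    for start in range(0, len(predictions), metric_batch_size):
        end = start + metric_batch_size
        batch_scores = batched_metric_fn(predictions[start:end], references[start:end])
        # Guard against metrics that silently drop or duplicate samples.
        if len(batch_scores) != len(predictions[start:end]):
            raise ValueError("metric must return exactly one score per sample")
        scores.extend(batch_scores)
    return scores


def exact_match_batch(preds: Sequence[str], refs: Sequence[str]) -> List[float]:
    """Toy stand-in for an expensive metric (e.g. an LLM judge)."""
    return [float(p.strip() == r.strip()) for p, r in zip(preds, refs)]
```

With the default `metric_batch_size=1` the behaviour matches the current one-sample-at-a-time computation, so existing metrics would not need to change; an LLM-based metric could opt in to a larger value.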

## Possible alternatives
Currently, I don't see an alternative.


