Expose a few model predictions / gold answers in the logs #164

@lewtun

For generative benchmarks like MATH / GSM8K / IFEval, it would be great to have some visibility in the logs into how the prompts are formatted, what the generations look like, what the gold answer is, etc.

Currently, the best approach I've found is to first run the benchmark with --max_samples and then manually inspect the details Parquet file. However, this is rather cumbersome, especially when launching many evals in parallel :)

Perhaps we can store the first N examples in the logs?
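
To make the suggestion concrete, here is a rough sketch of what "store the first N examples in the logs" could look like. It is only an illustration using Python's standard `logging` module. The function name `log_first_examples`, the logger name, and the record keys `prompt`, `prediction`, and `gold` are all hypothetical and do not reflect the library's actual internals.

```python
import logging
from typing import Iterable, List, Mapping

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("eval_samples")


def _truncate(text: str, max_chars: int) -> str:
    """Shorten long prompts/generations so the log stays readable."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "..."


def log_first_examples(
    records: Iterable[Mapping[str, object]],
    n: int = 3,
    max_chars: int = 200,
    log: logging.Logger = logger,
) -> List[str]:
    """Log the first `n` records and return the formatted lines.

    Each record is assumed (for this sketch) to carry the keys
    'prompt', 'prediction' and 'gold'.
    """
    lines: List[str] = []
    for i, rec in enumerate(records):
        if i >= n:
            break
        line = (
            f"[sample {i}] "
            f"prompt={_truncate(str(rec['prompt']), max_chars)!r} "
            f"prediction={_truncate(str(rec['prediction']), max_chars)!r} "
            f"gold={_truncate(str(rec['gold']), max_chars)!r}"
        )
        log.info(line)
        lines.append(line)
    return lines


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    demo = [
        {"prompt": "What is 2 + 2?", "prediction": "4", "gold": "4"},
        {"prompt": "What is 3 * 5?", "prediction": "16", "gold": "15"},
    ]
    log_first_examples(demo, n=2)
```

Truncating each field keeps the output manageable when many evals are launched in parallel, and capping at `n` means the cost is the same regardless of benchmark size.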
