Member

@clefourrier clefourrier commented Jul 17, 2024

This will not include few-shot samples, as building them with the correct truncation requires a model and its tokenizer.

Fixes #164 and addresses a request by @lewtun.

@clefourrier
Member Author

Example for command: lighteval tasks --inspect "lighteval|wmt14:fr-en|0|0" --num_samples 1 --show_config

}
lighteval/sacrebleu_manual wmt14_fr-en
Careful, the task lighteval|wmt14:fr-en is using evaluation data to build the few shot examples.
---------- lighteval|wmt14:fr-en ----------
---------- CONFIG
|           Key            |               Value                |
|--------------------------|------------------------------------|
|name                      |wmt14:fr-en                         |
|prompt_function           |wmt_reverse_alphabetical            |
|hf_repo                   |lighteval/sacrebleu_manual          |
|hf_subset                 |wmt14_fr-en                         |
|metric 0: metric_name     |bleu                                |
|metric 0: higher_is_better|True                                |
|metric 0: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 0: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 0: sample_level_fn |GenerativePreparator.prepare        |
|metric 0: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|metric 1: metric_name     |chrf                                |
|metric 1: higher_is_better|True                                |
|metric 1: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 1: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 1: sample_level_fn |GenerativePreparator.prepare        |
|metric 1: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|metric 2: metric_name     |ter                                 |
|metric 2: higher_is_better|False                               |
|metric 2: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 2: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 2: sample_level_fn |GenerativePreparator.prepare        |
|metric 2: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|hf_avail_splits           |('test',)                           |
|evaluation_splits         |('test',)                           |
|few_shots_split           |None                                |
|few_shots_select          |None                                |
|generation_size           |None                                |
|stop_sequence             |('\n',)                             |
|output_regex              |None                                |
|num_samples               |None                                |
|frozen                    |False                               |
|suite                     |('lighteval', 'sacrebleu')          |
|original_num_docs         |-1                                  |
|effective_num_docs        |-1                                  |
|trust_dataset             |True                                |
|must_remove_duplicate_docs|None                                |
|version                   |0                                   |

---------- SAMPLES
{"query": "French phrase: L'affaire NSA souligne l'absence totale de d\u00e9bat sur le renseignement\nEnglish phrase:", "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"], "gold_index": 0, "original_query": "", "specific": null, "task_name": "lighteval|wmt14:fr-en", "instruction": null, "target_for_fewshot_sorting": null, "ctx": "", "num_asked_few_shots": -1, "num_effective_few_shots": -1}

@NathanHB
Member

Great feature! Small nit, however: I would print the samples in a formatted way using pprint, so that they're easier to read.
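
For illustration, here is a minimal sketch of what that could look like with the standard-library `pprint` module. The dictionary below is an abridged copy of the sample printed above; the variable names and the `width` value are assumptions for this example, not the code that was merged in this PR.

```python
from pprint import pformat

# Abridged copy of the sample shown in the SAMPLES output above.
sample = {
    "query": (
        "French phrase: L'affaire NSA souligne l'absence totale de "
        "débat sur le renseignement\nEnglish phrase:"
    ),
    "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"],
    "gold_index": 0,
    "task_name": "lighteval|wmt14:fr-en",
    "num_asked_few_shots": -1,
    "num_effective_few_shots": -1,
}

# sort_dicts=False (Python 3.8+) keeps the original key order instead of
# sorting alphabetically; width caps the line length before wrapping.
formatted = pformat(sample, width=100, sort_dicts=False)
print(formatted)
```

Compared with the single-line JSON dump, each key ends up on its own line and long strings are wrapped, which makes a sample easier to scan in a terminal.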

@clefourrier
Member Author

Good idea!

@clefourrier clefourrier requested a review from NathanHB July 25, 2024 17:14
@clefourrier clefourrier merged commit cbae17d into main Aug 1, 2024
hynky1999 pushed a commit that referenced this pull request May 22, 2025
* add examples of samples (does not include few shot), and add robustness to logger

* added cache at all levels

* added nice printing for the config and more options for task display

* added pprint

---------

Co-authored-by: Nathan Habib <[email protected]>
NathanHB added a commit that referenced this pull request Sep 19, 2025
Development

Successfully merging this pull request may close these issues.

Expose a few model predictions / gold answers in the logs