Expose samples via the CLI #228

clefourrier · 2024-07-17T09:19:00Z

This does not include few-shot samples, since building them with the correct truncation requires a model and its tokenizer.

Fixes #164 and addresses a request by @lewtun.

clefourrier · 2024-07-17T13:24:20Z

Example command: lighteval tasks --inspect "lighteval|wmt14:fr-en|0|0" --num_samples 1 --show_config

}
lighteval/sacrebleu_manual wmt14_fr-en
Careful, the task lighteval|wmt14:fr-en is using evaluation data to build the few shot examples.
---------- lighteval|wmt14:fr-en ----------
---------- CONFIG
|           Key            |               Value                |
|--------------------------|------------------------------------|
|name                      |wmt14:fr-en                         |
|prompt_function           |wmt_reverse_alphabetical            |
|hf_repo                   |lighteval/sacrebleu_manual          |
|hf_subset                 |wmt14_fr-en                         |
|metric 0: metric_name     |bleu                                |
|metric 0: higher_is_better|True                                |
|metric 0: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 0: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 0: sample_level_fn |GenerativePreparator.prepare        |
|metric 0: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|metric 1: metric_name     |chrf                                |
|metric 1: higher_is_better|True                                |
|metric 1: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 1: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 1: sample_level_fn |GenerativePreparator.prepare        |
|metric 1: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|metric 2: metric_name     |ter                                 |
|metric 2: higher_is_better|False                               |
|metric 2: category        |<MetricCategory.GENERATIVE: '3'>    |
|metric 2: use_case        |<MetricUseCase.TRANSLATION: '9'>    |
|metric 2: sample_level_fn |GenerativePreparator.prepare        |
|metric 2: corpus_level_fn |CorpusLevelTranslationMetric.compute|
|hf_avail_splits           |('test',)                           |
|evaluation_splits         |('test',)                           |
|few_shots_split           |None                                |
|few_shots_select          |None                                |
|generation_size           |None                                |
|stop_sequence             |('\n',)                             |
|output_regex              |None                                |
|num_samples               |None                                |
|frozen                    |False                               |
|suite                     |('lighteval', 'sacrebleu')          |
|original_num_docs         |-1                                  |
|effective_num_docs        |-1                                  |
|trust_dataset             |True                                |
|must_remove_duplicate_docs|None                                |
|version                   |0                                   |
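The `metric N: field` rows above suggest that nested metric entries are flattened into plain key/value pairs before printing. A minimal sketch of that idea follows; `flatten_config` and `render_table` are hypothetical helpers written for illustration, not lighteval's actual implementation, and the `config` dict is a trimmed stand-in for the real task config.

```python
def flatten_config(config: dict) -> dict:
    """Flatten a list of metric dicts into 'metric N: field' keys (hypothetical helper)."""
    rows = {}
    for key, value in config.items():
        if key == "metric" and isinstance(value, (list, tuple)):
            for i, metric in enumerate(value):
                for field, field_value in metric.items():
                    rows[f"metric {i}: {field}"] = field_value
        else:
            rows[key] = value
    return rows


def render_table(rows: dict) -> str:
    """Render key/value rows as a pipe table similar to the output above."""
    key_w = max([len("Key")] + [len(k) for k in rows])
    val_w = max([len("Value")] + [len(str(v)) for v in rows.values()])
    lines = [
        f"|{'Key':^{key_w}}|{'Value':^{val_w}}|",
        f"|{'-' * key_w}|{'-' * val_w}|",
    ]
    lines += [f"|{k:<{key_w}}|{str(v):<{val_w}}|" for k, v in rows.items()]
    return "\n".join(lines)


# Trimmed, illustrative config; values are copied from the table above.
config = {
    "name": "wmt14:fr-en",
    "metric": [
        {"metric_name": "bleu", "higher_is_better": True},
        {"metric_name": "ter", "higher_is_better": False},
    ],
    "version": 0,
}

rows = flatten_config(config)
table = render_table(rows)
print(table)
```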

---------- SAMPLES
{"query": "French phrase: L'affaire NSA souligne l'absence totale de d\u00e9bat sur le renseignement\nEnglish phrase:", "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"], "gold_index": 0, "original_query": "", "specific": null, "task_name": "lighteval|wmt14:fr-en", "instruction": null, "target_for_fewshot_sorting": null, "ctx": "", "num_asked_few_shots": -1, "num_effective_few_shots": -1}
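Since each sample is printed as one JSON object, it can be read back with the standard `json` module. The sketch below uses a shortened copy of the sample above (only three of its fields) and assumes that `gold_index` may be either a single integer or a list of integers; that assumption is not confirmed by this thread.

```python
import json

# Shortened copy of the sample printed above (raw string keeps the JSON escapes intact).
sample_line = r"""{"query": "French phrase: L'affaire NSA souligne l'absence totale de d\u00e9bat sur le renseignement\nEnglish phrase:", "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"], "gold_index": 0}"""

sample = json.loads(sample_line)

# Assumption: gold_index is an int or a list of ints; normalise to a list.
gold = sample["gold_index"]
gold_indices = gold if isinstance(gold, list) else [gold]
references = [sample["choices"][i] for i in gold_indices]

print(sample["query"])
print(references)
```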

NathanHB · 2024-07-23T10:25:40Z

Great feature! Small nit, however: I would print the samples in a formatted way using pprint, so that they're easier to read.
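For illustration, a minimal sketch of what pprint-style output could look like for one sample. The dict is a shortened copy of the sample shown earlier, and the `width` and `sort_dicts` choices are assumptions, not necessarily what the PR ended up using.

```python
from pprint import pformat

# Shortened copy of the sample from the example output.
sample = {
    "query": "French phrase: L'affaire NSA souligne l'absence totale de débat "
    "sur le renseignement\nEnglish phrase:",
    "choices": ["NSA Affair Emphasizes Complete Lack of Debate on Intelligence"],
    "gold_index": 0,
    "task_name": "lighteval|wmt14:fr-en",
    "instruction": None,
}

# sort_dicts=False (Python 3.8+) keeps the original field order.
formatted = pformat(sample, width=100, sort_dicts=False)
print(formatted)
```

Each field lands on its own line, and long strings are wrapped, which is easier to scan than a single-line JSON dump.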

clefourrier · 2024-07-23T15:59:40Z

Good idea!

* add examples of samples (does not include few shot), and add robustness to logger
* added cache at all levels
* added nice printing for the config and more options for task display
* added pprint

---------

Co-authored-by: Nathan Habib <[email protected]>

clefourrier added 3 commits July 17, 2024 09:18

add examples of samples (does not include few shot), and add robustness to logger

2a6da98

added cache at all levels

3a28895

added nice printing for the config and more options for task display

f892248

clefourrier mentioned this pull request Jul 17, 2024

Extend #228 with few shot samples #230

Open

Merge branch 'main' into clem_details

8038c8c

clefourrier mentioned this pull request Jul 18, 2024

Add CLI to evaluate models, list supported tasks etc #53

Closed

Merge branch 'main' into clem_details

65fef95

NathanHB and others added 2 commits July 24, 2024 14:43

Merge branch 'main' into clem_details

e816cdc

added pprint

5dc3273

clefourrier requested a review from NathanHB July 25, 2024 17:14

Merge branch 'main' into clem_details

cbb57db

NathanHB approved these changes Jul 31, 2024

Merge branch 'main' into clem_details

746208c

clefourrier merged commit cbae17d into main Aug 1, 2024
