* init - now gives the path with an arg, maybe will remove
* allows several custom task modules to be loaded
* fix quality
---------
Co-authored-by: Nathan Habib <[email protected]>
README.md (10 additions, 3 deletions)
If your new task or metric has requirements, add a specific `requirements.txt` file with your evaluation.
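For example, a task whose metric depends on extra packages might ship a `requirements.txt` like the one below (the packages listed are purely illustrative):

```
# Illustrative only: list whatever your task or metric actually needs
rouge-score>=0.1.2
nltk>=3.8
```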
### Adding a new task
To add a new task, first open an issue to determine whether it should be integrated in the core evaluations of lighteval, in the extended tasks, or in the community tasks, and **add its dataset** on the hub.
- Core evaluations are evaluations which only require standard logic in their metrics and processing, and which we will add to our test suite to ensure non-regression over time. They already see high usage in the community.
- Extended evaluations are evaluations which require custom logic in their metrics (complex normalisation, an LLM as a judge, ...), which we added to make users' lives easier. They also see high usage in the community.
- Community evaluations are new tasks submitted by the community.

A popular community evaluation can become an extended or core evaluation over time.
#### Core evaluations
Prompt function: **find a suitable prompt function** in `src.lighteval.tasks.task_prompt_formatting.py`, or code your own. This function must output a `Doc` object, which should contain `query`, your prompt, and either `gold`, the gold output, or `choices` and `gold_index`, the list of choices and the index or indices of the correct answers. If your query contains an instruction which should not be repeated in a few-shot setup, add it to an `instruction` field.
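As an illustration, a minimal prompt function for a multiple-choice task could look like the sketch below. The `Doc` fields follow the description above; the import path for `Doc`, the function signature, and the dataset column names (`question`, `choices`, `answer`) are assumptions, so check them against your lighteval checkout and your dataset before reusing this.

```python
# Minimal sketch of a prompt function for a multiple-choice task.
# Check the actual Doc definition and import path in your lighteval version.
from lighteval.tasks.requests import Doc  # assumed import path


def yournewtask_prompt(line, task_name: str = None):
    """Turn one dataset row (`line`) into a Doc."""
    return Doc(
        task_name=task_name,
        query=f"Question: {line['question']}\nAnswer:",  # the prompt sent to the model
        choices=[f" {choice}" for choice in line["choices"]],  # candidate completions
        gold_index=line["answer"],  # index (or list of indices) of the correct choice(s)
        instruction="",  # any instruction that should not be repeated in few-shot examples
    )
```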
Summary: create a **line summary** of your evaluation, in `src/lighteval/tasks/t…`
Make sure you can launch your model with your new task using `--tasks lighteval|yournewtask|2|0`.
#### Extended evaluations
Proceed as for community evaluations, but in the `extended_tasks` folder.
#### Community evaluations
Copy the `community_tasks/_template.py` to `community_tasks/yourevalname.py` and edit it to add your custom tasks (the parameters you can use are explained above). It contains an interesting mechanism if the dataset you are adding contains a lot of subsets.
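As a rough, hypothetical sketch of what such a file contains, assuming a `LightevalTaskConfig`-style configuration object: the import path, the field names, and whether `prompt_function` is passed as a string or a callable all depend on your lighteval version, so start from `community_tasks/_template.py` rather than from this snippet.

```python
# Hypothetical community task definition; every value below is illustrative.
from lighteval.tasks.lighteval_task import LightevalTaskConfig  # assumed import path

yournewtask = LightevalTaskConfig(
    name="yournewtask",
    prompt_function="yournewtask_prompt",  # prompt function defined for your task
    suite=["community"],
    hf_repo="your-org/your-dataset",  # the dataset you pushed to the hub
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="random",
    metric=["exact_match"],  # pick metrics that exist in your lighteval version
    generation_size=32,
    stop_sequence=["\n"],
)

# Expose the tasks so lighteval can discover them when loading the module.
TASKS_TABLE = [yournewtask]
```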