
Commit 4550cb7

Nathan Habib (NathanHB) and Clémentine Fourrier (clefourrier) authored
launch lighteval using lighteval --args (#152)
Co-authored-by: Nathan Habib <[email protected]>
Co-authored-by: Clémentine Fourrier <[email protected]>
1 parent aaf7e8a commit 4550cb7

File tree

11 files changed, with 234 additions and 314 deletions


README.md

Lines changed: 21 additions & 14 deletions
@@ -78,8 +78,8 @@ pre-commit install

We provide two main entry points to evaluate models:

-* `run_evals_accelerate.py`: evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate).
-* `run_evals_nanotron.py`: evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron).
+* `lighteval accelerate`: evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate).
+* `lighteval nanotron`: evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron).

For most users, we recommend using the 🤗 Accelerate backend - see below for specific commands.

@@ -94,7 +94,8 @@ accelerate config
You can then evaluate a model using data parallelism as follows:

```shell
-accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=<num_gpus> -m \
+    lighteval accelerate \
    --model_args="pretrained=<path to model on the hub>" \
    --tasks <task parameters> \
    --output_dir output_dir
@@ -109,7 +110,8 @@ suite|task|num_few_shot|{0 or 1 to automatically reduce `num_few_shot` if prompt
or a file path like [`examples/tasks/recommended_set.txt`](./examples/tasks/recommended_set.txt) which specifies multiple task configurations. For example, to evaluate GPT-2 on the Truthful QA benchmark run:

```shell
-accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=8 -m \
+    lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "lighteval|truthfulqa:mc|0|0" \
    --override_batch_size 1 \
@@ -119,7 +121,8 @@ accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
Here, `--override_batch_size` defines the _batch size per device_, so the effective batch size will be `override_batch_size x num_gpus`. To evaluate on multiple benchmarks, separate each task configuration with a comma, e.g.

```shell
-accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=8 -m \
+    lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "leaderboard|truthfulqa:mc|0|0,leaderboard|gsm8k|0|0" \
    --override_batch_size 1 \
@@ -133,7 +136,8 @@ See the [`examples/tasks/recommended_set.txt`](./examples/tasks/recommended_set.
If you want to evaluate a model by spinning up inference endpoints, use adapter/delta weights, or more complex configuration options, you can load models using a configuration file. This is done as follows:

```shell
-accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=<num_gpus> -m \
+    lighteval accelerate \
    --model_config_path="<path to your model configuration>" \
    --tasks <task parameters> \
    --output_dir output_dir
@@ -147,13 +151,15 @@ To evaluate models larger that ~40B parameters in 16-bit precision, you will nee

```shell
# PP=2, DP=4 - good for models < 70B params
-accelerate launch --multi_gpu --num_processes=4 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=4 -m \
+    lighteval accelerate \
    --model_args="pretrained=<path to model on the hub>,model_parallel=True" \
    --tasks <task parameters> \
    --output_dir output_dir

# PP=4, DP=2 - good for huge models >= 70B params
-accelerate launch --multi_gpu --num_processes=2 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=2 -m \
+    lighteval accelerate \
    --model_args="pretrained=<path to model on the hub>,model_parallel=True" \
    --tasks <task parameters> \
    --output_dir output_dir
@@ -164,7 +170,8 @@ accelerate launch --multi_gpu --num_processes=2 run_evals_accelerate.py \
To evaluate a model on all the benchmarks of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) using a single node of 8 GPUs, run:

```shell
-accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=8 -m \
+    lighteval accelerate \
    --model_args "pretrained=<model name>" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --override_batch_size 1 \
@@ -176,7 +183,7 @@ accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
You can also use `lighteval` to evaluate models on CPU, although note this will typically be very slow for large models. To do so, run:

```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
    --model_args="pretrained=<path to model on the hub>"\
    --tasks <task parameters> \
    --output_dir output_dir
@@ -211,7 +218,7 @@ Independently of the default tasks provided in `lighteval` that you will find in

For example, to run an extended task like `ifeval`, you can run:
```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
    --use_chat_template \ # optional, if you want to run the evaluation with the chat template
    --tasks "extended|ifeval|0|0" \
@@ -221,7 +228,7 @@ python run_evals_accelerate.py \
To run a community or custom task, you can use (note the custom_tasks flag):

```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
    --model_args="pretrained=<path to model on the hub>"\
    --tasks <task parameters> \
    --custom_tasks <path to your custom or community task> \
@@ -231,7 +238,7 @@ python run_evals_accelerate.py \
For example, to launch `lighteval` on `arabic_mmlu:abstract_algebra` for `HuggingFaceH4/zephyr-7b-beta`, run:

```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
    --use_chat_template \ # optional, if you want to run the evaluation with the chat template
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
@@ -464,7 +471,7 @@ source <path_to_your_venv>/activate #or conda activate yourenv
cd <path_to_your_lighteval>/lighteval

export CUDA_LAUNCH_BLOCKING=1
-srun accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
+srun accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
```

## Releases

examples/model_configs/endpoint_model.yaml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ model:
    endpoint_type: "protected"
    namespace: null # The namespace under which to launch the endopint. Defaults to the current user's namespace
    image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
-    env_vars:
+    env_vars:
      null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
  generation:
    add_special_tokens: true
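
For context, an endpoint configuration like the one above is what gets passed to the new CLI via `--model_config_path` (see the README changes earlier in this commit). A minimal usage sketch; the task string and output directory are placeholders, not part of this commit:

```shell
# Sketch: evaluate a model served through an inference endpoint,
# using the example config above. Task and output_dir are placeholders.
lighteval accelerate \
    --model_config_path="examples/model_configs/endpoint_model.yaml" \
    --tasks "lighteval|truthfulqa:mc|0|0" \
    --output_dir output_dir
```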

examples/model_configs/tgi_model.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ model:
  instance:
    inference_server_address: ""
    inference_server_auth: null
-    model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
+    model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -102,4 +102,4 @@ Issues = "https://github.com/huggingface/lighteval/issues"
# Changelog = "https://github.com/huggingface/lighteval/blob/master/CHANGELOG.md"

[project.scripts]
-lighteval = "lighteval.commands.lighteval_cli:main"
+lighteval = "lighteval.__main__:cli_evaluate"

run_evals_accelerate.py

Lines changed: 0 additions & 89 deletions
This file was deleted.

run_evals_nanotron.py

Lines changed: 0 additions & 55 deletions
This file was deleted.

src/lighteval/__main__.py

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+#!/usr/bin/env python
+
+# MIT License
+
+# Copyright (c) 2024 Taratra D. RAHARISON and The HuggingFace Team
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import argparse
+
+from lighteval.parsers import parser_accelerate, parser_nanotron
+from lighteval.tasks.registry import Registry
+
+
+def cli_evaluate():
+    parser = argparse.ArgumentParser(description="CLI tool for lighteval, a lightweight framework for LLM evaluation")
+    subparsers = parser.add_subparsers(help="help for subcommand", dest="subcommand")
+
+    # create the parser for the "accelerate" command
+    parser_a = subparsers.add_parser("accelerate", help="use accelerate and transformers as backend for evaluation.")
+    parser_accelerate(parser_a)
+
+    # create the parser for the "nanotron" command
+    parser_b = subparsers.add_parser("nanotron", help="use nanotron as backend for evaluation.")
+    parser_nanotron(parser_b)
+
+    parser.add_argument("--list-tasks", action="store_true", help="List available tasks")
+
+    args = parser.parse_args()
+
+    if args.subcommand == "accelerate":
+        from lighteval.main_accelerate import main as main_accelerate
+
+        main_accelerate(args)
+        return
+
+    if args.subcommand == "nanotron":
+        from lighteval.main_nanotron import main as main_nanotron
+
+        main_nanotron(args.checkpoint_config_path, args.lighteval_override, args.cache_dir)
+        return
+
+    if args.list_tasks:
+        Registry(cache_dir="").print_all_tasks()
+        return
+
+
+if __name__ == "__main__":
+    cli_evaluate()
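
The subparsers above are what enable the `lighteval accelerate` and `lighteval nanotron` forms used throughout the updated README. A short usage sketch: the accelerate flags mirror the README examples, while the nanotron flag names are assumptions inferred from the `main_nanotron(args.checkpoint_config_path, args.lighteval_override, args.cache_dir)` call rather than confirmed by this diff:

```shell
# Accelerate backend in a single process (flags as in the updated README)
lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "lighteval|truthfulqa:mc|0|0" \
    --override_batch_size 1 \
    --output_dir output_dir

# Nanotron backend; flag names inferred from the call in cli_evaluate (assumption)
lighteval nanotron \
    --checkpoint_config_path path/to/nanotron_checkpoint_config.yaml \
    --lighteval_override path/to/lighteval_config.yaml
```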
