35 changes: 21 additions & 14 deletions README.md
@@ -78,8 +78,8 @@ pre-commit install

We provide two main entry points to evaluate models:

* `run_evals_accelerate.py`: evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate).
* `run_evals_nanotron.py`: evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron).
* `lighteval accelerate`: evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate).
* `lighteval nanotron`: evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron).

For most users, we recommend using the 🤗 Accelerate backend - see below for specific commands.

@@ -94,7 +94,8 @@ accelerate config
You can then evaluate a model using data parallelism as follows:

```shell
accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=<num_gpus> -m \
**Member:** I'm not fond of `accelerate ... lighteval accelerate`

**Member Author:** I agree, but I'm not sure how we could really differentiate between the nanotron and accelerate launchers.

**Member:** Well, can you launch nanotron with accelerate?

**Member Author:** wdym?

**Member:** I don't think we'll ever get `accelerate launch ... lighteval nanotron`

**Member:** So we could assume that if `accelerate launch` is used, then it's `lighteval accelerate`?

lighteval accelerate \
--model_args="pretrained=<path to model on the hub>" \
--tasks <task parameters> \
--output_dir output_dir
@@ -109,7 +110,8 @@ suite|task|num_few_shot|{0 or 1 to automatically reduce `num_few_shot` if prompt
or a file path like [`examples/tasks/recommended_set.txt`](./examples/tasks/recommended_set.txt) which specifies multiple task configurations. For example, to evaluate GPT-2 on the Truthful QA benchmark run:

```shell
accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=8 -m \
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "lighteval|truthfulqa:mc|0|0" \
--override_batch_size 1 \
@@ -119,7 +121,8 @@ accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
Here, `--override_batch_size` defines the _batch size per device_, so the effective batch size will be `override_batch_size x num_gpus`. To evaluate on multiple benchmarks, separate each task configuration with a comma, e.g.

```shell
accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=8 -m \
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0,leaderboard|gsm8k|0|0" \
--override_batch_size 1 \
@@ -133,7 +136,8 @@ See the [`examples/tasks/recommended_set.txt`](./examples/tasks/recommended_set.
If you want to evaluate a model by spinning up inference endpoints, use adapter/delta weights, or more complex configuration options, you can load models using a configuration file. This is done as follows:

```shell
accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=<num_gpus> -m \
lighteval accelerate \
--model_config_path="<path to your model configuration>" \
--tasks <task parameters> \
--output_dir output_dir
@@ -147,13 +151,15 @@ To evaluate models larger than ~40B parameters in 16-bit precision, you will nee

```shell
# PP=2, DP=4 - good for models < 70B params
accelerate launch --multi_gpu --num_processes=4 run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=4 -m \
lighteval accelerate \
--model_args="pretrained=<path to model on the hub>,model_parallel=True" \
--tasks <task parameters> \
--output_dir output_dir

# PP=4, DP=2 - good for huge models >= 70B params
accelerate launch --multi_gpu --num_processes=2 run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=2 -m \
lighteval accelerate \
--model_args="pretrained=<path to model on the hub>,model_parallel=True" \
--tasks <task parameters> \
--output_dir output_dir
@@ -164,7 +170,8 @@ accelerate launch --multi_gpu --num_processes=2 run_evals_accelerate.py \
To evaluate a model on all the benchmarks of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) using a single node of 8 GPUs, run:

```shell
accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
accelerate launch --multi_gpu --num_processes=8 -m \
lighteval accelerate \
--model_args "pretrained=<model name>" \
--tasks examples/tasks/open_llm_leaderboard_tasks.txt \
--override_batch_size 1 \
@@ -176,7 +183,7 @@ accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
You can also use `lighteval` to evaluate models on CPU, although note this will typically be very slow for large models. To do so, run:

```shell
python run_evals_accelerate.py \
lighteval accelerate \
--model_args="pretrained=<path to model on the hub>"\
--tasks <task parameters> \
--output_dir output_dir
@@ -211,7 +218,7 @@ Independently of the default tasks provided in `lighteval` that you will find in

For example, to run an extended task like `ifeval`, you can run:
```shell
python run_evals_accelerate.py \
lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--use_chat_template \ # optional, if you want to run the evaluation with the chat template
--tasks "extended|ifeval|0|0" \
@@ -221,7 +228,7 @@ python run_evals_accelerate.py \
To run a community or custom task, use the following command (note the `custom_tasks` flag):

```shell
python run_evals_accelerate.py \
lighteval accelerate \
--model_args="pretrained=<path to model on the hub>"\
--tasks <task parameters> \
--custom_tasks <path to your custom or community task> \
@@ -231,7 +238,7 @@ python run_evals_accelerate.py \
For example, to launch `lighteval` on `arabic_mmlu:abstract_algebra` for `HuggingFaceH4/zephyr-7b-beta`, run:

```shell
python run_evals_accelerate.py \
lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--use_chat_template \ # optional, if you want to run the evaluation with the chat template
--tasks "community|arabic_mmlu:abstract_algebra|5|1" \
@@ -464,7 +471,7 @@ source <path_to_your_venv>/activate #or conda activate yourenv
cd <path_to_your_lighteval>/lighteval

export CUDA_LAUNCH_BLOCKING=1
srun accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
srun accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
```

## Releases
2 changes: 1 addition & 1 deletion examples/model_configs/endpoint_model.yaml
@@ -16,7 +16,7 @@ model:
endpoint_type: "protected"
namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
env_vars:
env_vars:
null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
generation:
add_special_tokens: true
2 changes: 1 addition & 1 deletion examples/model_configs/tgi_model.yaml
@@ -3,4 +3,4 @@ model:
instance:
inference_server_address: ""
inference_server_auth: null
model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -102,4 +102,4 @@ Issues = "https://github.com/huggingface/lighteval/issues"
# Changelog = "https://github.com/huggingface/lighteval/blob/master/CHANGELOG.md"

[project.scripts]
lighteval = "lighteval.commands.lighteval_cli:main"
lighteval = "lighteval.__main__:cli_evaluate"
89 changes: 0 additions & 89 deletions run_evals_accelerate.py

This file was deleted.

55 changes: 0 additions & 55 deletions run_evals_nanotron.py

This file was deleted.

65 changes: 65 additions & 0 deletions src/lighteval/__main__.py
@@ -0,0 +1,65 @@
#!/usr/bin/env python

# MIT License

# Copyright (c) 2024 Taratra D. RAHARISON and The HuggingFace Team

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import argparse

from lighteval.parsers import parser_accelerate, parser_nanotron
from lighteval.tasks.registry import Registry


def cli_evaluate():
parser = argparse.ArgumentParser(description="CLI tool for lighteval, a lightweight framework for LLM evaluation")
subparsers = parser.add_subparsers(help="help for subcommand", dest="subcommand")

# create the parser for the "accelerate" command
parser_a = subparsers.add_parser("accelerate", help="use accelerate and transformers as backend for evaluation.")
parser_accelerate(parser_a)

# create the parser for the "nanotron" command
parser_b = subparsers.add_parser("nanotron", help="use nanotron as backend for evaluation.")
parser_nanotron(parser_b)

parser.add_argument("--list-tasks", action="store_true", help="List available tasks")

args = parser.parse_args()

if args.subcommand == "accelerate":
from lighteval.main_accelerate import main as main_accelerate

main_accelerate(args)
return

if args.subcommand == "nanotron":
from lighteval.main_nanotron import main as main_nanotron

main_nanotron(args.checkpoint_config_path, args.lighteval_override, args.cache_dir)
return

if args.list_tasks:
Registry(cache_dir="").print_all_tasks()
return


if __name__ == "__main__":
cli_evaluate()
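
Since `cli_evaluate()` parses `sys.argv` with argparse, the entry point can also be driven programmatically; a minimal sketch (the explicit `sys.argv` override is purely illustrative — the normal path is the `lighteval` console script):

```python
# Minimal sketch: cli_evaluate() reads sys.argv via argparse, so this is
# equivalent to running `lighteval --list-tasks` from the shell. With no
# subcommand given, the dispatch falls through to Registry(...).print_all_tasks().
import sys

from lighteval.__main__ import cli_evaluate

sys.argv = ["lighteval", "--list-tasks"]
cli_evaluate()
```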