README.md (19 additions, 1 deletion)
@@ -146,6 +146,18 @@ accelerate launch --multi_gpu --num_processes=<num_gpus> -m \

You can find the template of the expected model configuration in [examples/model_configs/base_model.yaml](./examples/model_configs/base_model.yaml).

### Evaluating a quantized model

To evaluate a model with quantization, you can load it in `4bit` or `8bit`. Under the hood, this uses `BitsAndBytesConfig` and can drastically reduce memory requirements on consumer-grade hardware.

An example configuration can be found in [examples/model_configs/quantized_model.yaml](./examples/model_configs/quantized_model.yaml).
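
A minimal launch sketch is shown below. The `--model_config_path` flag is an assumption about the installed CLI (check your `lighteval` version); the task file and other flags mirror the commands used elsewhere in this README:

```bash
# Illustrative sketch: evaluate the example quantized config on one GPU.
# --model_config_path is assumed to exist in your lighteval version.
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_config_path="examples/model_configs/quantized_model.yaml" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --override_batch_size 1 \
    --output_dir="./evals/"
```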

### Evaluating a PEFT model

If you want to evaluate a model trained with `peft`, check out [examples/model_configs/peft_model.yaml](./examples/model_configs/peft_model.yaml).

Currently, `lighteval` supports applying `adapter` and `delta` weights to a base model.

> **Review comment (Member):** Adding a code snippet to show how to use those configs would be great :)
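
Following that suggestion, the same hedged launch sketch as in the quantized example above applies here; only the config path changes (again, `--model_config_path` is an assumption about your installed CLI):

```bash
# Illustrative sketch: evaluate the example PEFT config.
# Same assumptions as the quantized example above.
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_config_path="examples/model_configs/peft_model.yaml" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --override_batch_size 1 \
    --output_dir="./evals/"
```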


### Evaluating a large model with pipeline parallelism

To evaluate models larger than ~40B parameters in 16-bit precision, you will need to shard the model across multiple GPUs to fit it in VRAM. You can do this by passing `model_parallel=True` and setting `--num_processes` to the number of processes to use for data parallelism. For example, on a single node of 8 GPUs, you can run:
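
An illustrative sketch of such a launch (the `<model name>` placeholder and flag choices are assumptions, not the exact command from the full README):

```bash
# Illustrative sketch: model_parallel=True shards the model across the
# visible GPUs, so only one data-parallel process is launched here.
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_args="pretrained=<model name>,model_parallel=True" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --output_dir="./evals/"
```
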
@@ -480,6 +492,12 @@ export CUDA_LAUNCH_BLOCKING=1
srun accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
```

## Authentication

For authentication of HuggingFace models (i.e. `base` models), a HuggingFace token is used; it is picked up directly from the `HF_TOKEN` environment variable.
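
For example:

```bash
# Expose your HuggingFace token to lighteval before launching an evaluation.
export HF_TOKEN=<your HuggingFace access token>
```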

For `tgi` models, authentication is provided in the config file. An example can be found at [tgi_model.yaml](./examples/model_configs/tgi_model.yaml).

## Releases

### Building the package
@@ -498,4 +516,4 @@ python3 -m build .
version = {0.3.0},
url = {https://github.com/huggingface/lighteval}
}
```
examples/model_configs/peft_model.yaml (new file, 12 additions)
@@ -0,0 +1,12 @@
model:
  type: "base"
  base_params:
    model_args: "pretrained=predibase/customer_support,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ... For a PEFT model, `pretrained` should be the model trained with PEFT, and `base_model` below should be the original model onto which the adapters are applied.
    dtype: "4bit" # Loading the model in 4 bit uses BitsAndBytesConfig. The other option is "8bit" quantization.
    compile: true
  merged_weights: # Ignore this section if you are not using PEFT models
    delta_weights: false # set to true if your model should be merged with a base model; the base model name must also be provided
    adapter_weights: true # set to true if your model has been trained with PEFT; the base model name must also be provided
    base_model: "mistralai/Mistral-7B-v0.1" # path to the base model - only needs to be specified if delta_weights or adapter_weights is true
  generation:
    multichoice_continuations_start_space: null # If true/false, forces multiple-choice continuations to start/not start with a space. If null, does nothing.
examples/model_configs/quantized_model.yaml (new file, 12 additions)
@@ -0,0 +1,12 @@
model:
  type: "base"
  base_params:
    model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
    dtype: "4bit" # Loading the model in 4 bit uses BitsAndBytesConfig. The other option is "8bit" quantization.
    compile: true
  merged_weights: # Ignore this section if you are not using PEFT models
    delta_weights: false # set to true if your model should be merged with a base model; the base model name must also be provided
    adapter_weights: false # set to true if your model has been trained with PEFT; the base model name must also be provided
    base_model: null # path to the base model - only needs to be specified if delta_weights or adapter_weights is true
  generation:
    multichoice_continuations_start_space: null # If true/false, forces multiple-choice continuations to start/not start with a space. If null, does nothing.