README.md (19 additions, 1 deletion)
@@ -146,6 +146,18 @@ accelerate launch --multi_gpu --num_processes=<num_gpus> -m \

You can find the template of the expected model configuration in [examples/model_configs/base_model.yaml](./examples/model_configs/base_model.yaml).

### Evaluating a quantized model

To evaluate a model with quantization, you can load it in `4bit` or `8bit`. Under the hood, this uses `BitsAndBytesConfig` and can drastically reduce memory requirements on consumer-grade hardware.

An example configuration can be found in [examples/model_configs/quantized_model.yaml](./examples/model_configs/quantized_model.yaml).
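
A minimal launch sketch is shown below. The `--model_config_path` flag is an assumption about the installed CLI (check your `lighteval` version); the task file and other flags mirror the commands used elsewhere in this README:

```bash
# Illustrative sketch: evaluate the example quantized config on one GPU.
# --model_config_path is assumed to exist in your lighteval version.
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_config_path="examples/model_configs/quantized_model.yaml" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --override_batch_size 1 \
    --output_dir="./evals/"
```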

### Evaluating a PEFT model

If you want to evaluate a model trained with `peft`, check out [examples/model_configs/peft_model.yaml](./examples/model_configs/peft_model.yaml).

Currently, `lighteval` supports applying `adapter` and `delta` weights to a base model.

> **Review comment (Member):** Adding a code snippet to show how to use those configs would be great :)
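
Following that suggestion, the same hedged launch sketch as in the quantized example above applies here; only the config path changes (again, `--model_config_path` is an assumption about your installed CLI):

```bash
# Illustrative sketch: evaluate the example PEFT config.
# Same assumptions as the quantized example above.
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_config_path="examples/model_configs/peft_model.yaml" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --override_batch_size 1 \
    --output_dir="./evals/"
```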


### Evaluating a large model with pipeline parallelism

To evaluate models larger than ~40B parameters in 16-bit precision, you will need to shard the model across multiple GPUs to fit it in VRAM. You can do this by passing `model_parallel=True` and setting `--num_processes` to the number of processes to use for data parallelism. For example, on a single node of 8 GPUs, you can run:
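
An illustrative sketch of such a launch (the `<model name>` placeholder and flag choices are assumptions, not the exact command from the full README):

```bash
# Illustrative sketch: model_parallel=True shards the model across the
# visible GPUs, so only one data-parallel process is launched here.
accelerate launch --num_processes=1 -m \
    lighteval accelerate \
    --model_args="pretrained=<model name>,model_parallel=True" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --output_dir="./evals/"
```
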
@@ -480,6 +492,12 @@ export CUDA_LAUNCH_BLOCKING=1
srun accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
```

## Authentication

For authentication of HuggingFace models (i.e. `base` models), a HuggingFace token is used; it is picked up directly from the `HF_TOKEN` environment variable.
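
For example:

```bash
# Expose your HuggingFace token to lighteval before launching an evaluation.
export HF_TOKEN=<your HuggingFace access token>
```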

For `tgi` models, authentication is provided in the config file. An example can be found at [tgi_model.yaml](./examples/model_configs/tgi_model.yaml).

## Releases

### Building the package
@@ -498,4 +516,4 @@ python3 -m build .
version = {0.3.0},
url = {https://github.com/huggingface/lighteval}
}
```
examples/model_configs/peft_model.yaml (new file, 12 additions)
@@ -0,0 +1,12 @@
model:
  type: "base"
  base_params:
    model_args: "pretrained=predibase/customer_support,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ... For a PEFT model, `pretrained` should be the model trained with PEFT, and `base_model` below should be the original model onto which the adapters are applied.
    dtype: "4bit" # Loading the model in 4 bit uses BitsAndBytesConfig. The other option is "8bit" quantization.
    compile: true
  merged_weights: # Ignore this section if you are not using PEFT models
    delta_weights: false # set to true if your model should be merged with a base model; the base model name must also be provided
    adapter_weights: true # set to true if your model has been trained with PEFT; the base model name must also be provided
    base_model: "mistralai/Mistral-7B-v0.1" # path to the base model - only needs to be specified if delta_weights or adapter_weights is true
  generation:
    multichoice_continuations_start_space: null # If true/false, forces multiple-choice continuations to start/not start with a space. If null, does nothing.
examples/model_configs/quantized_model.yaml (new file, 12 additions)
@@ -0,0 +1,12 @@
model:
  type: "base"
  base_params:
    model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
    dtype: "4bit" # Loading the model in 4 bit uses BitsAndBytesConfig. The other option is "8bit" quantization.
    compile: true
  merged_weights: # Ignore this section if you are not using PEFT models
    delta_weights: false # set to true if your model should be merged with a base model; the base model name must also be provided
    adapter_weights: false # set to true if your model has been trained with PEFT; the base model name must also be provided
    base_model: null # path to the base model - only needs to be specified if delta_weights or adapter_weights is true
  generation:
    multichoice_continuations_start_space: null # If true/false, forces multiple-choice continuations to start/not start with a space. If null, does nothing.