2 changes: 1 addition & 1 deletion examples/model_configs/vllm_model_config.yaml
@@ -10,5 +10,5 @@ model:
 top_k: -1
 min_p: 0.0
 top_p: 0.9
-max_new_tokens: 100
+max_new_tokens: 256
 stop_tokens: ["<EOS>", "<PAD>"]
4 changes: 3 additions & 1 deletion src/lighteval/models/vllm/vllm_model.py
@@ -316,7 +316,9 @@ def _generate(
         sampling_params = self.sampling_params.clone() or SamplingParams()
         if generate:
             sampling_params.n = num_samples
-            sampling_params.max_tokens = max_new_tokens
+            sampling_params.max_tokens = (
+                max_new_tokens if sampling_params.max_tokens is None else sampling_params.max_tokens
+            )
Comment on lines +319 to +321
vLLM declares `max_tokens` as `Optional[int]`, but `SamplingParams` defaults it to 16 rather than `None`. That means whenever a `SamplingParams` object is created without an explicit value, `sampling_params.max_tokens` is 16, so the `is None` branch never fires and this expression always keeps 16. Generation is then capped at 16 tokens, and the lighteval benchmark goes on to warn that the output is not in the Gold format.

             sampling_params.stop = stop_tokens
             sampling_params.logprobs = 1 if returns_logits else 0
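The reviewer's point can be reproduced without vLLM itself. Below is a minimal sketch: `SamplingParams` here is a hypothetical stand-in for vLLM's class (assuming, per the comment, that `max_tokens` is typed `Optional[int]` but defaults to 16), and `resolve_max_tokens` mirrors the patched conditional.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SamplingParams:
    # Stand-in for vLLM's SamplingParams: typed Optional[int],
    # but defaults to 16 instead of None (per the review comment).
    max_tokens: Optional[int] = 16

def resolve_max_tokens(sampling_params: SamplingParams, max_new_tokens: int) -> int:
    # The patched logic: fall back to max_new_tokens only when
    # max_tokens was left unset (None).
    return max_new_tokens if sampling_params.max_tokens is None else sampling_params.max_tokens

# A freshly created SamplingParams never takes the fallback branch,
# because its default is 16, not None:
print(resolve_max_tokens(SamplingParams(), max_new_tokens=256))  # 16, not 256

# The fallback only applies if max_tokens is explicitly set to None:
print(resolve_max_tokens(SamplingParams(max_tokens=None), max_new_tokens=256))  # 256
```

This illustrates why the `is None` guard is ineffective against a default of 16: the check would need to distinguish "user-provided" from "library default", which `None`-testing alone cannot do here.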
