Description
Issue encountered
Currently, vLLM supports loading GGUF models directly, and their documentation suggests using the tokenizer from the original Hugging Face model, which does not seem to be supported in the current lighteval.
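For reference, the pattern the vLLM documentation describes looks roughly like this (the GGUF path and the Hugging Face repo id below are placeholders):

    from vllm import LLM

    # vLLM loads the quantized weights from the GGUF file, while the tokenizer
    # is taken from the original Hugging Face repository rather than the one
    # embedded in the GGUF file.
    llm = LLM(
        model="/path/to/model-Q4_K_M.gguf",
        tokenizer="org/base-model",
    )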

Solution/Feature
I want to use lighteval to evaluate a GGUF model with the HF tokenizer rather than the tokenizer embedded in the GGUF file.
To support this feature, we can modify the file src/lighteval/models/vllm/vllm_model.py.
First, add a tokenizer_name field after model_name at Line#147:

    tokenizer_name: str | None = None
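For illustration, a minimal sketch of the config with the new field (only the fields referenced in this issue are shown, and the real class may not be a plain dataclass):

    from dataclasses import dataclass


    @dataclass
    class VLLMModelConfig:
        model_name: str
        tokenizer_name: str | None = None  # new: HF tokenizer repo or path, overriding the GGUF-embedded tokenizer
        revision: str = "main"
        trust_remote_code: bool = False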
Second, modify the get_tokenizer call in _create_auto_tokenizer at Line#293 so that an explicitly given tokenizer takes precedence:

    tokenizer = get_tokenizer(
        config.tokenizer_name if config.tokenizer_name else config.model_name,
        tokenizer_mode="auto",
        trust_remote_code=config.trust_remote_code,
        revision=config.revision,
    )
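Put together, the helper after the change might look roughly like this (the pad-token line and the get_tokenizer import are reconstructed from memory and may not match the file exactly):

    from vllm.transformers_utils.tokenizer import get_tokenizer

    def _create_auto_tokenizer(self, config):
        # Prefer the explicit HF tokenizer when provided, otherwise fall back to the model path
        tokenizer = get_tokenizer(
            config.tokenizer_name if config.tokenizer_name else config.model_name,
            tokenizer_mode="auto",
            trust_remote_code=config.trust_remote_code,
            revision=config.revision,
        )
        tokenizer.pad_token = tokenizer.eos_token
        return tokenizer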
Third, move the self.use_chat_template = uses_chat_template( ... ) call so it runs after self._tokenizer = self._create_auto_tokenizer(config), since it needs the tokenizer to already exist:

    self._tokenizer = self._create_auto_tokenizer(config)
    self.use_chat_template = uses_chat_template(
        model_name=config.model_name, tokenizer=self.tokenizer, override_chat_template=config.override_chat_template
    )
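With these changes, a GGUF model and an HF tokenizer could then be combined in the model config, for example (assuming the config class is VLLMModelConfig; the path and repo id are placeholders):

    from lighteval.models.vllm.vllm_model import VLLMModelConfig

    config = VLLMModelConfig(
        model_name="/path/to/model-Q4_K_M.gguf",  # quantized GGUF file loaded by vLLM
        tokenizer_name="org/base-model",          # HF tokenizer used instead of the GGUF-embedded one
        trust_remote_code=True,
    )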
Some known issues:
- We need to adjust the logging, because it currently reports an error when a GGUF model is loaded.

- Currently, vLLM doesn't support full-precision GGUF, meaning GGUF files in fp16, bf16, or fp32 cannot be loaded successfully, but community developers are working on this issue. I think it will be resolved in the next release of vLLM.
Possible alternatives
None, unknown.