Description
Issue encountered
Currently, vLLM supports loading GGUF models directly, and their documentation suggests using the tokenizer from the original Hugging Face model, which does not seem to be supported in the current lighteval.
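For reference, the pattern the vLLM documentation describes looks roughly like this (the GGUF path and the Hugging Face repo id below are placeholders):

    from vllm import LLM

    # vLLM loads the quantized weights from the GGUF file, while the tokenizer
    # is taken from the original Hugging Face repository rather than the one
    # embedded in the GGUF file.
    llm = LLM(
        model="/path/to/model-Q4_K_M.gguf",
        tokenizer="org/base-model",
    )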

Solution/Feature
I want to use lighteval to evaluate a GGUF model with the HF tokenizer rather than the tokenizer embedded in the GGUF file.
To support this feature, we can modify the file src/lighteval/models/vllm/vllm_model.py.
First, add a tokenizer_name field after model_name at Line#147:

    tokenizer_name: str | None = None
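For illustration, a minimal sketch of the config with the new field (only the fields referenced in this issue are shown, and the real class may not be a plain dataclass):

    from dataclasses import dataclass


    @dataclass
    class VLLMModelConfig:
        model_name: str
        tokenizer_name: str | None = None  # new: HF tokenizer repo or path, overriding the GGUF-embedded tokenizer
        revision: str = "main"
        trust_remote_code: bool = False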
Second, modify the get_tokenizer call in _create_auto_tokenizer at Line#293 so that an explicitly given tokenizer takes precedence:

    tokenizer = get_tokenizer(
        config.tokenizer_name if config.tokenizer_name else config.model_name,
        tokenizer_mode="auto",
        trust_remote_code=config.trust_remote_code,
        revision=config.revision,
    )
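Put together, the helper after the change might look roughly like this (the pad-token line and the get_tokenizer import are reconstructed from memory and may not match the file exactly):

    from vllm.transformers_utils.tokenizer import get_tokenizer

    def _create_auto_tokenizer(self, config):
        # Prefer the explicit HF tokenizer when provided, otherwise fall back to the model path
        tokenizer = get_tokenizer(
            config.tokenizer_name if config.tokenizer_name else config.model_name,
            tokenizer_mode="auto",
            trust_remote_code=config.trust_remote_code,
            revision=config.revision,
        )
        tokenizer.pad_token = tokenizer.eos_token
        return tokenizer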
Third, move the self.use_chat_template = uses_chat_template( ... ) call so it runs after self._tokenizer = self._create_auto_tokenizer(config), since it needs the tokenizer to already exist:

    self._tokenizer = self._create_auto_tokenizer(config)
    self.use_chat_template = uses_chat_template(
        model_name=config.model_name, tokenizer=self.tokenizer, override_chat_template=config.override_chat_template
    )
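With these changes, a GGUF model and an HF tokenizer could then be combined in the model config, for example (assuming the config class is VLLMModelConfig; the path and repo id are placeholders):

    from lighteval.models.vllm.vllm_model import VLLMModelConfig

    config = VLLMModelConfig(
        model_name="/path/to/model-Q4_K_M.gguf",  # quantized GGUF file loaded by vLLM
        tokenizer_name="org/base-model",          # HF tokenizer used instead of the GGUF-embedded one
        trust_remote_code=True,
    )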
Some known issues:
- We need to adjust the logging, because it currently reports an error when a GGUF model is loaded.

- Currently, vLLM doesn't support full-precision GGUF, meaning GGUF files in fp16, bf16, or fp32 cannot be loaded successfully, but community developers are working on this issue. I think it will be resolved in the next release of vLLM.
Possible alternatives
None, unknown.