### Your current environment
ok
### 🐛 Describe the bug
I'm trying to use the lora_filesystem_resolver plugin so that LoRA adapters are loaded automatically based on the model name in the request.
I'm starting the server with the following:
```bash
export MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct-AWQ

docker run \
  --ipc=host \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  -p "${MODEL_ID_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  --env "CUDA_VISIBLE_DEVICES=${MODEL_ID_GPU}" \
  --env "VLLM_ALLOW_RUNTIME_LORA_UPDATING=True" \
  --env "VLLM_PLUGINS=lora_filesystem_resolver" \
  --env "VLLM_LORA_RESOLVER_CACHE_DIR=/root/lora" \
  -v "VLLM_LOGGING_LEVEL=${VLLM_LOGGING_LEVEL}" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  -v "/teamspace/studios/this_studio/lora:/root/lora" \
  -v "$(pwd):/app" \
  ubicloud/vllm-openai:latest \
  --model ${MODEL_ID} \
  --quantization awq \
  --dtype float16 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat-template /vllm-workspace/examples/tool_chat_template_hermes.jinja \
  --gpu-memory-utilization 0.29 \
  --max-lora-rank 256 \
  --enable-lora \
  --served-model-name qwen-qwen25-vl-3b-instruct-awq
```
Note: I'm using ubicloud/vllm-openai because vllm/vllm-openai was not loading the plugin.
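To rule out the environment variables simply not reaching the server process, this is the quick check I run (the container id below is a placeholder):

```bash
# Confirm the resolver-related settings are visible inside the container
# (<container_id> is a placeholder for the actual container):
docker exec <container_id> env | grep -E 'VLLM_PLUGINS|VLLM_LORA_RESOLVER_CACHE_DIR|VLLM_ALLOW_RUNTIME_LORA_UPDATING'
```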
It seems that the plugin is loaded:
```
INFO 05-23 10:36:59 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 05-23 10:36:59 [__init__.py:32] name=lora_filesystem_resolver, value=vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 05-23 10:36:59 [__init__.py:44] plugin lora_filesystem_resolver loaded.
```
Besides that, my LoRA files are in the expected directory:
```
root@6729z265y9b0:/lora/my_lora# ls
adapter_config.json  adapter_model.safetensors
```
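For completeness, here is the sanity check I can run through the running container, based on my understanding from the docs that the resolver looks for `${VLLM_LORA_RESOLVER_CACHE_DIR}/<requested model name>/` (the container id is again a placeholder):

```bash
# Check the adapter directory against the configured cache dir (/root/lora):
docker exec <container_id> sh -c 'ls "${VLLM_LORA_RESOLVER_CACHE_DIR}/my_lora"'
# I expect to see adapter_config.json and adapter_model.safetensors here,
# matching the listing above.
```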
What I expected is that, when a request names the "my_lora" model, the resolver would look it up in that directory and load the LoRA. However, the server keeps returning 404:
Request:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my_lora",
    "messages": [
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQyZgfYgSOafCRv2D8mnZda1i4MYGza7qvlKhiPOIrJKdaWCcA&s"}},
        {"type": "text", "text": "From which country is this person?"}
      ]}
    ]
  }'
```
Response:
{"object":"error","message":"The model lora does not exist.","type":"NotFoundError","param":null,"code":404}
Is there something wrong with my setup, or with my interpretation of what this resolver is supposed to do?
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.