Add vllm_worker support for lora_modules #3534

## usage

### start

```bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m fastchat.serve.vllm_worker \
    --model-path /data/models/Qwen/Qwen2-72B-Instruct \
    --tokenizer /data/dpo/lora/b15s1/saves/Qwen2-72B-Instruct/v7.9/v7.3 \
    --enable-lora \
    --lora-modules m1=/data/modules/lora/adapter/m1 m2=/data/modules/lora/adapter/m2 m3=/data/modules/lora/adapter/m3 \
    --model-names qwen2-72b-instruct,m1,m2,m3 \
    --controller http://localhost:21001 \
    --host 0.0.0.0 \
    --num-gpus 8 \
    --port 31034 \
    --limit-worker-concurrency 100 \
    --worker-address http://localhost:31034
```

### post

- example1

  ```bash
  curl --location --request POST 'http://llm-gw.sunlinecloud.cn/v1/chat/completions' \
      --header 'Content-Type: application/json' \
      --header 'Authorization: Bearer sk-xxx' \
      --data-raw '{
          "model": "m1",
          "stream": false,
          "temperature": 0.7,
          "top_p": 0.1,
          "max_tokens": 4096,
          "messages": [
              {
                  "role": "user",
                  "content": "Hi?"
              }
          ]
      }'
  ```

- example2

  ```bash
  curl --location --request POST 'http://llm-gw.sunlinecloud.cn/v1/chat/completions' \
      --header 'Content-Type: application/json' \
      --header 'Authorization: Bearer sk-xxx' \
      --data-raw '{
          "model": "qwen2-72b-instruct",
          "stream": false,
          "temperature": 0.7,
          "top_p": 0.1,
          "max_tokens": 4096,
          "messages": [
              {
                  "role": "user",
                  "content": "Hi?"
              }
          ]
      }'
  ```
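The usage above implies a simple mapping: each `name=path` pair passed to `--lora-modules` registers an adapter, and the `model` field of a request selects either that adapter or the base model. The sketch below illustrates that mapping in plain Python. The helper names (`parse_lora_modules`, `resolve_adapter`, `build_chat_payload`) are hypothetical and written for illustration only; they are not taken from this PR, FastChat, or vLLM, and the actual worker may resolve adapters differently.

```python
import json


def parse_lora_modules(specs):
    """Parse 'name=path' strings (as passed to --lora-modules) into a dict."""
    modules = {}
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        modules[name] = path
    return modules


def resolve_adapter(model, base_model, lora_modules):
    """Return the adapter path for `model`, or None for the base model.

    Assumption: a request naming the base model runs without an adapter,
    and a request naming a registered adapter uses that adapter.
    """
    if model == base_model:
        return None
    if model in lora_modules:
        return lora_modules[model]
    raise KeyError(f"unknown model name: {model!r}")


def build_chat_payload(model, prompt, max_tokens=4096):
    """Build the JSON body used in the curl examples above."""
    return {
        "model": model,
        "stream": False,
        "temperature": 0.7,
        "top_p": 0.1,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    modules = parse_lora_modules([
        "m1=/data/modules/lora/adapter/m1",
        "m2=/data/modules/lora/adapter/m2",
        "m3=/data/modules/lora/adapter/m3",
    ])
    for name in ("qwen2-72b-instruct", "m1"):
        adapter = resolve_adapter(name, "qwen2-72b-instruct", modules)
        print(name, "->", adapter or "base model (no adapter)")
    print(json.dumps(build_chat_payload("m1", "Hi?"), indent=2))
```

The payload from `build_chat_payload` can be sent to the `/v1/chat/completions` endpoint with any HTTP client; only the `model` value changes between the base model and an adapter.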

Commits on Sep 27, 2024

add doc

x22x22 committed Sep 27, 2024 (d36dc74)

Commits on Oct 11, 2024

fix lora_request variable is not declared

x22x22 committed Oct 11, 2024 (4591d5e)


Commits on Sep 24, 2024
