
Add vllm_worker support for lora_modules #3534

Open · wants to merge 4 commits into base: main
Commits on Sep 24, 2024

  1. ## Add vllm_worker support for lora_modules

    ## usage
    
    ### start
    
    ```bash
    export VLLM_WORKER_MULTIPROC_METHOD=spawn
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m fastchat.serve.vllm_worker \
        --model-path /data/models/Qwen/Qwen2-72B-Instruct \
        --tokenizer /data/dpo/lora/b15s1/saves/Qwen2-72B-Instruct/v7.9/v7.3 \
        --enable-lora \
        --lora-modules m1=/data/modules/lora/adapter/m1 m2=/data/modules/lora/adapter/m2 m3=/data/modules/lora/adapter/m3 \
        --model-names qwen2-72b-instruct,m1,m2,m3 \
        --controller http://localhost:21001 \
        --host 0.0.0.0 \
        --num-gpus 8 \
        --port 31034 \
        --limit-worker-concurrency 100 \
        --worker-address http://localhost:31034
    ```
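Each `--lora-modules` entry is a `name=path` pair mapping a serving name to a LoRA adapter directory, and `--model-names` then advertises both the base model and the adapters to the controller. A minimal sketch of that `name=path` parsing (hypothetical helper; the real FastChat/vLLM argument handling may differ):

```python
# Hypothetical sketch of parsing "--lora-modules name=path" pairs;
# not FastChat's actual implementation.
def parse_lora_modules(tokens: list[str]) -> dict[str, str]:
    """Map each 'name=path' token to {serving_name: adapter_path}."""
    modules: dict[str, str] = {}
    for token in tokens:
        name, sep, path = token.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {token!r}")
        modules[name] = path
    return modules

mods = parse_lora_modules([
    "m1=/data/modules/lora/adapter/m1",
    "m2=/data/modules/lora/adapter/m2",
    "m3=/data/modules/lora/adapter/m3",
])
print(sorted(mods))  # -> ['m1', 'm2', 'm3']
```

With this mapping in hand, a request whose `model` field names an adapter can be routed to that adapter, while the base model name falls through to the unmodified weights.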
    
    ### post
    
    - example1
    
    ```bash
    curl --location --request POST 'http://llm-gw.sunlinecloud.cn/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-xxx' \
    --data-raw '{
        "model": "m1",
        "stream": false,
        "temperature": 0.7,
        "top_p": 0.1,
        "max_tokens": 4096,
        "messages": [
          {
            "role": "user",
            "content": "Hi?"
          }
        ]
      }'
    ```
    - example2
    
    ```bash
    curl --location --request POST 'http://llm-gw.sunlinecloud.cn/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-xxx' \
    --data-raw '{
        "model": "qwen2-72b-instruct",
        "stream": false,
        "temperature": 0.7,
        "top_p": 0.1,
        "max_tokens": 4096,
        "messages": [
          {
            "role": "user",
            "content": "Hi?"
          }
        ]
      }'
    ```
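The two requests differ only in the `model` field: an adapter name from `--lora-modules` (`m1`, `m2`, `m3`) selects that LoRA adapter, while the base name (`qwen2-72b-instruct`) hits the unmodified weights. A small Python sketch (a hypothetical helper, not part of FastChat) that builds the same JSON body as the curl examples:

```python
import json

# Hypothetical helper that builds the same request body as the curl
# examples above; only the "model" field changes between adapter and base.
def build_chat_payload(model: str, prompt: str, *, temperature: float = 0.7,
                       top_p: float = 0.1, max_tokens: int = 4096) -> str:
    return json.dumps({
        "model": model,  # "m1"/"m2"/"m3" -> LoRA adapter; "qwen2-72b-instruct" -> base model
        "stream": False,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_chat_payload("m1", "Hi?")
print(json.loads(body)["model"])  # -> m1
```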
    x22x22 committed Sep 24, 2024 · eca0f15
  2. ## Add vllm_worker support for lora_modules

    (commit message identical to eca0f15 above)

    x22x22 committed Sep 24, 2024 · 2f90685

Commits on Sep 27, 2024

  1. add doc

    x22x22 committed Sep 27, 2024 · d36dc74

Commits on Oct 11, 2024

  1. 4591d5e (no commit message captured)