Adds vLLM as Option for Local App #693
@@ -63,6 +63,41 @@ LLAMA_CURL=1 make
	];
};

const snippetVllm = (model: ModelData): string[] => {
	return [
		`
## Deploy a gated model with Docker (Docker must be installed; please request access in the Hugging Face model repo first):
docker run --runtime nvidia --gpus all \
    --name my_vllm_container \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-7B-Instruct-v0.1
`,
		`
## Load and run the model
docker exec -it my_vllm_container bash -c "python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1 --dtype auto --api-key token-abc123"
`,
		`
## Call the server using the official OpenAI Python client library, or any other HTTP client
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(completion.choices[0].message)
`,
	];
};

/**
 * Add your new local app here.
 *
@@ -82,6 +117,13 @@ export const LOCAL_APPS = {
		displayOnModelPage: isGgufModel,
		snippet: snippetLlamacpp,
	},
	"vllm": {
		prettyLabel: "vLLM",
		docsUrl: "https://docs.vllm.ai",
		mainTask: "text-generation",
		displayOnModelPage: isGptqModel && isAwqModel,
		snippet: snippetVllm,
	},
architectures?: string[];

And for the quantization method we can read config.quantization_config.quant_method, where we support awq, gptq, aqlm, and marlin:
https://huggingface.co/TheBloke/zephyr-7B-alpha-AWQ/blob/main/config.json#L28
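For illustration only (this is not code from the PR), a minimal sketch of how a quant_method-based check could replace the isGptqModel && isAwqModel gate above, assuming the parsed config.json is exposed on ModelData under a config field:

```ts
// Hypothetical helper: show the vLLM entry only for quantization methods vLLM supports.
// Assumption: ModelData (the type already used in this file) carries the parsed
// config.json as `config`, with an optional `quantization_config.quant_method` string.
const VLLM_QUANT_METHODS = ["awq", "gptq", "aqlm", "marlin"];

const isVllmQuantized = (model: ModelData): boolean => {
	const quantMethod = model.config?.quantization_config?.quant_method;
	return typeof quantMethod === "string" && VLLM_QUANT_METHODS.includes(quantMethod);
};
```

The LOCAL_APPS entry could then use `displayOnModelPage: isVllmQuantized,` instead of combining the two predicates.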
awesome @simon-mo, super clear!
i've pushed 2123430 on this PR to type config.quantization_config.quant_method which we now parse & pass from the Hub
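As a rough sketch of what that typing could look like (the interface names below are assumptions for illustration, not copied from commit 2123430):

```ts
// Sketch of a typed quantization section for the parsed config.json.
// quant_method is narrowed to the methods discussed above; the exact shape
// in the pushed commit may differ.
interface QuantizationConfig {
	quant_method: "awq" | "gptq" | "aqlm" | "marlin";
}

interface ModelConfig {
	architectures?: string[];
	quantization_config?: QuantizationConfig;
}
```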
> i've pushed 2123430 on this PR to type config.quantization_config.quant_method which we now parse & pass from the Hub
I made some changes; I need your help to review them.