@@ -29,7 +29,44 @@ This section will hold all the updates that have taken place since the blog post
 vLLM with the transformers backend now supports **Vision Language Models**. When a user adds `model_impl="transformers"`,
 the correct class for text-only and multimodal models will be deduced and loaded.
 
-Here is how one would use the API.
+Here is how one can serve a multimodal model using the transformers backend.
+```bash
+vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf \
+  --model_impl transformers \
+  --disable-mm-preprocessor-cache \
+  --no-enable-prefix-caching \
+  --no-enable-chunked-prefill
+```
+
+To query the served model, one can use the `openai` Python client like so:
+```python
+from openai import OpenAI
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+chat_response = client.chat.completions.create(
+    model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "What's in this image?"},
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg",
+                },
+            },
+        ],
+    }],
+)
+print("Chat response:", chat_response)
+```
+
+You can also initialize the vLLM engine directly using the `LLM` API. Here is the same model being
+run offline with the `LLM` class.
 
 ```python
 from vllm import LLM, SamplingParams
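# A minimal, hedged sketch of how the offline example might continue from the import
# above. It assumes the `LLM` constructor accepts `model_impl` plus engine options
# mirroring the CLI flags used earlier, and that `LLM.chat` accepts OpenAI-style
# multimodal messages; treat it as illustrative rather than the blog's verbatim code.
llm = LLM(
    model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
    model_impl="transformers",
    disable_mm_preprocessor_cache=True,  # mirrors --disable-mm-preprocessor-cache
    enable_prefix_caching=False,         # mirrors --no-enable-prefix-caching
    enable_chunked_prefill=False,        # mirrors --no-enable-chunked-prefill
)

# The same OpenAI-style message used in the server example; LLM.chat is expected to
# apply the model's chat template and fetch the image from the URL.
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "http://images.cocodataset.org/val2017/000000039769.jpg",
            },
        },
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.0, max_tokens=64))
print("Chat response:", outputs[0].outputs[0].text)
```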