[Bugfix] Update run_single_prompt.sh for offline_inference/bagel #970
nussejzz wants to merge 4 commits into vllm-project:main
Conversation
Update run_single_prompt.sh to use text2text modality and quote shell variable to prevent word splitting in bagel example
1. The prompt was split into multiple requests because it was not enclosed in quotes.
2. This prompt is used to demonstrate the Text2Text mode, while the default mode is Text2Image.
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
@princepride PTAL
Considering the wide variety of tasks Bagel supports, the numerous deployment options, and the future need to support the thinking mode, I feel it's inappropriate to release a bash script that only supports a single task. I suggest that all usage details be documented in the README instead. @nussejzz What do you think?
I think we can remove the single-prompt part first, because we don't support multi-prompt for DiT yet. By the way, have you added the usage of Mooncake?
- OK, I will add a corresponding .sh script under each task and remove the single-prompt part.
- I only have one node, so I haven't actually tested the usage of Mooncake yet. I plan to check the version you wrote in your previous PR.
One device is enough to test Mooncake; you can deploy Mooncake and Omni on the same device.
Gaohan123 left a comment
We already support --modalities to make a model output a certain modality for different requests. Please refer to the examples in Qwen3-Omni and modify accordingly: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni#modality-control
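As an illustration only, a minimal sketch of what modality control could look like for the Bagel example, assuming it exposes the same --modalities switch as the linked Qwen3-Omni example; the script name and model path below are hypothetical, not this repo's verified CLI:

# Hypothetical sketch; end2end.py and the model path are assumptions.
python3 end2end.py --model /path/to/BAGEL-7B-MoT --modalities text   # text-only output
python3 end2end.py --model /path/to/BAGEL-7B-MoT --modalities image  # image output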
Thanks for your review!
I have solved this problem through another PR: #987

Purpose
Update run_single_prompt.sh to use text2text modality and quote shell variable to prevent word splitting in bagel example
The prompt was split into multiple requests because it was not enclosed in quotes.
This prompt is used to demonstrate the Text2Text mode, while the default mode is Text2Image.
For users who have only one card, or who don't know how to configure the .yaml file, this change helps them run run_single_prompt.sh, because the text2text task is the easiest and only needs the Thinker model (stage 0); see the sketch below.
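For illustration, a hedged sketch of the quoted invocation inside run_single_prompt.sh; the script name, variable, and flags here are assumptions, not the exact diff:

# Illustrative sketch only; single_prompt.py and its flags are assumptions.
PROMPT="What is the capital of France?"

# Quoting "$PROMPT" passes the whole sentence as one argument (one request);
# --modalities text selects the Text2Text path, which only needs the
# Thinker model (stage 0).
python3 single_prompt.py --prompt "$PROMPT" --modalities text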
Problem
The prompt was split into multiple requests because it was not enclosed in quotes.
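The shell behavior behind this can be reproduced without the model at all; a minimal bash demo of word splitting (no assumptions beyond standard POSIX expansion rules):

# Unquoted expansion is split on $IFS (whitespace by default).
PROMPT="What is the capital of France?"
set -- $PROMPT      # unquoted: six separate arguments
echo $#             # prints 6 -> six separate requests downstream
set -- "$PROMPT"    # quoted: one argument
echo $#             # prints 1 -> one request, as intended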
[OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=0_0c488baf-f10f-446d-9cd3-ec93d88c5c39, prompt='<|im_start|>user\n<|im_start|>user\\n**What**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 151644, 872, 1699, 3838, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text="Hello! How can I assist you today? What would you like to do?\n\nPlease let me know what your question or task is, and I'll do my best to help. Here are some ideas to get us started:\n\n1. **Ask a Question**: You can ask me anything - from general knowledge to specific queries about a topic.\n2. **Get Help with a Task**: If you have a project or task that needs assistance (e.g., writing, editing, explaining a concept), feel free to share it!\n3. **Play Games**: We can play simple text-based games like Hangman, 20 Questions, Word Chain, or even create a story together.\n4. **Learn Something New**: Would you like to learn about a new topic, such as history, science, technology, or more?\n5. **Creative Project**: Want to write a short story, poem, or generate ideas for a project?\n6. **Something Else**: Please specify if none of the above options suit your needs.\n\nLet's find out how I can help you!", token_ids=[9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 3555, 1035, 498, 1075, 311, 653, 1939, 5501, 1077, 752, 1414, 1128, 697, 3405, 476, 3383, 374, 11, 323, 358, 3278, 653, 847, 1850, 311, 1492, 13, 5692, 525, 1045, 6708, 311, 633, 601, 3855, 1447, 16, 13, 3070, 26172, 264, 15846, 95518, 1446, 646, 2548, 752, 4113, 481, 504, 4586, 6540, 311, 3151, 19556, 911, 264, 8544, 624, 17, 13, 3070, 1949, 11479, 448, 264, 5430, 95518, 1416, 498, 614, 264, 2390, 476, 3383, 429, 3880, 12994, 320, 68, 1302, 2572, 4378, 11, 15664, 11, 25021, 264, 7286, 701, 2666, 1910, 311, 4332, 432, 4894, 18, 13, 3070, 9137, 11610, 95518, 1205, 646, 1486, 4285, 1467, 5980, 3868, 1075, 40775, 1515, 11, 220, 17, 15, 23382, 11, 9322, 28525, 11, 476, 1496, 1855, 264, 3364, 3786, 624, 19, 13, 3070, 23824, 24656, 1532, 95518, 18885, 498, 1075, 311, 3960, 911, 264, 501, 8544, 11, 1741, 438, 3840, 11, 8038, 11, 5440, 11, 476, 803, 5267, 20, 13, 3070, 62946, 5787, 95518, 23252, 311, 3270, 264, 2805, 3364, 11, 32794, 11, 476, 6923, 6708, 369, 264, 2390, 5267, 21, 13, 3070, 23087, 18804, 95518, 5209, 13837, 421, 6857, 315, 279, 3403, 2606, 7781, 697, 3880, 382, 10061, 594, 1477, 700, 1246, 358, 646, 1492, 498, 0, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=1_9d53bbce-e87d-4b40-b016-854274096901, prompt='<|im_start|>user\n**is**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 285, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='I am an AI language model, designed to assist and provide information. 
How can I help you today?', token_ids=[40, 1079, 458, 15235, 4128, 1614, 11, 6188, 311, 7789, 323, 3410, 1995, 13, 2585, 646, 358, 1492, 498, 3351, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=2_6b44b6e9-a8e8-4ea3-b0a2-fc0e74723c13, prompt='<|im_start|>user\n**the**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 1782, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='Hello! How can I assist you today?', token_ids=[9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=3_2fb0a712-77cd-47d6-9887-ba23bf153c2e, prompt='<|im_start|>user\n**capital**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 65063, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='What is the capital of the United States?', token_ids=[3838, 374, 279, 6722, 315, 279, 3639, 4180, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=4_c1cfb2e4-4016-47eb-8bc2-81d78c046aca, prompt='<|im_start|>user\n**of**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 1055, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='of', token_ids=[1055, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=5_3311e1fe-5fd7-4e6a-b430-22947700496b, prompt='<|im_start|>user\n**France?**<|im_end|>\\n<|im_start|>assistant\\n<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 49000, 30, 151645, 1699, 151644, 77091, 1699, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='Yes, France. 
What can I help you with?', token_ids=[9454, 11, 9625, 13, 3555, 646, 358, 1492, 498, 448, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Test Plan
Update the code and run it again:
cd examples/offline_inference/bagel
bash run_single_prompt.sh
Test Result
Now the prompt is handled as a single request:
[OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=0_30428837-5f61-4322-8074-029872c02736, prompt='<|im_start|>user\n<|im_start|>user\\n**What is the capital of France?**<|im_end|>\\n<|im_start|>assistant\\n<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 151644, 872, 1699, 3838, 374, 279, 6722, 315, 9625, 30, 151645, 1699, 151644, 77091, 1699, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text=**'The capital of France is Paris.'**, token_ids=[785, 6722, 315, 9625, 374, 12095, 13, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Essential Elements of an Effective PR Description Checklist
The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.