[Bugfix] Update run_single_prompt.sh for offline_inference/bagel #970
nussejzz wants to merge 4 commits into vllm-project:main
Conversation
Update run_single_prompt.sh to use text2text modality and quote shell variable to prevent word splitting in bagel example
1. The prompt was split into multiple requests because it was not enclosed in quotes.
2. This prompt is used to demonstrate the Text2Text mode, while the default mode is Text2Image.
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
@princepride PTAL
Considering the wide variety of tasks Bagel supports, the numerous deployment options, and the future need to support the thinking mode, I feel it's inappropriate to release a bash script that only supports a single task. I suggest that all usage details be documented in the README instead. @nussejzz What do you think?
I think we can remove the single-prompt part first, because we don't support multi-prompt for DiT yet. By the way, have you added the usage of Mooncake?
- OK, I will add a corresponding .sh script under each task and remove the single-prompt part.
- I only have one node, so I haven't actually tested the usage of Mooncake yet. I plan to check the version you wrote in your previous PR.
One device is enough to test Mooncake; you can deploy Mooncake and Omni on the same device.
Gaohan123 left a comment
We already support --modalities to make a model output a certain modality for different requests. Please refer to the examples in Qwen3-Omni and modify accordingly: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni#modality-control
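As an illustration only, a minimal sketch of what modality control could look like for the Bagel example, assuming it exposes the same --modalities switch as the linked Qwen3-Omni example; the script name and model path below are hypothetical, not this repo's verified CLI:

# Hypothetical sketch; end2end.py and the model path are assumptions.
python3 end2end.py --model /path/to/BAGEL-7B-MoT --modalities text   # text-only output
python3 end2end.py --model /path/to/BAGEL-7B-MoT --modalities image  # image output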
Thanks for your review!
I have solved this problem through another PR: #987

Purpose
Update run_single_prompt.sh to use text2text modality and quote shell variable to prevent word splitting in bagel example
The prompt was split into multiple requests because it was not enclosed in quotes.
This prompt is used to demonstrate the Text2Text mode, while the default mode is Text2Image.
For users who have only one card, or who don't know how to configure the .yaml file, this change helps them run run_single_prompt.sh, because the text2text task is the easiest and only needs the Thinker model (stage 0); see the sketch below.
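For illustration, a hedged sketch of the quoted invocation inside run_single_prompt.sh; the script name, variable, and flags here are assumptions, not the exact diff:

# Illustrative sketch only; single_prompt.py and its flags are assumptions.
PROMPT="What is the capital of France?"

# Quoting "$PROMPT" passes the whole sentence as one argument (one request);
# --modalities text selects the Text2Text path, which only needs the
# Thinker model (stage 0).
python3 single_prompt.py --prompt "$PROMPT" --modalities text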
Problem
The prompt was split into multiple requests because it was not enclosed in quotes.
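The shell behavior behind this can be reproduced without the model at all; a minimal bash demo of word splitting (no assumptions beyond standard POSIX expansion rules):

# Unquoted expansion is split on $IFS (whitespace by default).
PROMPT="What is the capital of France?"
set -- $PROMPT      # unquoted: six separate arguments
echo $#             # prints 6 -> six separate requests downstream
set -- "$PROMPT"    # quoted: one argument
echo $#             # prints 1 -> one request, as intended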
[OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=0_0c488baf-f10f-446d-9cd3-ec93d88c5c39, prompt='<|im_start|>user\n<|im_start|>user\\n**What**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 151644, 872, 1699, 3838, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text="Hello! How can I assist you today? What would you like to do?\n\nPlease let me know what your question or task is, and I'll do my best to help. Here are some ideas to get us started:\n\n1. **Ask a Question**: You can ask me anything - from general knowledge to specific queries about a topic.\n2. **Get Help with a Task**: If you have a project or task that needs assistance (e.g., writing, editing, explaining a concept), feel free to share it!\n3. **Play Games**: We can play simple text-based games like Hangman, 20 Questions, Word Chain, or even create a story together.\n4. **Learn Something New**: Would you like to learn about a new topic, such as history, science, technology, or more?\n5. **Creative Project**: Want to write a short story, poem, or generate ideas for a project?\n6. **Something Else**: Please specify if none of the above options suit your needs.\n\nLet's find out how I can help you!", token_ids=[9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 3555, 1035, 498, 1075, 311, 653, 1939, 5501, 1077, 752, 1414, 1128, 697, 3405, 476, 3383, 374, 11, 323, 358, 3278, 653, 847, 1850, 311, 1492, 13, 5692, 525, 1045, 6708, 311, 633, 601, 3855, 1447, 16, 13, 3070, 26172, 264, 15846, 95518, 1446, 646, 2548, 752, 4113, 481, 504, 4586, 6540, 311, 3151, 19556, 911, 264, 8544, 624, 17, 13, 3070, 1949, 11479, 448, 264, 5430, 95518, 1416, 498, 614, 264, 2390, 476, 3383, 429, 3880, 12994, 320, 68, 1302, 2572, 4378, 11, 15664, 11, 25021, 264, 7286, 701, 2666, 1910, 311, 4332, 432, 4894, 18, 13, 3070, 9137, 11610, 95518, 1205, 646, 1486, 4285, 1467, 5980, 3868, 1075, 40775, 1515, 11, 220, 17, 15, 23382, 11, 9322, 28525, 11, 476, 1496, 1855, 264, 3364, 3786, 624, 19, 13, 3070, 23824, 24656, 1532, 95518, 18885, 498, 1075, 311, 3960, 911, 264, 501, 8544, 11, 1741, 438, 3840, 11, 8038, 11, 5440, 11, 476, 803, 5267, 20, 13, 3070, 62946, 5787, 95518, 23252, 311, 3270, 264, 2805, 3364, 11, 32794, 11, 476, 6923, 6708, 369, 264, 2390, 5267, 21, 13, 3070, 23087, 18804, 95518, 5209, 13837, 421, 6857, 315, 279, 3403, 2606, 7781, 697, 3880, 382, 10061, 594, 1477, 700, 1246, 358, 646, 1492, 498, 0, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=1_9d53bbce-e87d-4b40-b016-854274096901, prompt='<|im_start|>user\n**is**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 285, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='I am an AI language model, designed to assist and provide information. 
How can I help you today?', token_ids=[40, 1079, 458, 15235, 4128, 1614, 11, 6188, 311, 7789, 323, 3410, 1995, 13, 2585, 646, 358, 1492, 498, 3351, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=2_6b44b6e9-a8e8-4ea3-b0a2-fc0e74723c13, prompt='<|im_start|>user\n**the**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 1782, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='Hello! How can I assist you today?', token_ids=[9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=3_2fb0a712-77cd-47d6-9887-ba23bf153c2e, prompt='<|im_start|>user\n**capital**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 65063, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='What is the capital of the United States?', token_ids=[3838, 374, 279, 6722, 315, 279, 3639, 4180, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=4_c1cfb2e4-4016-47eb-8bc2-81d78c046aca, prompt='<|im_start|>user\n**of**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 1055, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='of', token_ids=[1055, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}), OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=5_3311e1fe-5fd7-4e6a-b430-22947700496b, prompt='<|im_start|>user\n**France?**<|im_end|>\\n<|im_start|>assistant\\n<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 49000, 30, 151645, 1699, 151644, 77091, 1699, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='Yes, France. 
What can I help you with?', token_ids=[9454, 11, 9625, 13, 3555, 646, 358, 1492, 498, 448, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Test Plan
Update the code and run it again:
cd examples/offline_inference/bagel
bash run_single_prompt.sh
Test Result
Now the prompt is handled as a single request:
[OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=0_30428837-5f61-4322-8074-029872c02736, prompt='<|im_start|>user\n<|im_start|>user\\n**What is the capital of France?**<|im_end|>\\n<|im_start|>assistant\\n<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 151644, 872, 1699, 3838, 374, 279, 6722, 315, 9625, 30, 151645, 1699, 151644, 77091, 1699, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text=**'The capital of France is Paris.'**, token_ids=[785, 6722, 315, 9625, 374, 12095, 13, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Essential Elements of an Effective PR Description Checklist
The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.