
[Bugfix] Update run_single_prompt.sh for offline_inference/bagel #970

Closed

nussejzz wants to merge 4 commits into vllm-project:main from nussejzz:patch-2

Conversation

@nussejzz
Contributor

Purpose

Update run_single_prompt.sh to use the text2text modality and quote the shell variable to prevent word splitting in the bagel example.

  1. The prompt was split into multiple requests because it was not enclosed in quotes (see the sketch after this list).

  2. This prompt demonstrates the Text2Text mode, while the default mode is Text2Image.

  3. For users who have only one card, or who don't know how to configure the .yaml file, this change makes run_single_prompt.sh easier to run, because the text2text task is the simplest and needs only the Thinker model (stage 0).
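
A sketch of the shape of the change (the script name, variable, and flag names below are illustrative placeholders, not the actual contents of run_single_prompt.sh):

```bash
# Before: default Text2Image task with an unquoted prompt variable;
# the shell word-splits the prompt into separate requests (illustrative).
python single_prompt.py --prompt $PROMPT

# After: explicit text2text task with a quoted prompt variable; the prompt
# is passed as one argument, and only the Thinker (stage 0) is loaded,
# so a single card suffices.
python single_prompt.py --prompt "$PROMPT" --modality text2text
```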

Problem

The prompt was split into multiple requests because it was not enclosed in quotes.
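
The splitting itself is ordinary shell word splitting and can be reproduced outside the script (the variable name here is hypothetical; this assumes no files match the ** glob):

```bash
prompt="**What is the capital of France?**"
printf '%s\n' $prompt     # unquoted: expands to six separate words
printf '%s\n' "$prompt"   # quoted: stays one argument
```

In the failing run below, each of those words arrives as its own request: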

[OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=0_0c488baf-f10f-446d-9cd3-ec93d88c5c39, prompt='<|im_start|>user\n<|im_start|>user\\n**What**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 151644, 872, 1699, 3838, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text="Hello! How can I assist you today? What would you like to do?\n\nPlease let me know what your question or task is, and I'll do my best to help. Here are some ideas to get us started:\n\n1. **Ask a Question**: You can ask me anything - from general knowledge to specific queries about a topic.\n2. **Get Help with a Task**: If you have a project or task that needs assistance (e.g., writing, editing, explaining a concept), feel free to share it!\n3. **Play Games**: We can play simple text-based games like Hangman, 20 Questions, Word Chain, or even create a story together.\n4. **Learn Something New**: Would you like to learn about a new topic, such as history, science, technology, or more?\n5. **Creative Project**: Want to write a short story, poem, or generate ideas for a project?\n6. **Something Else**: Please specify if none of the above options suit your needs.\n\nLet's find out how I can help you!", token_ids=[9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 3555, 1035, 498, 1075, 311, 653, 1939, 5501, 1077, 752, 1414, 1128, 697, 3405, 476, 3383, 374, 11, 323, 358, 3278, 653, 847, 1850, 311, 1492, 13, 5692, 525, 1045, 6708, 311, 633, 601, 3855, 1447, 16, 13, 3070, 26172, 264, 15846, 95518, 1446, 646, 2548, 752, 4113, 481, 504, 4586, 6540, 311, 3151, 19556, 911, 264, 8544, 624, 17, 13, 3070, 1949, 11479, 448, 264, 5430, 95518, 1416, 498, 614, 264, 2390, 476, 3383, 429, 3880, 12994, 320, 68, 1302, 2572, 4378, 11, 15664, 11, 25021, 264, 7286, 701, 2666, 1910, 311, 4332, 432, 4894, 18, 13, 3070, 9137, 11610, 95518, 1205, 646, 1486, 4285, 1467, 5980, 3868, 1075, 40775, 1515, 11, 220, 17, 15, 23382, 11, 9322, 28525, 11, 476, 1496, 1855, 264, 3364, 3786, 624, 19, 13, 3070, 23824, 24656, 1532, 95518, 18885, 498, 1075, 311, 3960, 911, 264, 501, 8544, 11, 1741, 438, 3840, 11, 8038, 11, 5440, 11, 476, 803, 5267, 20, 13, 3070, 62946, 5787, 95518, 23252, 311, 3270, 264, 2805, 3364, 11, 32794, 11, 476, 6923, 6708, 369, 264, 2390, 5267, 21, 13, 3070, 23087, 18804, 95518, 5209, 13837, 421, 6857, 315, 279, 3403, 2606, 7781, 697, 3880, 382, 10061, 594, 1477, 700, 1246, 358, 646, 1492, 498, 0, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}),
OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=1_9d53bbce-e87d-4b40-b016-854274096901, prompt='<|im_start|>user\n**is**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 285, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='I am an AI language model, designed to assist and provide information. How can I help you today?', token_ids=[40, 1079, 458, 15235, 4128, 1614, 11, 6188, 311, 7789, 323, 3410, 1995, 13, 2585, 646, 358, 1492, 498, 3351, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}),
OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=2_6b44b6e9-a8e8-4ea3-b0a2-fc0e74723c13, prompt='<|im_start|>user\n**the**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 1782, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='Hello! How can I assist you today?', token_ids=[9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}),
OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=3_2fb0a712-77cd-47d6-9887-ba23bf153c2e, prompt='<|im_start|>user\n**capital**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 65063, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='What is the capital of the United States?', token_ids=[3838, 374, 279, 6722, 315, 279, 3639, 4180, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}),
OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=4_c1cfb2e4-4016-47eb-8bc2-81d78c046aca, prompt='<|im_start|>user\n**of**<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 1055, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='of', token_ids=[1055, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={}),
OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=5_3311e1fe-5fd7-4e6a-b430-22947700496b, prompt='<|im_start|>user\n**France?**<|im_end|>\\n<|im_start|>assistant\\n<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 49000, 30, 151645, 1699, 151644, 77091, 1699, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='Yes, France. What can I help you with?', token_ids=[9454, 11, 9625, 13, 3555, 646, 358, 1492, 498, 448, 30, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]

Test Plan

Apply the change and run the script again:
cd examples/offline_inference/bagel
bash run_single_prompt.sh

Test Result

The prompt is now handled as a single request and answered correctly:

[OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='text', request_output=[RequestOutput(request_id=0_30428837-5f61-4322-8074-029872c02736, prompt='<|im_start|>user\n<|im_start|>user\\n**What is the capital of France?**<|im_end|>\\n<|im_start|>assistant\\n<|im_end|>\n<|im_start|>assistant\n', prompt_token_ids=[151644, 872, 198, 151644, 872, 1699, 3838, 374, 279, 6722, 315, 9625, 30, 151645, 1699, 151644, 77091, 1699, 151645, 198, 151644, 77091, 198], encoder_prompt=None, encoder_prompt_token_ids=None, prompt_logprobs=None, outputs=[CompletionOutput(index=0, text='The capital of France is Paris.', token_ids=[785, 6722, 315, 9625, 374, 12095, 13, 151645], routed_experts=None, cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)], finished=True, metrics=None, lora_request=None, num_cached_tokens=0, multi_modal_placeholders={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]



@hsliuustc0106
Collaborator

@princepride PTAL

Collaborator


Considering the wide variety of tasks Bagel supports, the numerous deployment options, and the future need to support the thinking mode, I feel it's inappropriate to ship a bash script that only supports a single task. I suggest that all usage details be covered in the readme. @nussejzz What do you think?

Contributor Author


Good idea! In fact, I've written the Python client commands for all the tasks in the offline readme, as shown in the image. But I'd love to try adding a few more bash scripts.
[screenshot: wechat_longscreenshot_2026-01-27_110715_047]


Collaborator


I think we can remove the single prompt part first, because we don't support multi-prompt for DiT now. By the way, have you added the usage of mooncake?

Contributor Author


  1. OK, I will add the corresponding .sh script under each task and remove the single prompt part.
  2. I only have one node, so I haven't actually tested the usage of mooncake yet. I plan to check the version you wrote in your previous PR.

Collaborator


One device is enough to test mooncake; you can deploy mooncake and Omni on the same device.

Collaborator

@Gaohan123 left a comment


We already support --modalities to make a model output a certain modality for different requests. Please refer to the Qwen3-Omni examples when modifying: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni#modality-control
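
A hedged sketch of what that might look like here, following the Qwen3-Omni modality-control example linked above (the script name and flags are placeholders; see the linked examples for the real invocation):

```bash
# Select the output modality per request via --modalities instead of
# hard-coding a single task in the shell script (illustrative only).
python end2end.py --modalities text --prompt "What is the capital of France?"
```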

@nussejzz
Contributor Author

Thanks for your review!
I have solved this problem in another PR, #987, so I will close this one.
After I add the cfg parallel feature, I will add a comprehensive request script! @Gaohan123


@nussejzz nussejzz closed this Jan 31, 2026
@nussejzz nussejzz deleted the patch-2 branch January 31, 2026 05:44
