[BugFix] Add bagel text2text/img2text think mode support#2503
Conversation
Signed-off-by: princepride <wangzhipeng628@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2dbf218631
hsliuustc0106
left a comment
Review Summary
Two blocking issues found.
What I validated
- All CI gates pass (DCO, pre-commit, build, docs) ✅
- Test evidence shows think mode working for text2text ✅
- `VLM_THINK_SYSTEM_PROMPT` is a clean analog of `GEN_THINK_SYSTEM_PROMPT` ✅
- Output handling refactor is a reasonable cleanup ✅
Blocking Issues
1. Prompt format drops ChatML roles, breaking non-think text2text
The original text2text format uses ChatML roles: `<|im_start|>user\n{p}<|im_end|>\n<|im_start|>assistant\n`. The new code produces `<|im_start|>{p}<|im_end|><|im_start|>` — missing `user\n`, `assistant\n`, and the `\n` separators. This changes the prompt even when `args.think` is `False` (regression).
For the think case, the system prompt also lacks a `system\n` role. Compare with img2text, which uses `<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n` — text2text should follow the same ChatML convention:
```python
think_prefix = f"<|im_start|>system\n{VLM_THINK_SYSTEM_PROMPT}<|im_end|>\n" if args.think else ""
final_prompt_text = f"{think_prefix}<|im_start|>user\n{p}<|im_end|>\n<|im_start|>assistant\n"
```
2. Title claims img2text support but img2text branch is unchanged
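As a quick sanity check, the corrected format can be exercised in isolation. This is a minimal sketch, not code from the PR; `VLM_THINK_SYSTEM_PROMPT` is stubbed with placeholder text here:

```python
# Minimal sketch of the corrected text2text prompt construction.
# VLM_THINK_SYSTEM_PROMPT is a placeholder for the real constant in the PR.
VLM_THINK_SYSTEM_PROMPT = "Think step by step before answering."

def build_prompt(p: str, think: bool) -> str:
    # Prepend a ChatML system block only when think mode is on.
    think_prefix = (
        f"<|im_start|>system\n{VLM_THINK_SYSTEM_PROMPT}<|im_end|>\n" if think else ""
    )
    # Keep the user/assistant role markers and \n separators in both modes.
    return f"{think_prefix}<|im_start|>user\n{p}<|im_end|>\n<|im_start|>assistant\n"

prompt = build_prompt("What is 2+2?", think=False)
assert "<|im_start|>user\n" in prompt          # user role marker preserved
assert prompt.endswith("<|im_start|>assistant\n")  # assistant role marker preserved
assert "<|im_start|>system" not in prompt      # no system block when think is off
```

With `think=True`, the same builder emits the system block first, so the non-think path stays byte-identical to the original format.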
The PR title says "text2text/img2text think mode support" and the description mentions both, but the img2text branch in end2end.py is not modified. Either update the title/scope to match, or add think mode for img2text.
Non-blocking
- The output handling refactor (lines 224-228) changes think-mode output printing for all modalities, including text2img. The test evidence only covers text2text — worth a quick sanity check that text2img think mode still prints correctly.
I've added img2text think mode.
Signed-off-by: princepride <wangzhipeng628@gmail.com>
hsliuustc0106
left a comment
text2text path missing system/user/assistant role markers in prompt format
@hsliuustc0106 Can you help approve it?
…t#2503) Signed-off-by: princepride <wangzhipeng628@gmail.com>
…t#2503) Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: bob-021206 <binyan_github@163.com>
Purpose
The previous code lacked a system prompt for the thinking mode of the text2text and img2text tasks, so the model could not generate its thinking process.
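For context, think mode typically instructs the model (via a system prompt) to emit its reasoning before the final answer, which the runner then separates when printing. A minimal sketch of that separation, assuming `<think>...</think>` delimiters (the tag names are an illustrative assumption, not taken from this PR):

```python
def split_think_output(text: str) -> tuple[str, str]:
    # Split generated text into (reasoning, answer).
    # The <think>...</think> delimiters are an assumed convention.
    if "</think>" in text:
        reasoning, answer = text.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    # Without a think block, everything is the answer.
    return "", text.strip()

reasoning, answer = split_think_output("<think>2+2 is 4</think>The answer is 4.")
# reasoning -> "2+2 is 4", answer -> "The answer is 4."
```

Without the system prompt, the model never produces the reasoning segment, which is the gap this PR closes.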
Test Plan
Test Result
**before:**
**after:**
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.