fix: Properly handle multiple text components from request#5196
Conversation
WalkthroughModified user prompt extraction logic in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @examples/multimodal/components/processor.py:
- Around line 207-230: The current extraction appends empty strings from
message.content into user_texts so user_text may become empty or only newlines;
update the collection in the loop over raw_request.messages/message.content so
you only append item.text if item.type == "text" and item.text.strip() is
non-empty (i.e., skip whitespace-only strings), ensure you only append joined
text_parts to user_texts when text_parts contains at least one non-empty part,
and after building user_text (from "\n".join(user_texts)) validate that
user_text.strip() is non-empty and raise the existing ValueError if not;
reference symbols: raw_request.messages, message.content, item.type, item.text,
user_texts, user_text.
🧹 Nitpick comments (1)
examples/multimodal/components/processor.py (1)
227-227: Optional: Consider extracting error message to a constant.The static analysis tool suggests avoiding long messages directly in exception raises. While this is a minor style issue, you could optionally extract it to improve maintainability:
🔎 Proposed refactor
Near the top of the class or module:
_ERR_NO_USER_TEXT = "No text content found in user messages"Then at line 227:
- raise ValueError("No text content found in user messages") + raise ValueError(_ERR_NO_USER_TEXT)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/multimodal/components/processor.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
📚 Learning: 2026-01-03T01:29:46.547Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-214
Timestamp: 2026-01-03T01:29:46.547Z
Learning: In examples/multimodal/components/processor.py, the processor currently supports only a single text field per request. Do not add support for multiple text fields; this is out of scope for the current implementation. When reviewing or modifying this file, ensure changes do not introduce multi-text-field handling and, if relevant, update tests and documentation to reflect the single-text-field limitation.
Applied to files:
examples/multimodal/components/processor.py
📚 Learning: 2026-01-03T01:27:08.763Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-215
Timestamp: 2026-01-03T01:27:08.763Z
Learning: In examples/multimodal/components/processor.py, ensure the Dynamo multimodal processor supports flexible content ordering in multimodal requests (audio, image, or video content before text), reflecting the OpenAI multimodal API specification where content items can appear in any order. Do not enforce a text-first ordering as mandatory. Implement and verify logic that checks for required content types regardless of position, and update tests to cover all permutations of content ordering.
Applied to files:
examples/multimodal/components/processor.py
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
examples/multimodal/components/processor.py
🧬 Code graph analysis (1)
examples/multimodal/components/processor.py (2)
lib/llm/src/preprocessor/prompt.rs (1)
messages(54-54)lib/llm/src/tokenizers.rs (1)
text(348-353)
🪛 Ruff (0.14.10)
examples/multimodal/components/processor.py
227-227: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
|
Can you include some test cases to cover this aspect? |
Added test in the dynamo part. |
…#5196) Signed-off-by: Shang Wang <shangw@nvidia.com>
Overview:
Properly concatenate all text components from a multi-modal request to build the user prompt.
Details:
This is a follow up to #5143. If there are multiple text components present in a single OpenAI chat request, we concatenate them to build the text prompt. If the request is a multi-turn conversation, the code add a newline between each turn.
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.