Skip to content

fix: Properly handle multiple text components from request#5196

Merged
jh-nv merged 8 commits intomainfrom
jihao/audio_prompt_DIS-1224
Jan 14, 2026
Merged

fix: Properly handle multiple text components from request#5196
jh-nv merged 8 commits intomainfrom
jihao/audio_prompt_DIS-1224

Conversation

@jh-nv
Copy link
Contributor

@jh-nv jh-nv commented Jan 5, 2026

Overview:

Properly concatenate all text components from a multi-modal request to build the user prompt.

Details:

This is a follow up to #5143. If there are multiple text components present in a single OpenAI chat request, we concatenate them to build the text prompt. If the request is a multi-turn conversation, the code add a newline between each turn.

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes
    • Improved multi-modal message processing to correctly handle multi-turn conversations with multiple text segments, ensuring all user input is properly captured and concatenated rather than processing only individual text items.

✏️ Tip: You can customize this high-level summary in your review settings.

@jh-nv jh-nv requested review from a team as code owners January 5, 2026 23:47
@github-actions github-actions bot added the fix label Jan 5, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 5, 2026

Walkthrough

Modified user prompt extraction logic in processor.py to collect text segments from all user-role messages, concatenate text within each message, and join user turns with newlines. Updated error condition to trigger when no user messages contain text.

Changes

Cohort / File(s)
Multi-turn prompt extraction refactor
examples/multimodal/components/processor.py
Reworked user prompt extraction to handle multiple user-role messages by collecting and concatenating all text segments, joining user turns with newlines, and updating error handling for cases where no user messages contain text.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Through multi-turn conversations we hop,
Gathering text from each message stop,
Stitching together with newlines so neat,
User voices joined in harmonious beat! 🌟

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: handling multiple text components from requests, which aligns with the code rework to concatenate all text segments from multi-modal messages.
Description check ✅ Passed The description includes Overview, Details, and Related Issues sections as per the template. However, the 'Where should the reviewer start?' section is empty, which is a minor gap but not critical.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @examples/multimodal/components/processor.py:
- Around line 207-230: The current extraction appends empty strings from
message.content into user_texts so user_text may become empty or only newlines;
update the collection in the loop over raw_request.messages/message.content so
you only append item.text if item.type == "text" and item.text.strip() is
non-empty (i.e., skip whitespace-only strings), ensure you only append joined
text_parts to user_texts when text_parts contains at least one non-empty part,
and after building user_text (from "\n".join(user_texts)) validate that
user_text.strip() is non-empty and raise the existing ValueError if not;
reference symbols: raw_request.messages, message.content, item.type, item.text,
user_texts, user_text.
🧹 Nitpick comments (1)
examples/multimodal/components/processor.py (1)

227-227: Optional: Consider extracting error message to a constant.

The static analysis tool suggests avoiding long messages directly in exception raises. While this is a minor style issue, you could optionally extract it to improve maintainability:

🔎 Proposed refactor

Near the top of the class or module:

_ERR_NO_USER_TEXT = "No text content found in user messages"

Then at line 227:

-        raise ValueError("No text content found in user messages")
+        raise ValueError(_ERR_NO_USER_TEXT)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e791871 and 64f8182.

📒 Files selected for processing (1)
  • examples/multimodal/components/processor.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
📚 Learning: 2026-01-03T01:29:46.547Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-214
Timestamp: 2026-01-03T01:29:46.547Z
Learning: In examples/multimodal/components/processor.py, the processor currently supports only a single text field per request. Do not add support for multiple text fields; this is out of scope for the current implementation. When reviewing or modifying this file, ensure changes do not introduce multi-text-field handling and, if relevant, update tests and documentation to reflect the single-text-field limitation.

Applied to files:

  • examples/multimodal/components/processor.py
📚 Learning: 2026-01-03T01:27:08.763Z
Learnt from: tanmayv25
Repo: ai-dynamo/dynamo PR: 5143
File: examples/multimodal/components/processor.py:207-215
Timestamp: 2026-01-03T01:27:08.763Z
Learning: In examples/multimodal/components/processor.py, ensure the Dynamo multimodal processor supports flexible content ordering in multimodal requests (audio, image, or video content before text), reflecting the OpenAI multimodal API specification where content items can appear in any order. Do not enforce a text-first ordering as mandatory. Implement and verify logic that checks for required content types regardless of position, and update tests to cover all permutations of content ordering.

Applied to files:

  • examples/multimodal/components/processor.py
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • examples/multimodal/components/processor.py
🧬 Code graph analysis (1)
examples/multimodal/components/processor.py (2)
lib/llm/src/preprocessor/prompt.rs (1)
  • messages (54-54)
lib/llm/src/tokenizers.rs (1)
  • text (348-353)
🪛 Ruff (0.14.10)
examples/multimodal/components/processor.py

227-227: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo

@tanmayv25
Copy link
Contributor

tanmayv25 commented Jan 6, 2026

Can you include some test cases to cover this aspect?

@jh-nv
Copy link
Contributor Author

jh-nv commented Jan 10, 2026

Can you include some test cases to cover this aspect?

Added test in the dynamo part.

@jh-nv jh-nv requested a review from ayushag-nv January 10, 2026 04:24
@jh-nv jh-nv merged commit e22bb03 into main Jan 14, 2026
30 of 32 checks passed
@jh-nv jh-nv deleted the jihao/audio_prompt_DIS-1224 branch January 14, 2026 16:13
wangshangsam pushed a commit to CentML/dynamo that referenced this pull request Jan 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants