[Engine] Refactor output processing for multimodal capabilities in vLLM-omni #20
Conversation
- Introduced OmniRequestState to manage multimodal request states.
- Enhanced MultimodalOutputProcessor to handle various output types including images, text, and latents.
- Implemented methods for accumulating multimodal tensors and processing outputs.
- Updated output handling to ensure compatibility with vLLM's base processor while allowing custom modality handlers.
Pull Request Overview
This PR refactors the multimodal output processing system in vLLM-omni by replacing the original standalone processor with a vLLM-compatible architecture that extends vLLM's base OutputProcessor. The changes introduce better state management for multimodal requests and normalize different output types before delegating to vLLM's processing pipeline.
Key changes:
- Introduced `OmniRequestState` to track multimodal tensor accumulation across the request lifetime
- Replaced the original `MultimodalOutputProcessor` with a vLLM-compatible version that extends `VLLMOutputProcessor`
- Implemented modality-specific routing and normalization methods to handle images, text, latents, and audio outputs
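The accumulation role of `OmniRequestState` can be sketched roughly as follows. This is a minimal, hypothetical stand-in: the real class concatenates torch tensors with `torch.cat`, but plain lists keep the sketch self-contained, and every name except `OmniRequestState` is illustrative rather than taken from the PR.

```python
from typing import Any, List, Optional


class OmniRequestState:
    """Sketch: per-request state that accumulates multimodal chunks
    (e.g. image or latent tensors) across a request's lifetime."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.mm_type: Optional[str] = None
        self.mm_accumulated: Optional[List[Any]] = None

    def add_multimodal_chunk(self, chunk: List[Any],
                             mm_type: Optional[str] = None) -> None:
        if mm_type:
            self.mm_type = mm_type.lower()
        if self.mm_accumulated is None:
            self.mm_accumulated = list(chunk)
        else:
            # Stands in for torch.cat([self.mm_accumulated, chunk], dim=0)
            self.mm_accumulated.extend(chunk)


state = OmniRequestState("req-0")
state.add_multimodal_chunk([1, 2], mm_type="IMAGE")
state.add_multimodal_chunk([3])
```

The key point is that chunks arriving across multiple engine steps are merged into one growing buffer keyed by the detected modality type.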
```python
            return
        try:
            if mm_type:
                self.mm_type = (mm_type or "").lower()
```
Redundant check: if `mm_type` is truthy, the `or ""` fallback in the `.lower()` call is unnecessary, since `mm_type` is already confirmed to be non-empty.
Suggested change:
```diff
-                self.mm_type = (mm_type or "").lower()
+                self.mm_type = mm_type.lower()
```
```python
        except Exception:
            pass
        if self.mm_accumulated is None:
            self.mm_accumulated = t
        else:
```

```python
            return "image"

        # Check for latent-related attributes
        if hasattr(output, 'latents') or hasattr(output, 'latent_representation'):
            return "latents"

        # Check for pooling output
        if hasattr(output, 'pooler_output') and output.pooler_output is not None:
            return "pooling"

        # Default to text
        return "text"

    def _process_text_output(self, output: Any) -> List[RequestOutput]:
        """Process text output."""
        if isinstance(output, RequestOutput):
            return [output]

        # Create a mock RequestOutput for text
        completion_output = CompletionOutput(
            index=0,
            text=getattr(output, 'text', ''),
            token_ids=getattr(output, 'token_ids', []),
            cumulative_logprob=getattr(output, 'cumulative_logprob', 0.0),
            logprobs=getattr(output, 'logprobs', None),
            finish_reason=getattr(output, 'finish_reason', 'length')
        )

        return [self._build_request_output(output, [completion_output])]
```

```python
            self.mm_accumulated = torch.cat([self.mm_accumulated, t], dim=0)
        except Exception:
            pass
```
Bare exception handler silently swallows all errors. Consider logging the exception or handling specific exception types to aid debugging.
Suggested change:
```diff
-        except Exception:
-            pass
+        except Exception as e:
+            logger.exception("Failed to move tensor to CPU in add_multimodal_tensor.")
         if self.mm_accumulated is None:
             self.mm_accumulated = t
         else:
             self.mm_accumulated = torch.cat([self.mm_accumulated, t], dim=0)
-        except Exception:
-            pass
+        except Exception as e:
+            logger.exception("Exception occurred in add_multimodal_tensor.")
```
```python
        except Exception:
            pass
```
Bare exception handler silently swallows all errors. Consider logging the exception or handling specific exception types to aid debugging.
Suggested change:
```diff
-        except Exception:
-            pass
+        except Exception as e:
+            logger.exception("Error in add_multimodal_tensor")
```
```python
        if num_cached_tokens is not None:
            # Keep num_cached_tokens in RequestOutput for compatibility
            try:
                self.num_cached_tokens = num_cached_tokens  # type: ignore[attr-defined]
```
The `type: ignore[attr-defined]` comment suggests this attribute is being set dynamically on a class that doesn't define it. Consider documenting why the dynamic attribute is necessary, or adding it to the class definition.
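One way to address this, sketched with a hypothetical class and field layout (not the PR's actual class), is to declare the attribute in the class body so static type checkers see it and the `type: ignore` becomes unnecessary:

```python
from typing import Optional


class OmniRequestOutput:
    """Illustrative sketch: declaring num_cached_tokens on the class
    (instead of setting it dynamically with setattr) lets mypy verify
    the assignment, removing the need for type: ignore[attr-defined]."""

    num_cached_tokens: Optional[int]

    def __init__(self, num_cached_tokens: Optional[int] = None) -> None:
        self.num_cached_tokens = num_cached_tokens


out = OmniRequestOutput(num_cached_tokens=7)
```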
```python
            setattr(base_output, "multimodal_output", {})
            setattr(base_output, "multimodal_output", {self.mm_type: tensor})
        except Exception as e:
            logger.warning("Error in _new_completion_output", e)
```
The `logger.warning()` call is missing a `%s` placeholder for the exception argument. It should be `logger.warning("Error in _new_completion_output: %s", e)` to properly format the exception message.
Suggested change:
```diff
-            logger.warning("Error in _new_completion_output", e)
+            logger.warning("Error in _new_completion_output: %s", e)
```
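For illustration, the reason the placeholder matters is that `logging` applies `%`-formatting lazily, using the extra positional arguments; a bare extra argument with no placeholder makes `msg % args` fail inside the logging machinery and the message is lost. A small self-contained sketch (logger name and message text are illustrative, and the buffer handler exists only to capture the output):

```python
import io
import logging

# Capture log output in a buffer so the result can be inspected.
logger = logging.getLogger("omni.output")
logger.setLevel(logging.WARNING)
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

try:
    raise ValueError("bad tensor shape")
except Exception as e:
    # Correct form: %s defers formatting to the logging framework.
    # logger.warning("Error in _new_completion_output", e) would instead
    # trigger a logging-internal formatting error and drop the message.
    logger.warning("Error in _new_completion_output: %s", e)

message = buf.getvalue().strip()
```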
```python
        assert req_state.detokenizer is not None
        assert req_state.logprobs_processor is not None
        stop_string = req_state.detokenizer.update(
            new_token_ids, finish_reason == FinishReason.STOP)
        if stop_string:
            finish_reason = FinishReason.STOP
            stop_reason = stop_string
        req_state.logprobs_processor.update_from_output(eco)
```
These assertions will fail for pooling-only requests where detokenizer and logprobs_processor are set to None (lines 60-61). The assertions should be conditional or removed to support non-text output modes.
Suggested change:
```diff
-        assert req_state.detokenizer is not None
-        assert req_state.logprobs_processor is not None
-        stop_string = req_state.detokenizer.update(
-            new_token_ids, finish_reason == FinishReason.STOP)
-        if stop_string:
-            finish_reason = FinishReason.STOP
-            stop_reason = stop_string
-        req_state.logprobs_processor.update_from_output(eco)
+        if getattr(req_state, "output_kind", None) == RequestOutputKind.TEXT:
+            assert req_state.detokenizer is not None
+            assert req_state.logprobs_processor is not None
+            stop_string = req_state.detokenizer.update(
+                new_token_ids, finish_reason == FinishReason.STOP)
+            if stop_string:
+                finish_reason = FinishReason.STOP
+                stop_reason = stop_string
+            req_state.logprobs_processor.update_from_output(eco)
```
```python
        except Exception:
            pass
```
Bare exception handler silently swallows all errors. Consider logging the exception or handling specific exception types to aid debugging.
Suggested change:
```diff
-        except Exception:
-            pass
+        except Exception as e:
+            logger.warning("Error accumulating multimodal tensor: %s", e)
```
```python
                setattr(ro, "multimodal_output", {})
            ro.multimodal_output[mm_key] = req_state.mm_accumulated
        except Exception as e:
            logger.warning("Error in process_outputs", e)
```
The `logger.warning()` call is missing a `%s` placeholder for the exception argument. It should be `logger.warning("Error in process_outputs: %s", e)` to properly format the exception message.
Suggested change:
```diff
-            logger.warning("Error in process_outputs", e)
+            logger.warning("Error in process_outputs: %s", e)
```
```python
        except Exception:
            pass
```
Bare exception handler silently swallows all errors. Consider logging the exception or handling specific exception types to aid debugging.
Suggested change:
```diff
-        except Exception:
-            pass
+        except Exception as e:
+            logger.exception(f"Exception in output handler for type '{output_type}': {e}")
```
```python
        except Exception:
            pass
```
Bare exception handler silently swallows all errors. Consider logging the exception or handling specific exception types to aid debugging.
Suggested change:
```diff
-        except Exception:
-            pass
+        except Exception as e:
+            logger.warning(
+                "Failed to convert pooling_output to tensor: %r. Exception: %s",
+                eco.pooling_output, e)
```
hsliuustc0106 left a comment:

Does this file work for different output modalities, including text, image, wav, etc.?

Yes, we have left the interface open for text, image, image+text, audio, hidden states, etc. But the actual implementation and postprocessing need further discussion, depending on the user interface design.
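The per-modality interface mentioned in the reply could, for example, take the shape of a dispatch table mapping the detected output type to its handler. This is a hypothetical sketch only: the handler names mirror the PR's `_process_text_output` naming style, but the bodies are placeholders, not the real implementations.

```python
from typing import Any, Callable, Dict


def _process_text_output(output: Any) -> str:
    return f"text:{output}"          # placeholder body


def _process_image_output(output: Any) -> str:
    return f"image:{output}"         # placeholder body


def _process_audio_output(output: Any) -> str:
    return f"audio:{output}"         # placeholder body


# Registry of modality handlers; new modalities plug in here.
MODALITY_HANDLERS: Dict[str, Callable[[Any], str]] = {
    "text": _process_text_output,
    "image": _process_image_output,
    "audio": _process_audio_output,
}


def process_output(mm_type: str, output: Any) -> str:
    # Unknown modalities fall back to text handling.
    handler = MODALITY_HANDLERS.get(mm_type, _process_text_output)
    return handler(output)
```

A design like this keeps the routing logic in one place while leaving each modality's postprocessing open for the interface discussion mentioned above.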
- Added logging for exceptions during tensor movement to CPU in OmniRequestState and MultimodalOutputProcessor.
- Improved robustness by ensuring the output pipeline continues without crashing on errors.
- Updated comments for clarity on error handling behavior.
lgtm
…t-processor [Engine]Refactor output processing for multimodal capabilities in vLLM-omni
fix: use legacy config loading path instead of StageConfigFactory
Purpose
This PR implements the Phase 2 features of https://github.com/hsliuustc0106/vllm-omni/issues/10: refactor output processing for multimodal capabilities in vLLM-omni.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.