PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.
Imports (lines 86-87): Kept both the StagePool and PDDisaggregationMixin imports.
Orchestrator constructor (lines 1031-1038): Passes pd_config=pd_config (from HEAD's PD disaggregation feature) but drops stage_clients, output_processors, and stage_vllm_configs, since those are now accessed through stage_pools (from zwg's StagePool architecture).
Dataclass fix (lines 109-140): Moved chosen_replica: dict[int, StageReplica] from StreamingInputState to OrchestratorRequestState. It is a per-request field, and StagePool.select_replica() accesses it as req_state.chosen_replica, not req_state.streaming.chosen_replica.
_route_output (lines 464-481): Uses stage_replica (StagePool) while keeping HEAD's streaming session logic (two calls: non-final and final update).
_forward_to_next_stage body (lines 707-712): Uses the StagePool-based next_pool.select_replica() and keeps next_stage_resumable for streaming.
PD disaggregation block (lines 750-796): Preserved the full PD prefill-decode routing logic and converted all self.stage_clients[i] / self.stage_vllm_configs[i] / self.output_processors[i] references to stage_replica.client / next_replica.vllm_config / next_replica.output_processor.
process_engine_inputs call (lines 801-806): Passes both streaming_context and source_client.
build_engine_core_request_from_tokens (lines 826-828): Uses next_replica.vllm_config.model_config (StagePool) while keeping the mm_features and resumable args (HEAD).
_handle_add_request (lines 924-934): Sets req_state.streaming.enabled = is_streaming (HEAD) and uses stage_pools[stage_id] with select_replica() (zwg).
Non-conflict fix (line 820): Changed next_client to next_replica.client. next_client was only defined in HEAD's code path and would raise a NameError after resolution.
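The dataclass fix and the replica-based lookups above can be illustrated with a minimal, self-contained sketch. This is not the project's code: the class bodies are stand-ins, the request_id field and the _next cursor are hypothetical, and the round-robin policy with per-request pinning inside select_replica() is an assumption made only so the example runs. Only the names StagePool, StageReplica, StreamingInputState, OrchestratorRequestState, chosen_replica, client, vllm_config, and output_processor come from the notes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageReplica:
    # Stand-in: bundles the per-replica objects that the notes say
    # replaced the parallel stage_clients / stage_vllm_configs /
    # output_processors lists.
    client: Any
    vllm_config: Any = None
    output_processor: Any = None


@dataclass
class StreamingInputState:
    # Streaming-only state; chosen_replica is intentionally NOT here.
    enabled: bool = False


@dataclass
class OrchestratorRequestState:
    request_id: str  # hypothetical field, for illustration only
    streaming: StreamingInputState = field(default_factory=StreamingInputState)
    # Per-request map: stage id -> replica picked for this request.
    chosen_replica: dict[int, StageReplica] = field(default_factory=dict)


@dataclass
class StagePool:
    stage_id: int
    replicas: list[StageReplica]
    _next: int = 0  # round-robin cursor (assumed policy, not from the PR)

    def select_replica(self, req_state: OrchestratorRequestState) -> StageReplica:
        # Reuse the replica already pinned to this request, if any.
        pinned = req_state.chosen_replica.get(self.stage_id)
        if pinned is not None:
            return pinned
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        # Read and written as req_state.chosen_replica, not
        # req_state.streaming.chosen_replica.
        req_state.chosen_replica[self.stage_id] = replica
        return replica
```

In this sketch, a caller would reach the client as pool.select_replica(req_state).client, which mirrors the conversion from indexed lists to replica attributes described in the notes.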
Resolution: Merged both parameters: streaming_context: Any | None = None, source_client: Any | None = None. The method body already uses both: streaming_context for the custom processor streaming path, and source_client for multi-replica upstream client selection.
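As a rough illustration of the merged signature, the sketch below accepts both keyword parameters with None defaults, so callers from either side of the merge keep working. The function name process_inputs_sketch, the prompt_tokens parameter, and the returned dict are hypothetical; the notes do not show the real method body, so this only demonstrates the shape of the resolution.

```python
from typing import Any, Optional


def process_inputs_sketch(
    prompt_tokens: list[int],
    streaming_context: Optional[Any] = None,  # written `Any | None` in the notes
    source_client: Optional[Any] = None,
) -> dict[str, Any]:
    """Toy stand-in that accepts both merged keyword parameters."""
    result: dict[str, Any] = {"tokens": list(prompt_tokens)}
    if streaming_context is not None:
        # Streaming path: carry the context through for the custom processor.
        result["streaming_context"] = streaming_context
    if source_client is not None:
        # Multi-replica path: record which upstream client supplied the input.
        result["source_client"] = source_client
    return result
```

Because both parameters default to None, a call site that passes only one of them, or neither, remains valid after the merge.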
BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md