[Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring #1908
Merged
141 commits:
389979e add async (yinpeiqi)
b9bc8e3 init runnable async omni (yinpeiqi)
fa1099b temp (yinpeiqi)
2140b90 update async omni (yinpeiqi)
65cafdc refactor init (yinpeiqi)
1078d57 add next stage without input processor (yinpeiqi)
ad1b313 move input processor to engine (yinpeiqi)
378f631 decouple input processor (yinpeiqi)
3fd53a7 refactor output processor (yinpeiqi)
97bc157 remove omni input processor (yinpeiqi)
b25a26f use orchestrator (yinpeiqi)
01319dd update (yinpeiqi)
d76f9b5 add metrics (yinpeiqi)
6093ea5 add download (yinpeiqi)
fc61104 add support for diffusion model (yinpeiqi)
a20f291 add doc (yinpeiqi)
0173b6c update e2e (yinpeiqi)
5445e58 add precommit (yinpeiqi)
6c689f8 fix main (yinpeiqi)
421a46b [draft] add basic support for bagel (yinpeiqi)
f82c940 add async chunk (yinpeiqi)
d8d1072 add qwen3 example (yinpeiqi)
8290c5d update test (yinpeiqi)
d6e66c6 move init to engine (yinpeiqi)
da80711 rename files (yinpeiqi)
5f5d6b1 rename output handler (yinpeiqi)
9f3e156 add doc (yinpeiqi)
63f3b02 cleanup (yinpeiqi)
0b65a8b add test case (yinpeiqi)
1659c7b update doc (yinpeiqi)
6992042 add omni base and omni (yinpeiqi)
e31cc45 use janus queue (yinpeiqi)
67b392b update doc (yinpeiqi)
1fc31b3 update (yinpeiqi)
4cd2d0e update download (yinpeiqi)
f1d0ef2 update import (yinpeiqi)
d39053d update test (yinpeiqi)
8a39b6c update test (yinpeiqi)
0d349d2 update openai api (yinpeiqi)
1cfd827 fix (yinpeiqi)
4d84ba2 update e2e (yinpeiqi)
803cef2 rebase, update shotdown (yinpeiqi)
c5953f9 add pre-commit (yinpeiqi)
0be2032 update serve cli (yinpeiqi)
9476536 add parallel init (yinpeiqi)
e6d69b8 update rebase (yinpeiqi)
923c575 update (yinpeiqi)
328f033 rebase (yinpeiqi)
a68c7ca rebase (yinpeiqi)
20395a3 update (yinpeiqi)
03951b3 update setup (yinpeiqi)
9e89e52 update and fix (yinpeiqi)
6d945b8 refactor (yinpeiqi)
0a9beae rm v1 files (yinpeiqi)
4ae402e update config (yinpeiqi)
aa33e8e update config (yinpeiqi)
697187a remove v0 (yinpeiqi)
4dcc1f1 rm v1 (yinpeiqi)
066eb03 delete input processor (yinpeiqi)
bae0b83 use weak ref (yinpeiqi)
6cbe213 update (yinpeiqi)
6d09583 remove deperated (yinpeiqi)
0d74355 stage cli (#4) (wuhang2014)
3e92219 update (yinpeiqi)
c2ee926 add get supported tasks (yinpeiqi)
349344e fix ci (yinpeiqi)
02a57b3 update get config (yinpeiqi)
3d678fa update doc (yinpeiqi)
6e5ee42 update tts yaml (yinpeiqi)
673a6df fix pre commit (yinpeiqi)
cef860f update config (yinpeiqi)
f515204 fix (yinpeiqi)
5e70d3f fix ci (yinpeiqi)
8d01da3 fix (yinpeiqi)
cfd1d1d resolve config (#7) (wuhang2014)
7b5fb26 fix for qwen3 tts (yinpeiqi)
c00cb79 fix for diffusion (yinpeiqi)
0ced660 fix stage id is none (yinpeiqi)
1616281 fix (yinpeiqi)
463a96c fix (yinpeiqi)
941205f rm logs (yinpeiqi)
7abead8 update (yinpeiqi)
cce6a56 fix (yinpeiqi)
ad20d43 rm request output list example (yinpeiqi)
20f72c2 fix pre commit (yinpeiqi)
8b7d483 change timtout time (yinpeiqi)
42c2efe add factory usage (yinpeiqi)
db05cbb fix (yinpeiqi)
eaa254a update config (yinpeiqi)
7f2d1f5 Merge branch 'main' into refactor (fake0fan)
efb2e85 fix pre commit (yinpeiqi)
8dc3b18 Merge branch 'vllm-project:main' into refactor3 (yinpeiqi)
8ee93ae Merge pull request #14 from yinpeiqi/refactor3 (yinpeiqi)
256cfbc add comfyui (yinpeiqi)
bdb6e30 fix comfyui (yinpeiqi)
ca788e0 Merge pull request #15 from yinpeiqi/refactor3 (yinpeiqi)
e9190c5 update stage (yinpeiqi)
58fbf74 fix time sleep (yinpeiqi)
36099da Fix Qwen3-TTS broken on refactor: add pipeline.yaml and fix async_chu… (linyueqian)
027a68f Fix Base voice clone: use actual codec encoder for exact ref_code_len (linyueqian)
18f632e Merge pull request #1 from fake0fan/refactor (yinpeiqi)
caab74c Merge branch 'refactor3' of https://github.com/yinpeiqi/vllm-omni int… (yinpeiqi)
6caeea1 add docs for current arch (yinpeiqi)
1b06e11 fix description (yinpeiqi)
10657a1 Merge pull request #16 from yinpeiqi/refactor3 (yinpeiqi)
ca88ec0 rm deparated funcs (yinpeiqi)
e03fced rm deparated class (yinpeiqi)
2de3bed Merge branch 'main' into refactor (yinpeiqi)
f5492fe Merge pull request #17 from yinpeiqi/refactor3 (yinpeiqi)
07f8bfa mv worker cls utils (yinpeiqi)
9d7b905 Fix perf config: add is_comprehension to qwen3_tts stage 0 (linyueqian)
41414d4 Support auto-detection for TTS perf benchmark (optional stage_config_… (linyueqian)
52aba8f Merge pull request #2 from fake0fan/refactor (yinpeiqi)
ff94a97 change stage init to stage init utils (yinpeiqi)
df17139 Set gpu_memory_utilization to 0.08 for Qwen3-TTS (1.7B model) (linyueqian)
23ddbca Merge pull request #18 from yinpeiqi/refactor3 (yinpeiqi)
264dead refactor (yinpeiqi)
4d7dc9e add kv transfer inject and cfg expand (princepride)
1b1acf2 rename stage_init.py -> stage_init_utils.py and align comments with r… (princepride)
9d63d40 Merge fake0fan/refactor into fix-bagel-bugs (princepride)
a432d99 Merge pull request #19 from princepride/fix-bagel-bugs (fake0fan)
db33f8d fix some bug (princepride)
cf99223 remove mutli image output (princepride)
2000d6f fix: use legacy config loading path instead of StageConfigFactory (lishunyang12)
45e8381 Merge pull request #20 from lishunyang12/fix/use-legacy-config-path (fake0fan)
c5e22f6 fix: increase gpu_memory_utilization for TTS CI on L4 GPUs (lishunyang12)
9a46667 Merge pull request #22 from princepride/fix-bagel-bugs-2 (fake0fan)
0faad47 Merge pull request #23 from lishunyang12/fix/tts-ci-gpu-memory (fake0fan)
febe9c8 fix pre-commit and glm-image (fake0fan)
4282c09 Merge branch 'main' into refactor (fake0fan)
6e37c1a Merge branch 'refactor3' into refactor (yinpeiqi)
6687859 Merge pull request #3 from fake0fan/refactor (yinpeiqi)
4c32e7a fix precommit, fix error (yinpeiqi)
d50a1b7 add utils for helper function (yinpeiqi)
e079342 Merge pull request #25 from yinpeiqi/refactor3 (yinpeiqi)
5dc6422 fix import (yinpeiqi)
fc55262 Merge pull request #26 from yinpeiqi/refactor3 (yinpeiqi)
a8ea9da Merge branch 'main' into refactor (yinpeiqi)
0309f54 fix is alive, avoid duplicate check (yinpeiqi)
fe24400 Merge pull request #27 from yinpeiqi/refactor3 (yinpeiqi)
239a3f8 Merge branch 'main' into refactor (yinpeiqi)

    @@ -330,54 +330,46 @@ Stage transitions are the mechanism by which outputs from one stage are converte

     ### Where Stage Transitions Are Called

Collaborator, review comment on `### Where Stage Transitions Are Called`: We should also change the corresponding diffusion models docs.

    -Stage transitions happen automatically in the orchestrator (`OmniLLM` class) during the generation loop. Here's the detailed flow:
    +Stage transitions happen automatically in the runtime orchestrator. Here's the detailed flow:

    -1. **Location**: `vllm_omni/entrypoints/omni_llm.py` in the `_run_generation()` method
    +1. **Location**: `vllm_omni/engine/orchestrator.py` in `_forward_to_next_stage()`
     2. **Trigger**: When a stage completes processing and produces outputs
     3. **Execution Flow**:
     ```python
    -# In omni_llm.py, _run_generation() method (around line 345-460)
    -
    -# Main orchestrator loop polls each stage for completed requests
    -for stage_id, stage in enumerate(self.stage_list):
    -    result = stage.try_collect() # Get completed request
    -    if result is None:
    -        continue
    -
    -    # Store outputs from this stage
    -    engine_outputs = _load(result, obj_key="engine_outputs", shm_key="engine_outputs_shm")
    -    stage.set_engine_outputs(engine_outputs)
    -
    -    # Check if there's a next stage to forward to
    -    next_stage_id = stage_id + 1
    -    if next_stage_id < len(self.stage_list):
    -        next_stage: OmniStage = self.stage_list[next_stage_id]
    -
    -        # THIS IS WHERE STAGE TRANSITION HAPPENS
    -        next_inputs = next_stage.process_engine_inputs(
    -            self.stage_list,
    -            [request_id_to_prompt[req_id]]
    -        )
    -
    -        # Submit to next stage
    -        task = {
    -            "type": OmniStageTaskType.GENERATE,
    -            "request_id": req_id,
    -            "engine_inputs": next_inputs[0],
    -            "sampling_params": sampling_params_list[next_stage_id],
    -        }
    -        next_stage.submit(task)
    +# In orchestrator.py
    +next_stage_id = stage_id + 1
    +next_client = self.stage_clients[next_stage_id]
    +params = req_state.sampling_params_list[next_stage_id]
    +
    +# Save current stage outputs so stage_input_processors can consume them.
    +self.stage_clients[stage_id].set_engine_outputs([output])
    +
    +# THIS IS WHERE STAGE TRANSITION HAPPENS
    +next_inputs = next_client.process_engine_inputs(
    +    stage_list=self.stage_clients,
    +    prompt=req_state.prompt,
    +)
    +
    +# Build and submit request(s) to the next stage.
    +for next_input in next_inputs:
    +    request = build_engine_core_request_from_tokens(
    +        request_id=req_id,
    +        prompt=next_input,
    +        params=params,
    +        model_config=self.stage_vllm_configs[next_stage_id].model_config,
    +    )
    +    await next_client.add_request_async(request)
     ```

     ### How Stage Transitions Work

     The stage transition process follows these steps:

    -1. **Stage Completion**: When a stage finishes processing a request, it stores outputs via `stage.set_engine_outputs(engine_outputs)`
    +1. **Stage Completion**: When a stage finishes processing a request, the orchestrator stores outputs via `stage_client.set_engine_outputs(...)`

     2. **Transition Detection**: The orchestrator checks if there's a next stage and calls `process_engine_inputs()` on it

    -3. **Input Processing**: The `process_engine_inputs()` method in `OmniStage` (`omni_stage.py`) handles the transition:
    +3. **Input Processing**: The stage input processor configured in stage YAML (under `vllm_omni/model_executor/stage_input_processors/`) handles the transition:
     ```python
     def process_engine_inputs(
         self, stage_list: list[Any], prompt: OmniTokensPrompt | TextPrompt = None

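The updated `_forward_to_next_stage()` flow in the diff can be exercised with a small, self-contained asyncio sketch. It has three steps: save the upstream output, let the next stage turn it into prompts, then submit one request per prompt. Everything here is hypothetical. `StageClient`, `Orchestrator`, `RequestState`, `forward_to_next_stage`, and the `pending` list only mirror the names used in the new docs (`set_engine_outputs`, `process_engine_inputs`, `add_request_async`). The `Request` dataclass stands in for `build_engine_core_request_from_tokens`, and `tokens_from_stage0` is an invented input processor. This is an illustration, not vLLM-Omni's implementation.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical stand-ins: method names mirror the updated docs, but these bodies
# are illustrative only and are not vLLM-Omni's implementation.


@dataclass
class Request:
    # Stands in for the request built by build_engine_core_request_from_tokens.
    request_id: str
    prompt: Any
    params: dict


@dataclass
class RequestState:
    prompt: Any
    sampling_params_list: list


class StageClient:
    def __init__(self, input_processor: Optional[Callable[[list, Any], list]] = None):
        # input_processor(stage_list, prompt) -> list of prompts for this stage
        self.input_processor = input_processor
        self.engine_outputs: list = []
        self.pending: list[Request] = []

    def set_engine_outputs(self, outputs: list) -> None:
        self.engine_outputs = outputs

    def process_engine_inputs(self, stage_list: list, prompt: Any) -> list:
        if self.input_processor is None:
            return [prompt]  # no processor configured: reuse the original prompt
        return self.input_processor(stage_list, prompt)

    async def add_request_async(self, request: Request) -> None:
        self.pending.append(request)  # a real client would hand this to its engine


class Orchestrator:
    def __init__(self, stage_clients: list[StageClient]):
        self.stage_clients = stage_clients

    async def forward_to_next_stage(
        self, stage_id: int, req_id: str, output: Any, req_state: RequestState
    ) -> int:
        """Forward one finished output downstream; return the number of requests submitted."""
        next_stage_id = stage_id + 1
        if next_stage_id >= len(self.stage_clients):
            return 0  # last stage: nothing downstream to forward to
        next_client = self.stage_clients[next_stage_id]
        params = req_state.sampling_params_list[next_stage_id]

        # Save this stage's outputs first so the next stage's processor can read them.
        self.stage_clients[stage_id].set_engine_outputs([output])

        # Stage transition: turn upstream outputs into next-stage prompts.
        next_inputs = next_client.process_engine_inputs(
            stage_list=self.stage_clients, prompt=req_state.prompt
        )
        for next_input in next_inputs:
            await next_client.add_request_async(
                Request(request_id=req_id, prompt=next_input, params=params)
            )
        return len(next_inputs)


def tokens_from_stage0(stage_list: list, prompt: Any) -> list:
    # Hypothetical processor: feed stage 0's output token ids to stage 1 as a prompt.
    return [{"prompt_token_ids": out["token_ids"]} for out in stage_list[0].engine_outputs]


async def main() -> tuple[int, list[Request]]:
    stages = [StageClient(), StageClient(tokens_from_stage0)]
    orchestrator = Orchestrator(stages)
    state = RequestState(
        prompt="Hello", sampling_params_list=[{"max_tokens": 16}, {"max_tokens": 256}]
    )
    submitted = await orchestrator.forward_to_next_stage(
        0, "req-0", {"token_ids": [11, 12, 13]}, state
    )
    return submitted, stages[1].pending


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running `main()` submits one request to stage 1. Its prompt carries stage 0's token ids and its params come from `sampling_params_list[1]`. Calling `forward_to_next_stage` on the last stage returns 0. The sketch preserves the ordering shown in the docs: the upstream outputs are saved before the downstream processor runs, because the processor reads them through `stage_list`.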
Review comment: Any new tests for AsyncOmniEngine and Orchestrator?

Reply: Will add later.