-
Notifications
You must be signed in to change notification settings - Fork 1k
[Feature]: Add CFG param to online serving #824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -208,6 +208,9 @@ def subparser_init(self, subparsers: argparse._SubParsersAction) -> FlexibleArgu | |
| default=None, | ||
| help="Scheduler flow_shift for video models (e.g., 5.0 for 720p, 12.0 for 480p).", | ||
| ) | ||
| omni_config_group.add_argument( | ||
| "--cfg-parallel-size", type=int, default=1, help="Number of GPUs for CFG parallel computation" | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. keep it as before
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you want me to remove this? that would make this param unavailable to be configured right? while starting the server.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, i mean keep line break as before
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it fails pre-commit formatting check if I keep line break |
||
| ) | ||
| return serve_parser | ||
|
|
||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new
cfg_parallel_sizefactor increases the diffusion stage’s device list (num_devicesnow multiplies bycfg_parallel_size), but the stage worker’s lock calculation still only uses TP/PP/DP/SP (vllm_omni/entrypoints/omni_stage.pylines 470–499). Whencfg_parallel_size > 1and multiple stages/processes initialize concurrently, the extra CFG GPUs won’t be locked, so another stage can initialize on them at the same time, defeating the “lock ALL devices” guarantee and risking memory-calculation/OOM races. Consider includingcfg_parallel_sizeinnum_devices_per_stage(or otherwise locking all CUDA_VISIBLE_DEVICES) to keep the lock coverage consistent with the new device list.Useful? React with 👍 / 👎.