[Feature]: Add CFG param to online serving #824
Signed-off-by: Dinesh G <G.Dinesh@ibm.com> Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9ddc72391
    cfg_parallel_size = kwargs.get("cfg_parallel_size") or 1
    if sequence_parallel_size is None:
        sequence_parallel_size = ulysses_degree * ring_degree
    num_devices = sequence_parallel_size * tensor_parallel_size
    num_devices = sequence_parallel_size * tensor_parallel_size * cfg_parallel_size
Account for CFG parallel GPUs in stage device locking
The new cfg_parallel_size factor increases the diffusion stage’s device list (num_devices now multiplies by cfg_parallel_size), but the stage worker’s lock calculation still only uses TP/PP/DP/SP (vllm_omni/entrypoints/omni_stage.py lines 470–499). When cfg_parallel_size > 1 and multiple stages/processes initialize concurrently, the extra CFG GPUs won’t be locked, so another stage can initialize on them at the same time, defeating the “lock ALL devices” guarantee and risking memory-calculation/OOM races. Consider including cfg_parallel_size in num_devices_per_stage (or otherwise locking all CUDA_VISIBLE_DEVICES) to keep the lock coverage consistent with the new device list.
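The mismatch described in this comment can be illustrated with a minimal sketch. The arithmetic follows the diff above; the function names `diffusion_num_devices` and `devices_to_lock` are hypothetical and are not taken from `omni_stage.py`, and the lock helper assumes a stage simply locks the first N visible devices.

```python
from __future__ import annotations


def diffusion_num_devices(
    tensor_parallel_size: int = 1,
    sequence_parallel_size: int | None = None,
    ulysses_degree: int = 1,
    ring_degree: int = 1,
    cfg_parallel_size: int | None = None,
) -> int:
    """Device count for the diffusion stage, following the arithmetic in the diff."""
    # A missing (None) value falls back to 1, like `kwargs.get(...) or 1`.
    cfg = cfg_parallel_size or 1
    if sequence_parallel_size is None:
        sequence_parallel_size = ulysses_degree * ring_degree
    return sequence_parallel_size * tensor_parallel_size * cfg


def devices_to_lock(visible_devices: list[int], num_devices_per_stage: int) -> list[int]:
    """Hypothetical lock helper: lock the first N visible devices for a stage."""
    return visible_devices[:num_devices_per_stage]


# The stage uses TP=2, SP=2 (ulysses_degree=2, ring_degree=1), CFG=2.
used = diffusion_num_devices(tensor_parallel_size=2, ulysses_degree=2, cfg_parallel_size=2)
# A lock count that ignores the CFG factor covers only TP * SP devices.
locked_without_cfg = diffusion_num_devices(tensor_parallel_size=2, ulysses_degree=2)

visible = list(range(used))
unlocked = sorted(set(visible) - set(devices_to_lock(visible, locked_without_cfg)))
print(used, locked_without_cfg, unlocked)
```

Under these assumptions, the devices left in `unlocked` are the ones another stage could initialize on concurrently. Including the CFG factor in the per-stage lock count makes the locked set match the device list.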
Hello @hsliuustc0106, good day. Thank you for your comments.
For online serving, I think we have not supported it before in the examples.
Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
5785c42 to d9ba570
Hey @hsliuustc0106, I've updated the examples in online serving. Please take a look when you get a chance. Thanks.
    type=int,
    default=1,
    help="Number of GPUs for CFG parallel computation"
    "--cfg-parallel-size", type=int, default=1, help="Number of GPUs for CFG parallel computation"
Do you want me to remove this? That would make this param impossible to configure when starting the server, right?
No, I mean keep the line breaks as before.
It fails the pre-commit formatting check if I keep the line breaks.
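For reference, the flag discussed in this thread can be sketched with the standard-library `argparse` module. The flag name, type, default, and help text come from the diff; the `build_parser` wrapper and the parser description are hypothetical and do not reflect the project's actual entrypoint. The multi-line and single-line layouts of `add_argument` behave identically, so the choice is purely a formatter concern.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical launcher parser exposing only the CFG parallel flag."""
    parser = argparse.ArgumentParser(description="Illustrative server launcher")
    parser.add_argument(
        "--cfg-parallel-size",
        type=int,
        default=1,
        help="Number of GPUs for CFG parallel computation",
    )
    return parser


# argparse converts the dashes in the flag name to underscores on the namespace.
default_args = build_parser().parse_args([])
custom_args = build_parser().parse_args(["--cfg-parallel-size", "2"])
print(default_args.cfg_parallel_size, custom_args.cfg_parallel_size)
```

Removing the argument entirely would leave the server with no way to set this value at startup, which is the concern raised above.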
    le=20.0,
    description="True CFG scale (model-specific parameter, may be ignored if not supported)",
    )
    cfg_parallel_size: int | None = Field(
    bash run_server.sh
    ```

    ### Start with CFG Parallelism
@wtomin can CFG be applied to all models now? I think we need to wait for the refactoring, right? If so, maybe we should keep the examples as they are until the refactoring is finished.
@gDINESH13 I realized that CFG parallel is not applicable to all models at this stage, so maybe we should keep the examples as they were for now.
Just to confirm the scope: since we're keeping the existing example as-is, please let me know if there's anything else you want included or excluded.
yes
Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
@hsliuustc0106 I have removed the changes I made to the example files. I hope we are now on the same page.
lgtm
Thanks for your contribution.
Signed-off-by: Dinesh G <G.Dinesh@ibm.com> Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
closes #777
Purpose
Make cfg_parallel_size available in Offline Inference in Diffusion Models
Test Plan
My device doesn't have enough hardware to run model inference tests, but I have verified that the parameter plumbing works as expected by executing the script below.
Test Result
Result of test script execution.