Skip to content

[Feature]: Add CFG param to online serving#824

Merged
david6666666 merged 3 commits into
vllm-project:mainfrom
gDINESH13:777_add_cfg_param
Jan 19, 2026
Merged

[Feature]: Add CFG param to online serving#824
david6666666 merged 3 commits into
vllm-project:mainfrom
gDINESH13:777_add_cfg_param

Conversation

@gDINESH13
Copy link
Copy Markdown
Contributor

closes #777

Purpose

Make cfg_parallel_size available in Offline Inference in Diffusion Models

Test Plan

My device doesn't have enough Hardware to conduct model inferencing tests. But I have verified if
parameter plumbing are working as expected by executing, the script below

# test_cfg_parallel_local.py
from vllm_omni.entrypoints.openai.protocol.images import ImageGenerationRequest
from vllm_omni.diffusion.data import DiffusionParallelConfig

print("Test 1: API Schema")
req = ImageGenerationRequest(prompt="test", cfg_parallel_size=2)
assert req.cfg_parallel_size == 2
print("PASSED")

print("\nTest 2: Configuration Flow")
config = DiffusionParallelConfig(ulysses_degree=1, ring_degree=1, cfg_parallel_size=2)
assert config.cfg_parallel_size == 2
assert config.world_size == 2
print("PASSED")

print("\nTest 3: Backward Compatibility")
req_default = ImageGenerationRequest(prompt="test")
assert req_default.cfg_parallel_size is None
print("PASSED")

Test Result

Result of test script execution.

Test 1: API Schema
PASSED

Test 2: Configuration Flow
PASSED

Test 3: Backward Compatibility
PASSED

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: Dinesh G <G.Dinesh@ibm.com>
Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
@hsliuustc0106
Copy link
Copy Markdown
Collaborator

  • please update the text-to-image.py under examples
  • please add it into api tests if necessary

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9ddc72391

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +135 to +138
cfg_parallel_size = kwargs.get("cfg_parallel_size") or 1
if sequence_parallel_size is None:
sequence_parallel_size = ulysses_degree * ring_degree
num_devices = sequence_parallel_size * tensor_parallel_size
num_devices = sequence_parallel_size * tensor_parallel_size * cfg_parallel_size
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Account for CFG parallel GPUs in stage device locking

The new cfg_parallel_size factor increases the diffusion stage’s device list (num_devices now multiplies by cfg_parallel_size), but the stage worker’s lock calculation still only uses TP/PP/DP/SP (vllm_omni/entrypoints/omni_stage.py lines 470–499). When cfg_parallel_size > 1 and multiple stages/processes initialize concurrently, the extra CFG GPUs won’t be locked, so another stage can initialize on them at the same time, defeating the “lock ALL devices” guarantee and risking memory-calculation/OOM races. Consider including cfg_parallel_size in num_devices_per_stage (or otherwise locking all CUDA_VISIBLE_DEVICES) to keep the lock coverage consistent with the new device list.

Useful? React with 👍 / 👎.

@gDINESH13
Copy link
Copy Markdown
Contributor Author

Hello @hsliuustc0106 good day, Thank you for your comments.
I have some questions,

  1. text-to-image.py: The file already has cfg_parallel_size parameter (lines 84-90, 147, 169).
    • Should I improve the help text documentation?
    • Should I remove the choices=[1, 2] restriction?
    • Is there something else you'd like updated?
  2. API tests: Should I add:
    • Integration tests for the /v1/images/generations endpoint with cfg_parallel_size? like the ones already in text_image_server.py
  3. The suggestion by bot, I missed it In my early commit, but I feel its right to do it. Can you share your comment about it?

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

Hello @hsliuustc0106 good day, Thank you for your comments. I have some questions,

  1. text-to-image.py: The file already has cfg_parallel_size parameter (lines 84-90, 147, 169).

    • Should I improve the help text documentation?
    • Should I remove the choices=[1, 2] restriction?
    • Is there something else you'd like updated?
  2. API tests: Should I add:

    • Integration tests for the /v1/images/generations endpoint with cfg_parallel_size? like the ones already in text_image_server.py
  3. The suggestion by bot, I missed it In my early commit, but I feel its right to do it. Can you share your comment about it?

for online serving, I think we have not supported it before in the examples
for test, using test_image_server.py is ok

Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
@gDINESH13
Copy link
Copy Markdown
Contributor Author

Hey @hsliuustc0106 I'd updated the examples in online serving. Please take a look when you get a chance. Thanks..

type=int,
default=1,
help="Number of GPUs for CFG parallel computation"
"--cfg-parallel-size", type=int, default=1, help="Number of GPUs for CFG parallel computation"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it as before

Copy link
Copy Markdown
Contributor Author

@gDINESH13 gDINESH13 Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to remove this? that would make this param unavailable to be configured right? while starting the server.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, i mean keep line break as before

Copy link
Copy Markdown
Contributor Author

@gDINESH13 gDINESH13 Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it fails pre-commit formatting check if I keep line break

le=20.0,
description="True CFG scale (model-specific parameter, may be ignored if not supported)",
)
cfg_parallel_size: int | None = Field(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it as before

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this

bash run_server.sh
```

### Start with CFG Parallelism
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wtomin can CFG be applied to all models now? I think we need to wait for the refactoring, right? if so, maybe we need to keep the examples as it is before the refactoring finished

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

@gDINESH13 I realized that the cfg parallel is not applicable to all models at this stage, many we should keep the examples as before at this stage

@gDINESH13
Copy link
Copy Markdown
Contributor Author

@gDINESH13 I realized that the cfg parallel is not applicable to all models at this stage, many we should keep the examples as before at this stage

Just to confirm the scope—since we’re keeping the existing example as-is, please Let me know if there's anything else you want included or excluded.

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

@gDINESH13 I realized that the cfg parallel is not applicable to all models at this stage, many we should keep the examples as before at this stage

Just to confirm the scope—since we’re keeping the existing example as-is, please Let me know if there's anything else you want included or excluded.

yes

Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
@gDINESH13
Copy link
Copy Markdown
Contributor Author

@hsliuustc0106 I have removed changes I made in example files. I hope now we are on same page.

@david6666666
Copy link
Copy Markdown
Collaborator

lgtm

@david6666666 david6666666 added the ready label to trigger buildkite CI label Jan 19, 2026
@david6666666 david6666666 enabled auto-merge (squash) January 19, 2026 03:56
@david6666666 david6666666 merged commit 5e7035e into vllm-project:main Jan 19, 2026
6 of 7 checks passed
@david6666666
Copy link
Copy Markdown
Collaborator

Thanks for your contribution.

with1015 pushed a commit to with1015/vllm-omni that referenced this pull request Jan 20, 2026
Signed-off-by: Dinesh G <G.Dinesh@ibm.com>
Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Online serving support cfg args

3 participants