-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Config] Remove invalid LLM-only engine_args from diffusion stage configs #2622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,15 +46,9 @@ stage_args: | |
| engine_args: | ||
| model_stage: dit | ||
| max_num_seqs: 1 | ||
| gpu_memory_utilization: 0.45 | ||
| enforce_eager: true | ||
| trust_remote_code: true | ||
| engine_output_type: image | ||
| distributed_executor_backend: "mp" | ||
| enable_prefix_caching: false | ||
| max_num_batched_tokens: 32768 | ||
| tensor_parallel_size: 1 | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same reasoning as above. |
||
| load_format: dummy | ||
| omni_kv_config: | ||
| need_recv_cache: true | ||
| engine_input_source: [0] | ||
|
|
||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If future users add new special fields, how should we maintain this UT, considering these new fields might only apply to a specific model?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question. Two cases:
I'll add a comment in the test to document this but before I do, I'd love to hear your thoughts on whether this approach works for you. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
| """Ensure diffusion stage YAML configs only use valid OmniDiffusionConfig fields. | ||
|
|
||
| Regression test for https://github.com/vllm-project/vllm-omni/issues/2563 | ||
| """ | ||
|
|
||
| from dataclasses import fields | ||
| from pathlib import Path | ||
|
|
||
| import pytest | ||
| import yaml | ||
|
|
||
| pytestmark = [pytest.mark.core_model, pytest.mark.cpu] | ||
|
|
||
| try: | ||
| from vllm_omni.diffusion.data import OmniDiffusionConfig | ||
| except Exception: | ||
| OmniDiffusionConfig = None | ||
|
|
||
|
|
||
| @pytest.mark.skipif( | ||
| OmniDiffusionConfig is None, | ||
| reason="OmniDiffusionConfig could not be imported (missing torch?)", | ||
| ) | ||
| def test_diffusion_stage_configs_only_contain_valid_fields(): | ||
| """Diffusion stage engine_args must only contain OmniDiffusionConfig fields. | ||
|
|
||
| Regression test for https://github.com/vllm-project/vllm-omni/issues/2563 | ||
| """ | ||
| # Scan both main configs and test configs | ||
| repo_root = Path(__file__).parent.parent | ||
| config_dirs = [ | ||
| repo_root / "vllm_omni" / "model_executor" / "stage_configs", | ||
| ] | ||
| # Also scan test directories recursively | ||
| test_dir = repo_root / "tests" | ||
|
|
||
| yaml_paths: list[Path] = [] | ||
| for config_dir in config_dirs: | ||
| yaml_paths.extend(sorted(config_dir.glob("*.yaml"))) | ||
| yaml_paths.extend(sorted(test_dir.rglob("*.yaml"))) | ||
|
|
||
| valid_fields = {f.name for f in fields(OmniDiffusionConfig)} | ||
| # model_stage is consumed by the stage init layer, not OmniDiffusionConfig | ||
| valid_fields.add("model_stage") | ||
| # model_arch is consumed by the stage init layer for diffusion model class resolution | ||
| valid_fields.add("model_arch") | ||
| # "quantization" is mapped to "quantization_config" by from_kwargs() backwards-compat | ||
| valid_fields.add("quantization") | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: worth a short block comment above the allowlist explaining the maintenance policy — entries here are fields consumed outside |
||
|
|
||
| invalid_entries: list[tuple[str, set[str]]] = [] | ||
| for yaml_path in yaml_paths: | ||
| with open(yaml_path) as fh: | ||
| config = yaml.safe_load(fh) | ||
|
|
||
| stages = config.get("stage_args", config.get("stages", [])) | ||
| for stage in stages: | ||
| if stage.get("stage_type") != "diffusion": | ||
| continue | ||
| engine_args = stage.get("engine_args", {}) | ||
| invalid = set(engine_args.keys()) - valid_fields | ||
| if invalid: | ||
| invalid_entries.append((yaml_path.relative_to(repo_root), invalid)) | ||
|
|
||
| assert not invalid_entries, "Diffusion stage configs contain fields not in OmniDiffusionConfig:\n" + "\n".join( | ||
| f" {name}: {sorted(bad)}" for name, bad in invalid_entries | ||
| ) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,10 +10,8 @@ stage_args: | |
| engine_args: | ||
| model_stage: dit | ||
| model_class_name: "OmniVoicePipeline" | ||
| gpu_memory_utilization: 0.5 | ||
| enforce_eager: true | ||
| trust_remote_code: true | ||
| engine_output_type: audio | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should maintain it, correct me if I am wrong
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are safe to remove this is a |
||
| distributed_executor_backend: "mp" | ||
| dtype: "float32" | ||
| final_output: true | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we remove
tensor_parallel_size? Seems someone just edittensor_parallel_sizeplace, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good observation! Yes
tensor_parallel_sizewas effectively "moved" toparallel_config.tensor_parallel_sizewhenDiffusionParallelConfigwas introduced in PR #189. Since then,OmniDiffusionConfigno longer has a top-leveltensor_parallel_sizefield, sofrom_kwargs()silently drops it (data.py L692-694). The value here is also 1, which matches theDiffusionParallelConfigdefault. Issue #2635 also tracks this inconsistency.