[Config Refactor][2/N] Pipeline + Deploy Config Schema #2383
hsliuustc0106 merged 112 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 929007a841
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
if value is not None:
    result[key] = value
```
Whitelist stage override keys before copying CLI kwargs
This loop forwards every non-None CLI kwarg that is not in a small denylist, so non-engine server flags can leak into per-stage engine_args. In the OpenAI server path, AsyncOmni is built from vars(args), which includes API/uvicorn options; once forwarded here, they eventually hit AsyncOmniEngineArgs(model, **engine_args) and can fail with unexpected-keyword errors for migrated models. Please filter by an allowlist of engine/runtime keys (or a parser-backed schema) instead of forwarding arbitrary kwargs.
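A minimal sketch of the suggested fix — filtering forwarded CLI kwargs through an explicit allowlist instead of a small denylist. `ENGINE_ARG_KEYS` and `forward_cli_kwargs` are hypothetical names for illustration; the real set would be derived from the engine-args parser or dataclass fields:

```python
# Hypothetical allowlist of engine/runtime keys (illustrative subset).
ENGINE_ARG_KEYS = {
    "max_model_len", "max_num_seqs", "gpu_memory_utilization",
    "tensor_parallel_size", "dtype", "seed",
}

def forward_cli_kwargs(cli_kwargs: dict) -> dict:
    """Copy only known engine keys, so server flags such as
    --host/--port/--api-key never reach per-stage engine_args."""
    return {
        key: value
        for key, value in cli_kwargs.items()
        if key in ENGINE_ARG_KEYS and value is not None
    }
```

With this shape, `forward_cli_kwargs({"host": "0.0.0.0", "max_model_len": 8192})` keeps only `max_model_len`, and an unrecognized server flag can no longer produce an unexpected-keyword error downstream.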
```python
ea["hf_config_name"] = self.hf_config_name
if self.engine_output_type:
    ea["engine_output_type"] = self.engine_output_type
ea["async_chunk"] = self.async_chunk
```
Preserve CLI async_chunk override in resolved config
CLI overrides can set async_chunk in engine_args, but this assignment overwrites that value with self.async_chunk, which is currently sourced from deploy YAML (deploy.async_chunk) in _resolve. As a result, --async-chunk cannot actually override deploy defaults, breaking the documented precedence and preventing users from toggling async mode at runtime.
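One way to preserve the documented precedence is to apply the deploy-sourced value only when the CLI has not already set the key. A sketch (function name illustrative, not the PR's code):

```python
def apply_async_chunk(ea: dict, deploy_async_chunk: bool) -> dict:
    # setdefault keeps a CLI-provided engine_args["async_chunk"] intact
    # and only falls back to the deploy-YAML value when the key is absent.
    ea.setdefault("async_chunk", deploy_async_chunk)
    return ea
```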
```python
"ray_address",
"batch_timeout",
"log_stats",
"tokenizer",
```
Forward tokenizer override for registered-model path
Marking tokenizer as an internal key causes it to be dropped from all stage overrides in the new factory path. The legacy YAML path explicitly injected tokenizer into stage engine args, so this is a behavior regression: users passing --tokenizer for migrated models (like qwen3_omni_moe) will silently run with the default tokenizer instead of the requested one.
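The behavioural difference reduces to whether `tokenizer` sits in the internal-keys set: if it does, it is stripped from every stage override. A sketch of the fix, with illustrative key names:

```python
# "tokenizer" deliberately NOT in the internal set, so a user-supplied
# --tokenizer flows through to stage engine args.
INTERNAL_KEYS = {"ray_address", "batch_timeout", "log_stats"}

def stage_overrides(cli_kwargs: dict) -> dict:
    # Internal keys are orchestrator-only and never reach stages;
    # everything else is forwarded.
    return {k: v for k, v in cli_kwargs.items() if k not in INTERNAL_KEYS}
```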
@hsliuustc0106 @david6666666 @wuhang2014 My P0 priority will be on this PR this week.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e1c756f8b
```python
per_stage_only = {
    k: v for k, v in cli_overrides.items()
    if re.match(r"stage_\d+_", k) and v is not None
}
```
Preserve explicit global CLI overrides in registry path
Filtering cli_overrides down to only stage_<id>_* keys means explicit global engine flags are silently ignored for migrated models (for example --max-model-len, --max-num-seqs, or --gpu-memory-utilization). As a result, users can no longer tune all stages via normal CLI arguments and must rewrite everything into per-stage JSON overrides, which is a behavior regression versus the legacy path.
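The suggested behaviour is a partition rather than a filter: route `stage_<id>_*` keys to their stage and still apply the remaining engine flags globally. A sketch (helper name illustrative):

```python
import re

def split_overrides(cli_overrides: dict) -> tuple:
    """Partition CLI overrides into (global, per-stage) instead of
    dropping everything that lacks a stage_<id>_ prefix."""
    per_stage, global_overrides = {}, {}
    for k, v in cli_overrides.items():
        if v is None:
            continue  # unset argparse defaults carry no user intent
        target = per_stage if re.match(r"stage_\d+_", k) else global_overrides
        target[k] = v
    return global_overrides, per_stage
```

Globals like `--max-model-len` then reach every stage, while `stage_0_devices`-style keys stay scoped to stage 0.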
```python
config_path = deploy_config_path
stage_configs = load_stage_configs_from_model(
    model,
    base_engine_args=kwargs,
    deploy_config_path=deploy_config_path,
)
```
Parse deploy connector schema before using deploy path as config
This new branch forwards deploy YAML paths as config_path, but stage initialization still loads transfer connectors through load_omni_transfer_config_for_model, whose parser expects legacy runtime.connectors + stage_args structure. New deploy files use connectors + stages, so connector specs/extras are dropped; in async-chunk or distributed setups, custom connector backends and connector tuning in deploy configs will not take effect.
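A sketch of schema-aware loading — branch on whether the YAML uses the legacy `runtime.connectors` layout or the new top-level `connectors` key before building connector specs (function name hypothetical; the real parser lives in `load_omni_transfer_config_for_model`):

```python
def extract_connectors(cfg: dict) -> list:
    """Return connector specs from either deploy-config generation."""
    # New deploy schema: top-level `connectors` + `stages`.
    if "connectors" in cfg:
        return cfg["connectors"]
    # Legacy schema: connectors nested under `runtime`.
    return cfg.get("runtime", {}).get("connectors", [])
```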
tools/e2e_serve_smoke.sh starts the qwen3_omni vllm-omni server with the new --deploy-config, waits for readiness, optionally asserts a pattern in the server log (Layer 5 precedence verification), sends a chat completion request, verifies the response shape, and tears down cleanly. A single script covers Layer 4 (bare serve) and Layer 5 (precedence verification) via the E2E_LOG_GREP env var and forwarded extra args. Signed-off-by: lishunyang <lishunyang12@163.com>
vllm-project#2383 replaced the per-model stage_configs/*.yaml layout with auto-loaded vllm_omni/deploy/<model>.yaml (Pipeline in Python, Deploy in YAML) and switched the DFX runner's config-loading dir from stage_configs/ to deploy/. This PR's test matrix and bench CLI still carried the old references:
- test_tts.json: drop `stage_config_name` from the Qwen3-TTS entries; vllm-omni now auto-loads vllm_omni/deploy/qwen3_tts.yaml for both Base and CustomVoice checkpoints.
- model_configs.yaml: drop the `stage_config` field — the bench CLI does not reference it and auto-discovery handles pipeline lookup.
- bench_tts.py: remove the dead `--stage-configs-dir` flag and the `_DEFAULT_STAGE_CONFIGS_DIR` constant; both were unused and pointed at a directory vllm-project#2383 deleted.
- Delete tests/dfx/perf/stage_configs/voxcpm2.yaml — the directory no longer exists post-vllm-project#2383.

VoxCPM2 is not yet migrated to the Pipeline + Deploy schema in vllm-project#2383 (only qwen2_5_omni / qwen3_omni / qwen3_tts ship pipeline.py + deploy YAML) and still loads via the legacy `ModelPipeline` path. Drop the test_voxcpm2 entry from test_tts.json to unblock DFX nightly; will re-add as a follow-up once VoxCPM2 gets its deploy YAML.

The latency / throughput / quality baselines remain unchanged — they come from H20 sweeps on stable checkpoints and should still hold under the new deploy YAML (stage 0 now sets max_num_seqs=10 and async_scheduling=true, which can only improve throughput numbers).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
… in pipeline.py

Moves pipeline declarations to vllm_omni/config/pipeline_registry.py (one dict per category, keyed by model_type -> (module, var)), mirroring vLLM's models/registry.py. _PIPELINE_REGISTRY is now a lazy proxy that imports the module on first lookup, so a missed registration is impossible to hide in a per-model pipeline.py.
- New: vllm_omni/config/pipeline_registry.py (_OMNI_PIPELINES, _DIFFUSION_PIPELINES, union _VLLM_OMNI_PIPELINES)
- stage_config: replace the dict _PIPELINE_REGISTRY with _LazyPipelineRegistry; drop the now-unnecessary _discover_all_pipelines walk.
- qwen2_5_omni / qwen3_omni / qwen3_tts pipeline.py: remove register_pipeline() self-calls; pipelines are declared centrally now.
- register_pipeline() kept public for plugins/tests; dynamic registrations override the central entry.

Addresses vllm-project#2887 item 4 and vllm-project#2383 (comment). Preparatory work for #3/N (17 single-stage diffusion models).

Signed-off-by: lishunyang <lishunyang12@163.com>
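The lazy-proxy idea can be sketched as a mapping that imports the declaring module only on first lookup, so an unregistered `model_type` fails immediately with a `KeyError`. Module paths and names here are illustrative, not the PR's exact code:

```python
import importlib

# model_type -> (module path, attribute name), mirroring the central registry.
_OMNI_PIPELINES = {
    "qwen3_omni_moe": ("vllm_omni.model_executor.models.qwen3_omni.pipeline", "PIPELINE"),
}

class LazyPipelineRegistry:
    """Dict-like proxy: defers the module import until first lookup."""

    def __init__(self, table):
        self._table = table
        self._cache = {}

    def __getitem__(self, model_type):
        if model_type not in self._cache:
            # KeyError here means the model was never registered centrally.
            module_path, attr = self._table[model_type]
            module = importlib.import_module(module_path)
            self._cache[model_type] = getattr(module, attr)
        return self._cache[model_type]
```

A registry built over a stdlib module, e.g. `LazyPipelineRegistry({"demo": ("math", "pi")})`, resolves `registry["demo"]` on first access and caches the result.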
…inheritance test with overlay
build_stage_runtime_overrides: ``model``, ``stage_id``, ``log_stats`` and
``stage_configs_path`` are all in SHARED_FIELDS — they are set uniformly
by the orchestrator, not per-stage. Previously internal_blacklist_keys()
subtracted SHARED_FIELDS from orchestrator_field_names(), so these keys
leaked into a stage's runtime_overrides dict (e.g. a user passing
--model foo made every stage see {"model": "foo"} as a per-stage override).
Fix: default internal_keys to `internal_blacklist_keys() | SHARED_FIELDS`.
Fixes tests/test_config_factory.py ::test_cli_override_excludes_internal_keys,
::test_per_stage_override_excludes_internal_keys,
::test_build_stage_runtime_overrides_ignores_other_stage_and_internal_keys.
test_ci_inherits_from_main: CI overlay
(tests/utils._CI_OVERLAYS["qwen3_omni_moe"]) now explicitly sets
async_chunk: False (added in vllm-project#2383 fix #53) to override the base yaml.
Update the assertion to match current behaviour and document why.
Signed-off-by: lishunyang <lishunyang12@163.com>
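The fix described above reduces to widening the default exclusion set with the shared fields. A minimal sketch — the field names come from the commit message, the rest is illustrative:

```python
# Fields set uniformly by the orchestrator, never per-stage.
SHARED_FIELDS = {"model", "stage_id", "log_stats", "stage_configs_path"}

def internal_blacklist_keys():
    return {"ray_address", "batch_timeout"}  # illustrative orchestrator-only keys

def build_stage_runtime_overrides(cli_kwargs, internal_keys=None):
    # Default now unions SHARED_FIELDS, so an orchestrator-set key like
    # --model no longer leaks into every stage's runtime_overrides dict.
    if internal_keys is None:
        internal_keys = internal_blacklist_keys() | SHARED_FIELDS
    return {
        k: v for k, v in cli_kwargs.items()
        if k not in internal_keys and v is not None
    }
```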
Address review comment on PR vllm-project#2835 — `benchmarks/tts/` shipped four scripts + a YAML registry with zero docs, leaving users to reverse-engineer the CLI from `--help` output. Add a single-page README covering:
- quick-start recipes (smoke, concurrency sweep, WER/SIM/UTMOS)
- plot_results.py usage
- the three task types and which checkpoints support each (notes that -CustomVoice lacks speaker_encoder, so voice_clone is Base-only)
- model_configs.yaml extension recipe for new TTS models
- dataset matrix (bundled seed_tts_design / seed_tts_smoke, external seed-tts-eval with link to the download guide)
- DFX nightly integration: latency / throughput / quality regimes, median-vs-mean baseline choice, quality-entry gating rationale
- observed H20 concurrency-cliff reference table (RFC vllm-project#272 sentinel)
- file layout + cross-references to vllm-project#2558 and vllm-project#2383

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
…2383) Signed-off-by: lishunyang <lishunyang12@163.com> Signed-off-by: reidliu41 <reid201711@gmail.com> Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: reidliu41 <reid201711@gmail.com> Co-authored-by: xiaohajiayou <75477391+xiaohajiayou@users.noreply.github.com> Co-authored-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Migrate VoxCPM2, CosyVoice3, MiMo Audio, Voxtral TTS, and Fish Speech S2 Pro to the Pipeline + Deploy config schema introduced in vllm-project#2383. Each model now declares:
* vllm_omni/model_executor/models/<model>/pipeline.py — frozen topology (model_type, stages, execution_type, input processors, sampling constraints).
* vllm_omni/deploy/<model>.yaml — runtime tunables (max_num_seqs, gpu_memory_utilization, devices, sampling params).
* a pipeline_registry.py entry so the lazy loader resolves model_type → pipeline.

Legacy vllm_omni/model_executor/stage_configs/<model>.yaml files are removed. Users can now launch with `vllm serve <model> --omni`; the deploy config auto-loads from vllm_omni/deploy/<model>.yaml. Async-chunk variants for CosyVoice3 and MiMo Audio live in separate deploy files (<model>_async_chunk.yaml) selected with --deploy-config.

Notes:
* MiMo Audio declares hf_architectures=("MiMoAudioForConditionalGeneration",) because MiMoAudioConfig inherits Qwen2Config and reports model_type="qwen2" — the factory falls back to architectures for disambiguation.
* Fish Speech's registry key is "fish_qwen3_omni", matching the HF top-level model_type (FishSpeechConfig.model_type); the source directory stays as fish_speech for readability.
* Voxtral TTS declares tokenizer_mode/config_format/load_format per-stage since they are not pipeline-wide DeployConfig fields yet.

Doc/example sweep:
* examples/online_serving/voxcpm2/{README.md,openai_speech_client.py,gradio_demo.py}: replace the stale `python -m ...api_server` invocation with `vllm serve openbmb/VoxCPM2 --omni`.
* examples/online_serving/{fish_speech,mimo_audio}/README.md and examples/online_serving/fish_speech/run_{server,gradio_demo}.sh: drop --stage-configs-path; auto-load applies.
* examples/offline_inference/{mimo_audio,voxtral_tts,cosyvoice3}: rename the --stage-configs-path CLI arg to --deploy-config (default None) and forward it as a deploy_config= kwarg to Omni/AsyncOmni.
* docs/serving/speech_api.md and docs/user_guide/examples/**: same sweep for docs.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
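A hedged sketch of what such a deploy YAML might look like under the schema this series introduces. All keys below are inferred from descriptions in this thread, not copied from the repo — consult the shipped `vllm_omni/deploy/*.yaml` files for the authoritative shape:

```yaml
# Illustrative only — field names inferred from the PR description.
async_chunk: true
stages:
  - stage_id: 0
    devices: "0"
    max_num_seqs: 10
    gpu_memory_utilization: 0.8
  - stage_id: 1
    devices: "0"
connectors:
  - name: SharedMemoryConnector
platforms:          # platform deltas inline, instead of parallel files
  npu:
    stages:
      - stage_id: 0
        gpu_memory_utilization: 0.7
```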
…r model

Follow @lishunyang12's review feedback on vllm-project#2958: match the qwen3_omni_moe pattern from vllm-project#2383, where a single deploy yaml covers both sync and async-chunk modes. Users toggle via the ``--async-chunk`` / ``--no-async-chunk`` CLI flag (both already wired via ``argparse.BooleanOptionalAction`` in ``cli/serve.py``).

Changes:
* ``deploy/cosyvoice3.yaml`` now ships ``async_chunk: true`` with ``SharedMemoryConnector`` + ``output/input_connectors`` declared unconditionally. The ``sync_process_input_func`` (``text2flow``) declared on stage 1 in ``cosyvoice3/pipeline.py`` is picked up automatically when ``--no-async-chunk`` flips the mode.
* ``deploy/mimo_audio.yaml`` now ships ``async_chunk: true`` with both stages on ``devices: "0"``. The legacy 2-GPU sync topology is reachable via ``--no-async-chunk --stage-1-devices 1 --stage-1-max-model-len 18192 --stage-1-max-num-batched-tokens 18192`` (see the header comment in the yaml).
* Drop ``deploy/cosyvoice3_async_chunk.yaml`` and ``deploy/mimo_audio_async_chunk.yaml``.

Test + doc updates:
* ``tests/e2e/offline_inference/test_cosyvoice3.py`` parametrizes on an ``async_chunk: bool`` flag (instead of a yaml path) and passes it through ``OmniRunner(async_chunk=...)``. Drops the obsolete ``_patched_stage_config`` that only applied to the legacy ``stage_args`` schema.
* ``tests/e2e/online_serving/test_cosyvoice3_tts.py`` keeps both sync / async parameter blocks pointed at the same deploy yaml and distinguishes them with ``--no-async-chunk`` in ``server_args``.
* ``tests/e2e/online_serving/test_mimo_audio.py`` points at ``mimo_audio.yaml`` (the consolidated one).
* Offline cosyvoice3 README/docs: rephrase the "two yamls" note to "one yaml, toggle with ``--no-async-chunk``".

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
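The ``--async-chunk`` / ``--no-async-chunk`` toggle mentioned above relies on the stdlib ``argparse.BooleanOptionalAction`` (Python 3.9+). A minimal standalone sketch of the pattern:

```python
import argparse

parser = argparse.ArgumentParser()
# One declaration yields both --async-chunk and --no-async-chunk.
# default=None lets the deploy-YAML value win when the user types neither flag,
# which is how "explicit CLI beats YAML, YAML beats nothing" stays detectable.
parser.add_argument(
    "--async-chunk",
    action=argparse.BooleanOptionalAction,
    default=None,
)
```

`parser.parse_args(["--no-async-chunk"]).async_chunk` is `False`, `["--async-chunk"]` gives `True`, and an empty argv leaves it `None` so the YAML default applies.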
RFC: #2072
Motivation
Before this refactor, vLLM-Omni's multi-stage pipelines mixed topology (which stages exist, how they wire, what functions they call) and deployment parameters (TP size, memory budgets, device placement, connectors) into a single `stage_configs/<model>.yaml` per platform. Adding a new platform meant editing N files; changing a `max_num_seqs` meant forking the whole YAML; model developers and deployment engineers edited the same file with different concerns in mind.

This PR implements RFC #2072 — splitting the legacy YAML into two layers connected by a runtime merge with documented precedence.
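The layered merge described above can be sketched as successive dict merges, weakest layer first (a simplified flat-key sketch; the real merge is per-stage and deep):

```python
def resolve(*layers):
    """Merge config layers; later (stronger) layers win key-by-key."""
    out = {}
    for layer in layers:
        out.update(layer)
    return out

# parser defaults < base deploy YAML < overlay < platform < global CLI < per-stage CLI
resolved = resolve(
    {"max_num_seqs": 8, "dtype": "auto"},   # parser defaults (weakest)
    {"max_num_seqs": 10},                    # base deploy YAML
    {"gpu_memory_utilization": 0.8},         # platform section
    {"max_num_seqs": 16},                    # per-stage CLI (strongest)
)
```

Here the per-stage CLI value wins for `max_num_seqs`, while untouched keys (`dtype`, `gpu_memory_utilization`) fall through from weaker layers.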
Design
Before vs. after
```mermaid
flowchart LR
  subgraph Before["Before — single YAML per platform"]
    direction TB
    Y1["stage_configs/qwen3_omni_moe.yaml<br/>(topology + params + connectors)"]
    Y2["platforms/npu/stage_configs/qwen3_omni_moe.yaml<br/>(full copy for NPU)"]
    Y3["platforms/rocm/stage_configs/qwen3_omni_moe.yaml<br/>(full copy for ROCm)"]
  end
  subgraph After["After — split by concern"]
    direction TB
    P["models/qwen3_omni/pipeline.py<br/>(frozen topology — developer-owned)"]
    D["deploy/qwen3_omni_moe.yaml<br/>(CUDA defaults — deployer-owned)"]
    DN["deploy/qwen3_omni_moe.yaml<br/>:platforms.npu.stages<br/>(platform deltas inline)"]
  end
  Before --> After
```

- The pipeline layer (`models/<name>/pipeline.py`) calls `register_pipeline(PipelineConfig(...))` at import time. Frozen — deploy cannot reshape the graph.
- The deploy layer (`deploy/<model>.yaml`) carries per-stage TP size, GPU memory, device placement, connectors, and platform deltas.
- A single `platforms: { npu, rocm, xpu }` section replaces three parallel files.

Two-level config objects
`PipelineConfig` (`frozen=True`, declared in `pipeline.py`) holds the topology; `DeployConfig` (loaded from `deploy/*.yaml`) holds the runtime tunables.

Precedence chain
```mermaid
flowchart LR
  D1["Parser defaults"] -->|weakest| D2["Base deploy YAML"]
  D2 --> D3["Overlay YAML<br/>via base_config:"]
  D3 --> D4["Platform section<br/>platforms.npu.stages"]
  D4 --> D5["Global CLI<br/>--gpu-memory-utilization"]
  D5 -->|strongest| D6["Per-stage CLI<br/>--stage-overrides JSON"]
```

User-typed keys are tracked via
`_cli_explicit_keys` (parser-aware: it walks `parser._actions` so `--disable-X` → `dest=enable_X` and alias flags resolve correctly), so argparse defaults do not silently overwrite YAML values.

CLI flag routing — OrchestratorArgs
`make_arg_parser` flattens uvicorn / FastAPI / engine / orchestrator flags into a single namespace. The old code maintained two hardcoded frozensets (~49 strings total) as denylists — fragile. `OrchestratorArgs` replaces them with a dataclass; `split_kwargs` classifies each flag by dataclass membership; CI invariants in `tests/test_arg_utils.py` catch unclassified flags at test time.

Auto-discovery
No more hardcoded `PIPELINE_MODELS` / `_ARCHITECTURE_MODELS` dicts. `_discover_all_pipelines` scans `model_executor/models/*/pipeline.py` and registers them; a contributor adding a new model just drops a `pipeline.py`.

Summary of Changes
- New: `vllm_omni/engine/arg_utils.py` (`OrchestratorArgs` + `SHARED_FIELDS` + `split_kwargs`), `vllm_omni/deploy/*.yaml` (3 default deploy configs + CI overlays)
- New: `models/qwen2_5_omni/pipeline.py`, `qwen3_omni/pipeline.py`, `qwen3_tts/pipeline.py`
- New CLI flags: `--deploy-config`, `--stage-overrides`, `--async-chunk` / `--no-async-chunk`
- Removed: `INTERNAL_STAGE_OVERRIDE_KEYS`, `SERVER_ONLY_KEYS`, `PIPELINE_MODELS`, `_ARCHITECTURE_MODELS` — replaced by dataclass-derived invariants and auto-discovery
- Refactor: `merge_pipeline_deploy` split into 4 single-responsibility helpers (SLAP); `_apply_platform_overrides` deduplicated; execution_type → (stage_type, worker_type) lookup table; parser-aware `detect_explicit_cli_keys`
- Pipeline-wide fields (`trust_remote_code`, `distributed_executor_backend`, `dtype`, `quantization`, `enable_prefix_caching`, `enable_chunked_prefill`, `data_parallel_size`, `pipeline_parallel_size`) moved from per-stage to top-level `DeployConfig`
- Hardening: `merge_pipeline_deploy` raises if `async_chunk=True` but no stage declares an async handler; `get_scheduler_cls` raises on invalid `stage_id` / unmapped execution_type; `_deep_merge_stage` warns on type-mismatch clobber; `--stage-configs-path` and `--deploy-config` are mutex; scheduler map stores class refs (rename fails at import)
- Back-compat: `--stage-configs-path` (deprecated in help text, to be removed in 2c); `ModelPipeline` / `StageConfig` / `_parse_pipeline_yaml` preserved for not-yet-migrated models
- Tests: `tests/test_arg_utils.py` (15 invariants incl. BVA), expanded `tests/test_config_factory.py` (+644 lines)
- `StageDeployConfig` dataclass default (single source of truth at `vllm_omni/config/stage_config.py`)
- `Omni.from_cli_args(args, parser=parser)` / `AsyncOmni.from_cli_args(args, parser=parser)` mirror `OmniEngineArgs.from_cli_args`; optional `parser=` enables accurate `_cli_explicit_keys` resolution
- Docs: `docs/configuration/stage_configs.md` rewritten with unified schema tables, connector schema, and a worked override precedence example; `examples/online_serving/qwen3_tts/README.md` gains a Sync vs async-chunk mode section

Test Plan
1. Unit tests + smoke scripts (CPU-only)
2. E2E launch matrices (GPU box)
qwen2_5_omni
qwen3_omni_moe
qwen3_tts (async vs sync codec, both from one yaml)
3. Server flag isolation (regression check for #873-class bugs)
```shell
vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091 \
  --host 0.0.0.0 \
  --served-model-name my-omni \
  --api-key secret123 \
  --allowed-local-media-path /tmp/
```

Expected: clean startup. A `TypeError: unexpected keyword argument 'host'` from `OmniEngineArgs.__init__` would indicate server flags leaking into the per-stage engine path.

Review feedback addressed
19 threads from @alex-jw-brooks resolved — correctness fixes (mutex validation for `--stage-configs-path` / `--deploy-config`, the `async_chunk` handler check, `get_scheduler_cls` error paths, the deep-merge clobber warning, parser-aware flag detection), cleanups (removed the redundant `qwen3_tts_no_async_chunk` alias and dead `get_stage_config` wrappers in 4 test files), and doc clarifications (logical device IDs, `engine_extras` rationale). See #2887 for deferred follow-ups (hardware auto-sizing, model-instance-driven config values, override type validation, central pipeline registry).

What ships in follow-ups
- Drop `--stage-configs-path` and legacy `ModelPipeline` / `_parse_pipeline_yaml`; migrate remaining legacy models (fish_speech, cosyvoice3, mimo_audio, voxtral_tts) to the registry; split `stage_config.py` (~1200 LOC) into focused modules once the legacy surface is gone.
- Move `tools/smoke_*.py` into its own PR.