diff --git a/docs/cli/serve.md b/docs/cli/serve.md
index 47a873b7211..035fa056731 100644
--- a/docs/cli/serve.md
+++ b/docs/cli/serve.md
@@ -1,5 +1,59 @@
 # vllm-omni serve
 
+## Stage-based CLI quickstart
+
+Use the stage-based CLI when each pipeline stage must run in its own process
+(for example, as separate OS processes, on different GPUs, or on different hosts).
+
+- For **migrated models** that use the bundled deploy YAMLs in
+  `vllm_omni/deploy/`, `vllm serve MODEL --omni ...` loads the bundled config automatically.
+  Pass `--deploy-config` only to override it.
+- For **legacy models** whose configs live in
+  `vllm_omni/model_executor/stage_configs/`, `--stage-configs-path` is still required.
+
+Example: launch Stage 0 (orchestrator and API server):
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+Example: launch a headless worker stage (Stage 1):
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+If you use a custom deploy YAML in the new schema, add `--deploy-config /path/to/override.yaml` to each command. For legacy models, use `--stage-configs-path /path/to/stage_configs.yaml` instead.
+
+In a regular single-command launch, `--stage-overrides` applies stage-specific settings from one CLI command.
+With the **stage-based CLI**, each process runs exactly one stage, so prefer passing tuning flags directly on that stage's command instead of building a `--stage-overrides` JSON string.
+
+For example, instead of:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+the stage-based CLI lets you launch Stage 1 with the flag set directly:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 ## JSON CLI Arguments
 
 --8<-- "docs/cli/json_tip.inc.md"
diff --git a/docs/configuration/stage_configs.md b/docs/configuration/stage_configs.md
index 55b4053cc71..4a7c9cc67c5 100644
--- a/docs/configuration/stage_configs.md
+++ b/docs/configuration/stage_configs.md
@@ -88,6 +88,55 @@ stages:
 | `--async-chunk` / `--no-async-chunk` | Flip the deploy YAML's `async_chunk:` bool. Unset (default) leaves the YAML value in force. |
 | `--stage-configs-path` | **Deprecated.** Accepts legacy `stage_args` yamls and (auto-detected) new deploy yamls; emits a deprecation warning. Migrate to `--deploy-config`. To be removed in a follow-up PR. |
 
+### Stage-Based CLI Paradigm
+
+The stage-based CLI runs each pipeline stage in its own process:
+
+- **Stage 0** typically hosts the orchestrator and the API server. Launch it with `--stage-id 0`,
+  `--omni-master-address`, `--omni-master-port`, and the usual API server port (e.g., `--port`).
+- **Worker stages** run without an API server (`--headless`), use sequential `--stage-id` values, and must pass the same
+  `--omni-master-address` and `--omni-master-port` so that they can register with Stage 0.
+
+For migrated models, the bundled deploy YAML is resolved and loaded automatically, so the common path
+does **not** need `--deploy-config`:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+If you use a custom deploy YAML in the new schema, add `--deploy-config /path/to/override.yaml`
+to every stage command. For legacy models (e.g., BAGEL) that still use the deprecated `stage_args:` schema, keep passing `--stage-configs-path /path/to/config.yaml`.
+
+For a regular single-process launch, `--stage-overrides` is the usual way
+to apply stage-specific tuning from the CLI:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+With the **stage-based CLI**, each process launches a single stage, so the same overrides
+can be passed as ordinary CLI flags on that stage's command, and no `--stage-overrides` JSON string is needed:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 ### Precedence
 
 From highest to lowest:
@@ -133,6 +182,17 @@ vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
     --stage-overrides '{"0": {"max_num_seqs": 8}}'
 ```
 
+With the stage-based CLI, the same setting can be passed directly
+as a command-line flag on the corresponding single-stage process:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 0 \
+    --max-num-seqs 8 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 Effective config per stage after the merge:
 
 | Stage | Field | Final value | Source |
@@ -153,9 +213,14 @@ Therefore, as a core part of vLLM-Omni, the stage configs for a model have sever
 - Input and output dependencies for each stage.
 - Default input parameters.
 
-If users want to modify some part of it. The custom stage_configs file can be input as input argument in both online and offline. Just like examples below:
+To override part of the configuration, pass a custom config file
+in both the online and offline flows. Prefer `--deploy-config`
+for new-schema deploy YAMLs, and use `--stage-configs-path`
+only for legacy `stage_args` YAMLs.
+
+Examples:
 
-For offline (Assume necessary dependencies have ben imported):
+For offline inference (assuming the necessary dependencies have been imported):
 ```python
 model_name = "Qwen/Qwen2.5-Omni-7B"
 omni = Omni(model=model_name, stage_configs_path="/path/to/custom_stage_configs.yaml")
@@ -163,7 +228,13 @@ omni = Omni(model=model_name, stage_configs_path="/path/to/custom_stage_configs.
 
 For online serving:
 ```bash
-vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
+vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091 --deploy-config /path/to/deploy_config.yaml
+```
+
+Legacy online serving:
+
+```bash
+vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
 ```
 !!! important
     We are actively iterating on the definition of stage configs, and we welcome all feedbacks from both community users and developers to help us shape the development!
diff --git a/docs/user_guide/examples/online_serving/bagel.md b/docs/user_guide/examples/online_serving/bagel.md
index 9de31926aa1..1a3fec9f426 100644
--- a/docs/user_guide/examples/online_serving/bagel.md
+++ b/docs/user_guide/examples/online_serving/bagel.md
@@ -22,9 +22,16 @@ Or use the convenience script:
 
 ```bash
 cd /workspace/vllm-omni/examples/online_serving/bagel
+# Launch both stages in one session (legacy convenience flow)
 bash run_server.sh
+
+# Launch a single stage per terminal
+bash run_server_stage_cli.sh --stage 0
+bash run_server_stage_cli.sh --stage 1
 ```
 
+If you have a custom stage configs file, launch the server with the command below:
+
 ```bash
 vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
 ```
@@ -115,12 +122,13 @@ mooncake_master \
 **2. Launch Stage 0 (Thinker / Orchestrator)** on the orchestrator node:
 
 ```bash
+# API server port for client requests: 8000
 vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
-    --port 8000 \ # API server port for client requests
+    --port 8000 \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 0 \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **3. Launch Stage 1 (DiT)** on the remote node in headless mode:
@@ -130,8 +138,8 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 1 \
     --headless \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **Mooncake Master arguments:**
@@ -150,8 +158,8 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
 | :------- | :---------- |
 | `--stage-id` | Which stage this process runs (0 = Thinker, 1 = DiT) |
 | `--headless` | Run without the API server (worker-only mode) |
-| `-oma` | Orchestrator master address |
-| `-omp` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
+| `--omni-master-address` | Orchestrator master address |
+| `--omni-master-port` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
 
 > [!IMPORTANT]
 > **Startup Order**: Stage 0 (orchestrator) must be launched **before** Stage 1 (headless).
@@ -165,7 +173,7 @@ All nodes must have network connectivity to each other. Ensure the following por
 | :--- | :------- | :------ | :-------- |
 | 50051 | TCP | Mooncake Master RPC | Worker → Orchestrator |
 | 8080 | TCP | Mooncake HTTP Metadata Server | Worker → Orchestrator |
-| 8091 | TCP | Orchestrator Master (`-omp`) | Worker → Orchestrator |
+| 8091 | TCP | Orchestrator Master (`--omni-master-port`) | Worker → Orchestrator |
 | 8000 | TCP | API Server (`--port`) | Client → Orchestrator |
 | 9003 | TCP | Metrics (optional) | Monitoring → Orchestrator |
 
diff --git a/docs/user_guide/examples/online_serving/qwen3_omni.md b/docs/user_guide/examples/online_serving/qwen3_omni.md
index 611eb6fd3fc..22d89ee8018 100644
--- a/docs/user_guide/examples/online_serving/qwen3_omni.md
+++ b/docs/user_guide/examples/online_serving/qwen3_omni.md
@@ -15,15 +15,72 @@ Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/
 vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
 ```
 
-If you want to open async chunking for qwen3-omni, launch the server with command below
+The default deploy config at `vllm_omni/deploy/qwen3_omni_moe.yaml` is loaded
+automatically by the model registry, so the common case does not need the `--deploy-config` flag.
+Async-chunk streaming is **enabled by default** in the bundled config.
 
+To use a custom deploy YAML, pass its path explicitly:
 ```bash
-vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 --deploy-config /vllm_omni/deploy/qwen3_omni_moe.yaml
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --deploy-config /path/to/deploy_config_file
 ```
 
-If you have custom stage configs file, launch the server with command below
+### Launch individual stages (stage-based CLI)
+
+Use the stage-based CLI when you want to run one stage per process.
+
+**1. Stage 0 (Thinker + API server)**
+
 ```bash
-vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 --deploy-config /path/to/deploy_config_file
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+**2. Stage 1 (Talker)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+**3. Stage 2 (Code2Wav)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 2 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+Add `--deploy-config /path/to/deploy_config_file` to every command if you want
+to override the bundled deploy YAML.
+
+For the regular one-process launch, stage-specific CLI tuning is usually done
+with `--stage-overrides`, for example:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+For the stage-based CLI, you usually do **not** need `--stage-overrides` for
+that kind of change. Since each command launches one stage, just pass the knob
+directly on that stage command:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
 ```
 
 ### Send Multi-modal Request
diff --git a/examples/online_serving/bagel/README.md b/examples/online_serving/bagel/README.md
index 0939bc5f387..4a87940434b 100644
--- a/examples/online_serving/bagel/README.md
+++ b/examples/online_serving/bagel/README.md
@@ -19,7 +19,12 @@ Or use the convenience script:
 
 ```bash
 cd /workspace/vllm-omni/examples/online_serving/bagel
+# Launch both stages in one session (legacy convenience flow)
 bash run_server.sh
+
+# Launch a single stage per terminal
+bash run_server_stage_cli.sh --stage 0
+bash run_server_stage_cli.sh --stage 1
 ```
 
 ```bash
@@ -112,12 +117,13 @@ mooncake_master \
 **2. Launch Stage 0 (Thinker / Orchestrator)** on the orchestrator node:
 
 ```bash
+# API server port for client requests: 8000
 vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
-    --port 8000 \ # API server port for client requests
+    --port 8000 \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 0 \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **3. Launch Stage 1 (DiT)** on the remote node in headless mode:
@@ -127,8 +133,8 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 1 \
     --headless \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **Mooncake Master arguments:**
@@ -145,14 +151,10 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
 
 | Argument | Description |
 | :------- | :---------- |
-| `--stage-id` | Which stage this process runs (0 = Thinker, 1 = DiT) |
-| `--headless` | Run without the API server (worker-only mode) |
-| `-oma` | Orchestrator master address |
-| `-omp` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
-
-> [!IMPORTANT]
-> **Startup Order**: Stage 0 (orchestrator) must be launched **before** Stage 1 (headless).
-> Stage 0 will appear to hang on startup until Stage 1 (worker) connects — this is expected behavior.
+| `--stage-id` | Which stage this process runs (0 = Thinker, 1 = DiT) |
+| `--headless` | Run without the API server (worker-only mode) |
+| `--omni-master-address` | Orchestrator master address |
+| `--omni-master-port` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
 
 **Network Requirements**
 
@@ -162,7 +164,7 @@ All nodes must have network connectivity to each other. Ensure the following por
 | :--- | :------- | :------ | :-------- |
 | 50051 | TCP | Mooncake Master RPC | Worker → Orchestrator |
 | 8080 | TCP | Mooncake HTTP Metadata Server | Worker → Orchestrator |
-| 8091 | TCP | Orchestrator Master (`-omp`) | Worker → Orchestrator |
+| 8091 | TCP | Orchestrator Master (`--omni-master-port`) | Worker → Orchestrator |
 | 8000 | TCP | API Server (`--port`) | Client → Orchestrator |
 | 9003 | TCP | Metrics (optional) | Monitoring → Orchestrator |
 
diff --git a/examples/online_serving/bagel/run_server_stage_cli.sh b/examples/online_serving/bagel/run_server_stage_cli.sh
index 2d0b4bc369e..18b4c937cac 100644
--- a/examples/online_serving/bagel/run_server_stage_cli.sh
+++ b/examples/online_serving/bagel/run_server_stage_cli.sh
@@ -116,8 +116,8 @@ run_stage_0() {
         --port "$PORT" \
         --stage-configs-path "$STAGE_CONFIGS_PATH" \
         --stage-id 0 \
-        -oma "$MASTER_ADDRESS" \
-        -omp "$MASTER_PORT" \
+        --omni-master-address "$MASTER_ADDRESS" \
+        --omni-master-port "$MASTER_PORT" \
         "${EXTRA_ARGS[@]}"
 }
 
@@ -127,8 +127,8 @@ run_stage_1() {
         --stage-configs-path "$STAGE_CONFIGS_PATH" \
         --stage-id 1 \
         --headless \
-        -oma "$MASTER_ADDRESS" \
-        -omp "$MASTER_PORT" \
+        --omni-master-address "$MASTER_ADDRESS" \
+        --omni-master-port "$MASTER_PORT" \
         "${EXTRA_ARGS[@]}"
 }
 
diff --git a/examples/online_serving/qwen3_omni/README.md b/examples/online_serving/qwen3_omni/README.md
index 32722b3db4e..c85970555f9 100644
--- a/examples/online_serving/qwen3_omni/README.md
+++ b/examples/online_serving/qwen3_omni/README.md
@@ -12,21 +12,80 @@ Please refer to [README.md](../../../README.md)
 vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
 ```
 
-The default deploy config at `vllm_omni/deploy/qwen3_omni_moe.yaml` is loaded
-automatically by the model registry — no `--deploy-config` flag needed for the
-common case. Async-chunk streaming is **enabled by default** in the bundled config.
-NPU / ROCm / XPU per-platform deltas are merged in automatically from the
-`platforms:` section of the same YAML.
+The default deploy config at `vllm_omni/deploy/qwen3_omni_moe.yaml` is loaded
+automatically by the model registry, so no `--deploy-config` flag is needed for the common case.
+Async-chunk streaming is **enabled by default** in the bundled config.
+NPU / ROCm / XPU per-platform deltas are merged in automatically from the
+`platforms:` section of the same YAML.
 
-**Note:** The OpenAI-style **`/v1/realtime`** WebSocket (streaming PCM audio in, audio + transcription out) is **not supported** when `async_chunk` is enabled. Use the default omni layout or a stage config with `async_chunk: false` for realtime sessions.
-
-If you have a custom deploy YAML, point at it explicitly:
+**Note:** The OpenAI-style **`/v1/realtime`** WebSocket (streaming PCM audio in, audio + transcription out)
+is **not supported** when `async_chunk` is enabled.
+Use the default omni layout or a deploy config with `async_chunk: false` for realtime sessions.
 
+To use a custom deploy YAML, pass its path explicitly:
 ```bash
 vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
     --deploy-config /path/to/your_deploy_config.yaml
 ```
 
+### Launch individual stages (stage-based CLI)
+
+Use the stage-based CLI when you want to run one stage per process.
+
+**1. Stage 0 (Thinker + API server)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+**2. Stage 1 (Talker)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+**3. Stage 2 (Code2Wav)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 2 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+Add `--deploy-config /path/to/your_deploy_config.yaml` to every stage command if you need
+to override the bundled deploy YAML.
+
+For the regular **single-process** launch, stage-specific CLI tuning is usually done
+with `--stage-overrides`, for example:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+With the stage-based CLI, you usually do **not** need `--stage-overrides`
+for this kind of change. Since each command launches a single stage,
+pass the flag directly on that stage's command:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 ### Tuning deployment parameters
 
 Most engine knobs (`max_num_batched_tokens`, `max_model_len`, `enforce_eager`,
@@ -93,6 +152,9 @@ vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
 Per-stage values are always treated as explicit and beat YAML defaults for
 the named stage. Other stages keep their YAML values.
 
+If you switch to the stage-based CLI, the same per-stage tuning can usually be
+passed directly on that stage's command instead of using `--stage-overrides`.
+
 #### 3. Custom deploy YAML
 
 When per-stage overrides get long, write a small overlay YAML that inherits
diff --git a/recipes/Qwen/Qwen3-Omni.md b/recipes/Qwen/Qwen3-Omni.md
index 081e1453d37..f78e4dda2aa 100644
--- a/recipes/Qwen/Qwen3-Omni.md
+++ b/recipes/Qwen/Qwen3-Omni.md
@@ -50,13 +50,22 @@ Start the server from the repository root:
 vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
 ```
 
-To enable async chunking, use the bundled stage config:
+Async chunking is enabled by default in the bundled deployment config. For
+common runtime tuning, prefer CLI overrides instead of editing or passing a
+custom YAML file:
 
 ```bash
-vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
-  --omni \
-  --port 8091 \
-  --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_omni_moe_async_chunk.yaml
+# Disable async chunking for /v1/realtime sessions
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+  --no-async-chunk
+```
+
+Use a custom deploy config only for advanced cases such as custom topology,
+connector wiring, or a larger overlay of stage defaults:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+  --deploy-config /path/to/your_qwen3_omni_overrides.yaml
 ```
 
 #### Verification
@@ -85,6 +94,6 @@ curl http://localhost:8091/v1/chat/completions \
 
 #### Notes
 
-- Memory usage: Size depends on runtime options and output modalities; leave headroom for multimodal workloads.
-- Key flags: `--omni` is required; `--stage-configs-path` is optional for custom or async-chunk stage configs.
-- Known limitations: This starter recipe is intentionally narrow and focuses on the single-GPU online-serving path already documented in the repo examples.
+- Memory usage: Size depends on runtime options and output modalities; leave headroom for multimodal workloads. Prefer CLI overrides such as `--gpu-memory-utilization` for routine tuning.
+- Key flags: `--omni` is required; async chunking is enabled by default; use `--no-async-chunk` for realtime sessions and `--deploy-config` only for advanced custom deployments.
+- Known limitations: The `/v1/realtime` WebSocket flow is currently unsupported while async chunking is enabled. This starter recipe is intentionally narrow and focuses on the single-GPU online-serving path already documented in the repo examples.