vllm-project · hsliuustc0106 · Apr 24, 2026 · Apr 21, 2026 · Apr 22, 2026 · Apr 23, 2026
@@ -1,5 +1,59 @@
 # vllm-omni serve
 
+## Stage-based CLI quickstart
+
+The stage-based CLI is for deployments that launch each pipeline stage in its own process
+(for example, on separate GPUs or on separate hosts).
+
+- For **migrated models** that use the bundled deploy YAMLs in
+  `vllm_omni/deploy/`, `vllm serve MODEL --omni ...` loads the bundled config automatically;
+  pass `--deploy-config` only to override it.
+- For **legacy models** whose configs live in
+  `vllm_omni/model_executor/stage_configs/`, `--stage-configs-path` is still required.
+
+Example: launch Stage 0 (orchestrator and API server):
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+Example: launch a headless worker stage (Stage 1):
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+To use a custom deploy YAML in the new schema, add `--deploy-config /path/to/override.yaml` to each command. For legacy models, use `--stage-configs-path /path/to/stage_configs.yaml` instead.
+
+In a regular single-process launch, `--stage-overrides` applies stage-specific settings from one command.
+With the **stage-based CLI**, each process runs exactly one stage, so pass tuning flags directly on that stage's command instead of building a `--stage-overrides` JSON string.
+
+For example, instead of this combined configuration:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+launch Stage 1 with the flag set directly:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 ## JSON CLI Arguments
 
 --8<-- "docs/cli/json_tip.inc.md"

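The equivalence described in the quickstart (one `--stage-overrides` JSON entry per stage versus explicit flags on each stage's command) can be sketched as a small conversion helper. This is only an illustration, not part of vLLM-Omni: `overrides_to_flags` is a hypothetical name, and the field-to-flag mapping (replace `_` with `-`) is an assumption inferred from the two examples in this PR (`gpu_memory_utilization` and `max_num_seqs`). It handles value-taking flags only, not boolean toggles such as `--async-chunk`.

```python
import json
import shlex


def overrides_to_flags(stage_overrides_json: str) -> dict[int, list[str]]:
    """Map each stage id to CLI flags equivalent to its override entries.

    Hypothetical helper for illustration; vLLM-Omni does not ship it.
    """
    overrides = json.loads(stage_overrides_json)
    flags_by_stage: dict[int, list[str]] = {}
    for stage_id, fields in overrides.items():
        flags: list[str] = []
        for key, value in fields.items():
            # Assumption: the flag name is the field name with "_" -> "-".
            flags += [f"--{key.replace('_', '-')}", str(value)]
        flags_by_stage[int(stage_id)] = flags
    return flags_by_stage


if __name__ == "__main__":
    flags = overrides_to_flags('{"1": {"gpu_memory_utilization": 0.5}}')
    # Flags to append to the Stage 1 command line.
    print(shlex.join(flags[1]))
```

The printed flags would be appended to the `--stage-id 1 --headless` command for that stage; whether a given field has a matching CLI flag should be checked against `vllm serve --help`.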
@@ -88,6 +88,55 @@ stages:
 | `--async-chunk` / `--no-async-chunk` | Flip the deploy YAML's `async_chunk:` bool. Unset (default) leaves the YAML value in force. |
 | `--stage-configs-path` | **Deprecated.** Accepts legacy `stage_args` yamls and (auto-detected) new deploy yamls; emits a deprecation warning. Migrate to `--deploy-config`. To be removed in a follow-up PR. |
 
+### Stage-Based CLI Paradigm
+
+The stage-based CLI runs each pipeline stage in its own process:
+
+- **Stage 0** typically runs the orchestrator and the API server. Launch it with `--stage-id 0`,
+  `--omni-master-address`, `--omni-master-port`, and the usual API server flags (e.g., `--port`).
+- **Worker stages** run without an API server (`--headless`), use sequential `--stage-id` values, and must pass the same
+  `--omni-master-address` and `--omni-master-port` as Stage 0 so that they can register with it.
+
+For migrated models, the bundled deploy YAML is resolved and loaded automatically, so the default launch path
+does **not** need `--deploy-config`:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+To use a custom deploy YAML in the new schema, add `--deploy-config /path/to/override.yaml`
+to every stage's command. For legacy models (e.g., BAGEL) that still use the deprecated `stage_args:` schema, keep passing `--stage-configs-path /path/to/config.yaml`.
+
+For a regular single-process launch, `--stage-overrides` is the recommended way
+to set stage-specific tuning from the CLI:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+With the **stage-based CLI**, each process launches a single stage, so overrides
+can be passed as explicit CLI flags on that stage's command, which makes a `--stage-overrides` JSON string unnecessary:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 ### Precedence
 
 From highest to lowest:
@@ -133,6 +182,17 @@ vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
     --stage-overrides '{"0": {"max_num_seqs": 8}}'
 ```
 
+With the stage-based CLI, the same setting can be passed directly
+as a command-line flag on the single-stage command:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 0 \
+    --max-num-seqs 8 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
 Effective config per stage after the merge:
 
 | Stage | Field | Final value | Source |
@@ -153,17 +213,28 @@ Therefore, as a core part of vLLM-Omni, the stage configs for a model have sever
 - Input and output dependencies for each stage.
 - Default input parameters.
 
-If users want to modify some part of it. The custom stage_configs file can be input as input argument in both online and offline. Just like examples below:
+To override part of these settings, pass a custom config file;
+this works for both online and offline use. Prefer `--deploy-config`
+for new-schema deploy YAMLs, and keep `--stage-configs-path`
+only for legacy `stage_args` YAMLs.
+
+Examples:
 
-For offline (Assume necessary dependencies have ben imported):
+For offline use (assuming the necessary dependencies have been imported):
 ```python
 model_name = "Qwen/Qwen2.5-Omni-7B"
 omni = Omni(model=model_name, stage_configs_path="/path/to/custom_stage_configs.yaml")
 ```
 
 For online serving:
 ```bash
-vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
+vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091 --deploy-config /path/to/deploy_config.yaml
+```
+
+Legacy online serving:
+
+```bash
+vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
 ```
 !!! important
     We are actively iterating on the definition of stage configs, and we welcome all feedbacks from both community users and developers to help us shape the development!

@@ -22,9 +22,16 @@ Or use the convenience script:
 
 ```bash
 cd /workspace/vllm-omni/examples/online_serving/bagel
+# Launch both stages in one session (legacy convenience flow)
 bash run_server.sh
+
+# Launch a single stage per terminal
+bash run_server_stage_cli.sh --stage 0
+bash run_server_stage_cli.sh --stage 1
 ```
 
+If you have a custom stage configs file, launch the server with the command below:
+
 ```bash
 vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
 ```
@@ -115,12 +122,13 @@ mooncake_master \
 **2. Launch Stage 0 (Thinker / Orchestrator)** on the orchestrator node:
 
 ```bash
+# API server port for client requests: 8000
 vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
-    --port 8000 \ # API server port for client requests
+    --port 8000 \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 0 \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **3. Launch Stage 1 (DiT)** on the remote node in headless mode:
@@ -130,8 +138,8 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 1 \
     --headless \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **Mooncake Master arguments:**
@@ -150,8 +158,8 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
 | :------- | :---------- |
 | `--stage-id` | Which stage this process runs (0 = Thinker, 1 = DiT) |
 | `--headless` | Run without the API server (worker-only mode) |
-| `-oma` | Orchestrator master address |
-| `-omp` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
+| `--omni-master-address` | Orchestrator master address |
+| `--omni-master-port` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
 
 > [!IMPORTANT]
 > **Startup Order**: Stage 0 (orchestrator) must be launched **before** Stage 1 (headless).
@@ -165,7 +173,7 @@ All nodes must have network connectivity to each other. Ensure the following por
 | :--- | :------- | :------ | :-------- |
 | 50051 | TCP | Mooncake Master RPC | Worker → Orchestrator |
 | 8080 | TCP | Mooncake HTTP Metadata Server | Worker → Orchestrator |
-| 8091 | TCP | Orchestrator Master (`-omp`) | Worker → Orchestrator |
+| 8091 | TCP | Orchestrator Master (`--omni-master-port`) | Worker → Orchestrator |
 | 8000 | TCP | API Server (`--port`) | Client → Orchestrator |
 | 9003 | TCP | Metrics (optional) | Monitoring → Orchestrator |
 

@@ -15,15 +15,72 @@ Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/
 vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
 ```
 
-If you want to open async chunking for qwen3-omni, launch the server with command below
+The default deploy config at `vllm_omni/deploy/qwen3_omni_moe.yaml` is resolved and loaded
+automatically via the model registry, so `--deploy-config` is not needed for a standard launch.
+Async chunk streaming is **enabled by default** in the bundled config.
 
+To use a custom deploy YAML, pass its path:
 ```bash
-vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 --deploy-config /vllm_omni/deploy/qwen3_omni_moe.yaml
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --deploy-config /path/to/deploy_config_file
 ```
 
-If you have custom stage configs file, launch the server with command below
+### Launch individual stages (stage-based CLI)
+
+Use the stage-based CLI to launch each stage in its own process.
+
+**1. Stage 0 (Thinker + API server)**
+
 ```bash
-vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 --deploy-config /path/to/deploy_config_file
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --port 8091 \
+    --stage-id 0 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+**2. Stage 1 (Talker)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+**3. Stage 2 (Code2Wav)**
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 2 \
+    --headless \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
+```
+
+Add `--deploy-config /path/to/deploy_config_file` to every command if you want
+to override the bundled deploy YAML.
+
+For the regular one-process launch, stage-specific CLI tuning is usually done
+with `--stage-overrides`, for example:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091 \
+    --stage-overrides '{"1": {"gpu_memory_utilization": 0.5}}'
+```
+
+For the stage-based CLI, you usually do **not** need `--stage-overrides` for
+that kind of change. Since each command launches one stage, just pass the knob
+directly on that stage command:
+
+```bash
+vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --omni \
+    --stage-id 1 \
+    --headless \
+    --gpu-memory-utilization 0.5 \
+    --omni-master-address 127.0.0.1 \
+    --omni-master-port 26000
 ```
 
 ### Send Multi-modal Request

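The three Qwen3-Omni stage commands in the hunk above follow one pattern: Stage 0 takes `--port`, every other stage takes `--headless`, and all of them share the same master address and port. A minimal sketch of that pattern, assuming only the flags shown in this PR, is below. The helper name `stage_command` and its parameters are hypothetical, and the code only builds and prints the command lines rather than launching anything.

```python
import shlex


def stage_command(model, stage_id, master_address, master_port, port=None, extra=()):
    """Build the argv for one stage of a stage-based launch (illustrative only)."""
    argv = ["vllm", "serve", model, "--omni", "--stage-id", str(stage_id)]
    if stage_id == 0:
        # Stage 0 hosts the orchestrator and the API server.
        if port is not None:
            argv += ["--port", str(port)]
    else:
        # Worker stages run without an API server.
        argv.append("--headless")
    argv += [
        "--omni-master-address", master_address,
        "--omni-master-port", str(master_port),
    ]
    argv += list(extra)  # e.g. ["--gpu-memory-utilization", "0.5"]
    return argv


if __name__ == "__main__":
    model = "Qwen/Qwen3-Omni-30B-A3B-Instruct"
    # Print Stage 0 first, then the two headless worker stages.
    for stage_id in range(3):
        argv = stage_command(model, stage_id, "127.0.0.1", 26000, port=8091)
        print(shlex.join(argv))
```

Each printed line matches one of the documented commands and could be run in its own terminal, starting with Stage 0.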
@@ -19,7 +19,12 @@ Or use the convenience script:
 
 ```bash
 cd /workspace/vllm-omni/examples/online_serving/bagel
+# Launch all stages in one session (legacy convenience flow)
 bash run_server.sh
+
+# Launch a single stage per terminal
+bash run_server_stage_cli.sh --stage 0
+bash run_server_stage_cli.sh --stage 1
 ```
 
 ```bash
@@ -112,12 +117,13 @@ mooncake_master \
 **2. Launch Stage 0 (Thinker / Orchestrator)** on the orchestrator node:
 
 ```bash
+# API server port for client requests: 8000
 vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
-    --port 8000 \ # API server port for client requests
+    --port 8000 \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 0 \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **3. Launch Stage 1 (DiT)** on the remote node in headless mode:
@@ -127,8 +133,8 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
     --stage-configs-path vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml \
     --stage-id 1 \
     --headless \
-    -oma <ORCHESTRATOR_IP> \
-    -omp 8091
+    --omni-master-address <ORCHESTRATOR_IP> \
+    --omni-master-port 8091
 ```
 
 **Mooncake Master arguments:**
@@ -145,14 +151,10 @@ vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
 
 | Argument | Description |
 | :------- | :---------- |
-| `--stage-id` | Which stage this process runs (0 = Thinker, 1 = DiT) |
-| `--headless` | Run without the API server (worker-only mode) |
-| `-oma` | Orchestrator master address |
-| `-omp` | Orchestrator master port for Stage 1 to connect to Stage 0 for task coordination |
-
-> [!IMPORTANT]
-> **Startup Order**: Stage 0 (orchestrator) must be launched **before** Stage 1 (headless).
-> Stage 0 will appear to hang on startup until Stage 1 (worker) connects — this is expected behavior.
+| `--stage-id` | Which pipeline stage this process runs (e.g., 0 = Thinker, 1 = DiT) |
+| `--headless` | Run the worker stage without starting an API server |
+| `--omni-master-address` | Address of the orchestrator master (Stage 0) |
+| `--omni-master-port` | Orchestrator master port that Stage 1 connects to on Stage 0 for task coordination |
 
 **Network Requirements**
 
@@ -162,7 +164,7 @@ All nodes must have network connectivity to each other. Ensure the following por
 | :--- | :------- | :------ | :-------- |
 | 50051 | TCP | Mooncake Master RPC | Worker → Orchestrator |
 | 8080 | TCP | Mooncake HTTP Metadata Server | Worker → Orchestrator |
-| 8091 | TCP | Orchestrator Master (`-omp`) | Worker → Orchestrator |
+| 8091 | TCP | Orchestrator Master (`--omni-master-port`) | Worker → Orchestrator |
 | 8000 | TCP | API Server (`--port`) | Client → Orchestrator |
 | 9003 | TCP | Metrics (optional) | Monitoring → Orchestrator |
 

@@ -116,8 +116,8 @@ run_stage_0() {
         --port "$PORT" \
         --stage-configs-path "$STAGE_CONFIGS_PATH" \
         --stage-id 0 \
-        -oma "$MASTER_ADDRESS" \
-        -omp "$MASTER_PORT" \
+        --omni-master-address "$MASTER_ADDRESS" \
+        --omni-master-port "$MASTER_PORT" \
         "${EXTRA_ARGS[@]}"
 }
 
@@ -127,8 +127,8 @@ run_stage_1() {
         --stage-configs-path "$STAGE_CONFIGS_PATH" \
         --stage-id 1 \
         --headless \
-        -oma "$MASTER_ADDRESS" \
-        -omp "$MASTER_PORT" \
+        --omni-master-address "$MASTER_ADDRESS" \
+        --omni-master-port "$MASTER_PORT" \
         "${EXTRA_ARGS[@]}"
 }