vllm-project · xiaohajiayou · Apr 9, 2026 · Apr 9, 2026 · ianliuy · Apr 12, 2026
@@ -158,10 +158,23 @@ The default yaml configuration deploys Thinker and DiT on the same GPU. You can
 
 For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) by modifying the stage configuration (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)).
 
-1. **Set `tensor_parallel_size`**: Increase this value (e.g., to `2` or `4`).
-2. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the stage (e.g., `"0,1"`).
+In multi-stage omni models, LLM stages and diffusion stages use different TP config fields:
 
-Example configuration for TP=2 on GPUs 0 and 1:
+1. **LLM stage**: set `tensor_parallel_size` directly under `engine_args`.
+2. **Diffusion stage**: set `tensor_parallel_size` under `engine_args.parallel_config`.
+3. **Either stage**: set `runtime.devices` to the comma-separated GPU IDs used by that stage (e.g., `"0,1"`).
+
+Example configuration for the diffusion stage with TP=2 on GPUs 0 and 1:
+```yaml
+    engine_args:
+      parallel_config:
+        tensor_parallel_size: 2
+      ...
+    runtime:
+      devices: "0,1"
+```
+
+Example configuration for the LLM stage with TP=2 on GPUs 0 and 1:
 ```yaml
     engine_args:
       tensor_parallel_size: 2

@@ -35,6 +35,25 @@ For larger models or multi-GPU environments, you can enable Tensor Parallelism (
 
 1. **Modify Stage Config**: Create or modify a stage configuration yaml (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).
 
+In multi-stage omni models, LLM stages and diffusion stages use different TP config fields:
+
+1. **LLM stage**: set `tensor_parallel_size` directly under `engine_args`.
+2. **Diffusion stage**: set `tensor_parallel_size` under `engine_args.parallel_config`.
+3. **Either stage**: set `runtime.devices` to the comma-separated GPU IDs used by that stage (e.g., `"0,1"`).
+
+Example configuration for the diffusion stage with TP=2 on GPUs 0 and 1:
+
+```yaml
+    engine_args:
+      parallel_config:
+        tensor_parallel_size: 2
+      ...
+    runtime:
+      devices: "0,1"
+```
+
+Example configuration for the LLM stage with TP=2 on GPUs 0 and 1:
+
 ```yaml
     engine_args:
       tensor_parallel_size: 2

@@ -59,7 +59,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
+      parallel_config:
+        tensor_parallel_size: 1
       omni_kv_config:
         need_recv_cache: true
     engine_input_source: [0]

@@ -52,7 +52,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
+      parallel_config:
+        tensor_parallel_size: 1
       omni_kv_config:
         need_recv_cache: true
     engine_input_source: [0]

@@ -52,8 +52,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
       parallel_config:
+        tensor_parallel_size: 1
         ulysses_degree: 2
         # ring_degree: 2
       omni_kv_config:

@@ -53,7 +53,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
+      parallel_config:
+        tensor_parallel_size: 1
       omni_kv_config:
         need_recv_cache: true
     engine_input_source: [0]
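
For reviewers who want to sanity-check a stage config against the layout this change documents, here is a minimal sketch. It is not part of vllm-omni: `stage_tp_size`, `has_enough_devices`, and the `"llm"` / `"diffusion"` labels are hypothetical names for illustration, the default TP of 1 is an assumption, and so is the rule that a stage needs at least as many `devices` entries as its TP size. The stage dicts mirror the `engine_args` / `runtime` fragments shown in the docs hunks above, as if already loaded from YAML.

```python
from typing import Any, Mapping


def stage_tp_size(engine_args: Mapping[str, Any], stage_kind: str) -> int:
    """Read the TP size from the field documented for the given stage kind.

    stage_kind is a hypothetical label used only in this sketch.
    A missing field falls back to 1 (an assumption, not confirmed behaviour).
    """
    if stage_kind == "llm":
        # LLM stage: tensor_parallel_size sits directly under engine_args.
        return int(engine_args.get("tensor_parallel_size", 1))
    if stage_kind == "diffusion":
        # Diffusion stage: tensor_parallel_size sits under parallel_config.
        parallel_config = engine_args.get("parallel_config") or {}
        return int(parallel_config.get("tensor_parallel_size", 1))
    raise ValueError(f"unknown stage kind: {stage_kind!r}")


def has_enough_devices(runtime: Mapping[str, Any], tp_size: int) -> bool:
    """Check that runtime.devices lists at least tp_size GPU IDs (assumed rule)."""
    raw = str(runtime.get("devices", ""))
    device_ids = [part.strip() for part in raw.split(",") if part.strip()]
    return len(device_ids) >= tp_size


# Stage fragments shaped like the documented examples (TP=2 on GPUs 0 and 1).
llm_stage = {
    "engine_args": {"tensor_parallel_size": 2},
    "runtime": {"devices": "0,1"},
}
diffusion_stage = {
    "engine_args": {"parallel_config": {"tensor_parallel_size": 2}},
    "runtime": {"devices": "0,1"},
}

llm_tp = stage_tp_size(llm_stage["engine_args"], "llm")
dit_tp = stage_tp_size(diffusion_stage["engine_args"], "diffusion")
print(llm_tp, has_enough_devices(llm_stage["runtime"], llm_tp))
print(dit_tp, has_enough_devices(diffusion_stage["runtime"], dit_tp))
```

Under the sketch's assumed default, a diffusion stage that still sets only the old top-level field reads as TP=1 here, which is the kind of silent mismatch the updated docs are meant to help users avoid.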