19 changes: 16 additions & 3 deletions docs/user_guide/examples/offline_inference/bagel.md
@@ -158,10 +158,23 @@ The default yaml configuration deploys Thinker and DiT on the same GPU. You can

 For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) by modifying the stage configuration (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)).

-1. **Set `tensor_parallel_size`**: Increase this value (e.g., to `2` or `4`).
-2. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the stage (e.g., `"0,1"`).
+In multi-stage omni models, LLM stages and diffusion stages use different TP config fields:

-Example configuration for TP=2 on GPUs 0 and 1:
+1. **LLM stage**: set top-level `engine_args.tensor_parallel_size`.
+2. **Diffusion stage**: set `engine_args.parallel_config.tensor_parallel_size`.
+3. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the target stage (e.g., `"0,1"`).
+
+Example configuration for the diffusion stage with TP=2 on GPUs 0 and 1:
+```yaml
+engine_args:
+  parallel_config:
+    tensor_parallel_size: 2
+  ...
+runtime:
+  devices: "0,1"
+```
+
+Example configuration for the LLM stage with TP=2 on GPUs 0 and 1:
 ```yaml
 engine_args:
   tensor_parallel_size: 2
19 changes: 19 additions & 0 deletions docs/user_guide/examples/online_serving/bagel.md
@@ -35,6 +35,25 @@ For larger models or multi-GPU environments, you can enable Tensor Parallelism (

1. **Modify Stage Config**: Create or modify a stage configuration yaml (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).
Contributor comment:

Nit: This line still says "Set `tensor_parallel_size` to `2` (or more)" without distinguishing between LLM and diffusion stages, which is the whole point of this PR. Consider updating it to:

> Set the appropriate TP config field for your stage type (see details below) and update `devices` to include multiple GPU IDs.
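
To make the split between the two fields concrete, here is a minimal Python sketch of how a reader might look up the TP size per stage type. It is illustrative only and not part of vllm-omni: the helper name `resolve_tp_size`, the stage labels `"llm"` and `"diffusion"`, and the fallback value of `1` are assumptions. Only the two field paths come from this PR.

```python
from typing import Any, Mapping


def resolve_tp_size(stage_type: str, engine_args: Mapping[str, Any]) -> int:
    """Read the TP size from the field that applies to the given stage type.

    Hypothetical helper: the stage labels and the default of 1 are assumptions.
    """
    if stage_type == "llm":
        # LLM stage: top-level engine_args.tensor_parallel_size
        return int(engine_args.get("tensor_parallel_size", 1))
    if stage_type == "diffusion":
        # Diffusion stage: engine_args.parallel_config.tensor_parallel_size
        parallel_config = engine_args.get("parallel_config") or {}
        return int(parallel_config.get("tensor_parallel_size", 1))
    raise ValueError(f"unknown stage type: {stage_type!r}")


llm_args = {"tensor_parallel_size": 2}
dit_args = {"parallel_config": {"tensor_parallel_size": 2}}

print(resolve_tp_size("llm", llm_args))        # 2
print(resolve_tp_size("diffusion", dit_args))  # 2
# In this sketch, a diffusion stage that only sets the top-level field
# is treated as unset and falls back to the assumed default.
print(resolve_tp_size("diffusion", llm_args))  # 1
```

The last call shows why the docs wording matters: with a lookup like this one, a top-level `tensor_parallel_size` on a diffusion stage would have no effect.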


+In multi-stage omni models, LLM stages and diffusion stages use different TP config fields:
+
+1. **LLM stage**: set top-level `engine_args.tensor_parallel_size`.
+2. **Diffusion stage**: set `engine_args.parallel_config.tensor_parallel_size`.
+3. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the target stage (e.g., `"0,1"`).
+
+Example configuration for the diffusion stage with TP=2 on GPUs 0 and 1:
+
+```yaml
+engine_args:
+  parallel_config:
+    tensor_parallel_size: 2
+  ...
+runtime:
+  devices: "0,1"
+```
+
+Example configuration for the LLM stage with TP=2 on GPUs 0 and 1:
+
 ```yaml
 engine_args:
   tensor_parallel_size: 2
3 changes: 2 additions & 1 deletion vllm_omni/model_executor/stage_configs/bagel.yaml
@@ -59,7 +59,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
+      parallel_config:
+        tensor_parallel_size: 1
       omni_kv_config:
         need_recv_cache: true
     engine_input_source: [0]
@@ -52,7 +52,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
+      parallel_config:
+        tensor_parallel_size: 1
       omni_kv_config:
         need_recv_cache: true
     engine_input_source: [0]
2 changes: 1 addition & 1 deletion vllm_omni/model_executor/stage_configs/bagel_usp2.yaml
@@ -52,8 +52,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
       parallel_config:
+        tensor_parallel_size: 1
         ulysses_degree: 2
         # ring_degree: 2
       omni_kv_config:
3 changes: 2 additions & 1 deletion vllm_omni/platforms/xpu/stage_configs/bagel.yaml
@@ -53,7 +53,8 @@ stage_args:
       distributed_executor_backend: "mp"
       enable_prefix_caching: false
       max_num_batched_tokens: 32768
-      tensor_parallel_size: 1
+      parallel_config:
+        tensor_parallel_size: 1
       omni_kv_config:
         need_recv_cache: true
     engine_input_source: [0]
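
The same edit is applied to each diffusion stage config above: the top-level `tensor_parallel_size` moves under `parallel_config`. For anyone with a custom stage config to update, the change can be sketched as a small Python transformation over an already-parsed `engine_args` mapping. This is a hypothetical helper, not something shipped in vllm-omni; the function name and the choice to let an existing nested value win are assumptions.

```python
import copy
from typing import Any, Dict


def move_tp_into_parallel_config(engine_args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with top-level tensor_parallel_size nested under parallel_config.

    Hypothetical helper mirroring the YAML edits in this PR.
    """
    migrated = copy.deepcopy(engine_args)  # leave the caller's mapping untouched
    if "tensor_parallel_size" not in migrated:
        return migrated
    tp_size = migrated.pop("tensor_parallel_size")
    parallel_config = migrated.setdefault("parallel_config", {})
    # Assumption: if a nested value already exists, keep it rather than overwrite.
    parallel_config.setdefault("tensor_parallel_size", tp_size)
    return migrated


# Shape loosely based on the bagel_usp2.yaml hunk above.
old_args = {
    "max_num_batched_tokens": 32768,
    "tensor_parallel_size": 1,
    "parallel_config": {"ulysses_degree": 2},
}
new_args = move_tp_into_parallel_config(old_args)
print(new_args)
```

Existing `parallel_config` entries such as `ulysses_degree` are preserved, which matches the `bagel_usp2.yaml` hunk where the field is added alongside them.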