
Commit c8f6d4d

docs: add TRTLLM variable sliding window attention example for gemma3 model (#2134)
1 parent 347620a commit c8f6d4d

File tree

5 files changed: +135 −1 lines changed
components/backends/trtllm/engine_configs/gemma3/vswa_agg.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

tensor_parallel_size: 1
backend: pytorch

kv_cache_config:
  max_attention_window:
  - 512
  - 512
  - 512
  - 512
  - 512
  - 32768
  enable_block_reuse: false
components/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

tensor_parallel_size: 1
backend: pytorch

kv_cache_config:
  max_attention_window:
  - 512
  - 512
  - 512
  - 512
  - 512
  - 32768
  enable_block_reuse: false

cache_transceiver_config:
  backend: default
components/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

tensor_parallel_size: 1
backend: pytorch
disable_overlap_scheduler: True

kv_cache_config:
  max_attention_window:
  - 512
  - 512
  - 512
  - 512
  - 512
  - 32768
  enable_block_reuse: false

cache_transceiver_config:
  backend: default
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Gemma 3 with Variable Sliding Window Attention

This guide demonstrates how to deploy google/gemma-3-1b-it with Variable Sliding Window Attention (VSWA) using Dynamo. Since google/gemma-3-1b-it is a small model, each aggregated, prefill, or decode worker requires only a single H100 or GB200 GPU.

VSWA is a mechanism in which a model's layers alternate between multiple sliding window sizes. Gemma 3 is an example: it interleaves global attention layers with sliding-window attention layers.
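
In the configs above, `kv_cache_config.max_attention_window` lists six per-layer window sizes: five 512-token sliding-window layers followed by one 32768-token global layer, matching Gemma 3's 5:1 interleaving. Below is a minimal sketch of how such a pattern maps onto layers, assuming TensorRT-LLM repeats the list cyclically when it is shorter than the model's layer count:

```bash
# Illustration only (not part of the deployment): map the six-entry
# max_attention_window pattern onto the first twelve layers, assuming
# the list is repeated cyclically across layers.
pattern=(512 512 512 512 512 32768)
for layer in $(seq 0 11); do
  echo "layer ${layer}: window ${pattern[$((layer % ${#pattern[@]}))]}"
done
```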

## Notes
* To run Gemma 3 with VSWA, ensure that the container has TensorRT-LLM v1.0.0rc4 installed.
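
To confirm the installed version inside the container, one quick check (assuming the standard tensorrt_llm Python package is on the path):

```bash
# Prints the installed TensorRT-LLM version, e.g. 1.0.0rc4.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```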

## Limitation
* KV-event-based KV routing currently does not work well with VSWA. The Dynamo team is actively working on adding support for distinguishing events that come from different layer groups.

## Aggregated Serving
```bash
cd $DYNAMO_HOME/components/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export AGG_ENGINE_ARGS=engine_configs/gemma3/vswa_agg.yaml
./launch/agg.sh
```
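
Once the worker is up, the deployment can be smoke-tested with an OpenAI-compatible request. The sketch below assumes the Dynamo frontend listens on localhost:8000 (adjust the host and port to your deployment); the same request also works for the disaggregated setup below.

```bash
# Hypothetical smoke test; assumes the OpenAI-compatible frontend is on localhost:8000.
curl -s localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Briefly, what is sliding window attention?"}],
    "max_tokens": 64
  }'
```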

## Disaggregated Serving
```bash
cd $DYNAMO_HOME/components/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export PREFILL_ENGINE_ARGS=engine_configs/gemma3/vswa_prefill.yaml
export DECODE_ENGINE_ARGS=engine_configs/gemma3/vswa_decode.yaml
./launch/disagg.sh
```

components/backends/trtllm/launch/disagg_router.sh

Lines changed: 1 addition & 1 deletion
@@ -53,4 +53,4 @@ CUDA_VISIBLE_DEVICES=$DECODE_CUDA_VISIBLE_DEVICES python3 -m dynamo.trtllm \
    --extra-engine-args "$DECODE_ENGINE_ARGS" \
    --disaggregation-mode decode \
    --disaggregation-strategy "$DISAGGREGATION_STRATEGY" \
-   "${EXTRA_DECODE_ARGS[@]}"
+   "${EXTRA_DECODE_ARGS[@]}"
