
Commit 280e06d

faradawn and Funatiq authored and committed
change parameters to match CI test in NVIDIA#8111
Signed-off-by: Faradawn Yang <[email protected]>
1 parent a8b366e commit 280e06d


1 file changed (+3, -2 lines)


docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md

````diff
@@ -47,6 +47,7 @@ stream_interval: 20
 num_postprocess_workers: 4
 kv_cache_config:
   enable_block_reuse: false
+  free_gpu_memory_fraction: 0.6
 EOF
 ```
 
@@ -60,10 +61,10 @@ trtllm-serve Qwen/Qwen3-Next-80B-A3B-Thinking \
   --host 0.0.0.0 \
   --port 8000 \
   --backend pytorch \
-  --max_batch_size 1 \
+  --max_batch_size 720 \
   --max_num_tokens 4096 \
-  --kv_cache_free_gpu_memory_fraction 0.6 \
   --tp_size 4 \
+  --pp_size 1 \
   --ep_size 4 \
   --trust_remote_code \
   --extra_llm_api_options ${EXTRA_LLM_API_FILE}
````
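After this change the server runs with `--tp_size 4 --pp_size 1 --ep_size 4`. The shell snippet below is a rough pre-flight sketch, not part of the recipe, that checks these flags are mutually consistent and prints how many GPUs they imply. It assumes one GPU per rank, so the world size is `tp_size * pp_size`. It also assumes (unconfirmed by this commit) that expert-parallel ranks are carved out of the tensor-parallel group, so `ep_size` must divide `tp_size`. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: consistency check for the parallelism flags used in the
# updated trtllm-serve command. Variable names are hypothetical.
TP_SIZE=4   # --tp_size
PP_SIZE=1   # --pp_size (added in this commit)
EP_SIZE=4   # --ep_size

# Assumption: one GPU per rank, so world size = TP ranks x PP stages.
WORLD_SIZE=$((TP_SIZE * PP_SIZE))

# Assumption: expert parallelism splits the tensor-parallel group,
# so ep_size must divide tp_size evenly.
if (( TP_SIZE % EP_SIZE != 0 )); then
  echo "ep_size (${EP_SIZE}) must divide tp_size (${TP_SIZE})" >&2
  exit 1
fi
MOE_TP_SIZE=$((TP_SIZE / EP_SIZE))

echo "GPUs required: ${WORLD_SIZE}"
echo "MoE tensor-parallel size: ${MOE_TP_SIZE}"
```

With the values from this commit, the sketch reports 4 GPUs and a MoE tensor-parallel size of 1, meaning each GPU holds a distinct shard of the experts.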
