
Commit 2d0c9b3

[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260)
Signed-off-by: Maurits de Groot <[email protected]>
1 parent: 80043af

File tree: 1 file changed, +3 -6 lines


docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md

Lines changed: 3 additions & 6 deletions
@@ -33,7 +33,7 @@ docker run --rm --ipc=host -it \
 -p 8000:8000 \
 -e TRTLLM_ENABLE_PDL=1 \
 -v ~/.cache:/root/.cache:rw \
-nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0 \
+nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc1 \
 /bin/bash
 ```

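For context, the container launch command this hunk touches would read roughly as follows after the change. Only the lines visible in the diff are confirmed; the leading `--gpus=all` flag and the indentation are assumptions about the surrounding blog text.

```bash
# Launch the TensorRT-LLM release container with the tag bumped to 1.1.0rc1 by this commit.
# --gpus=all is assumed; the hunk only shows the tail of the docker run command.
docker run --rm --ipc=host -it \
  --gpus=all \
  -p 8000:8000 \
  -e TRTLLM_ENABLE_PDL=1 \
  -v ~/.cache:/root/.cache:rw \
  nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc1 \
  /bin/bash
```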

@@ -206,12 +206,10 @@ Currently, the best throughput **19.5k tps/gpu** is achieved with DP4EP4 using 4

 ## Launch the TensorRT-LLM Server

-We can use `trtllm-serve` to serve the model by translating the benchmark commands above. For low-latency configuration, run:
+We can use `trtllm-serve` to serve the model by translating the benchmark commands above. For low-latency configuration, run:
+**Note:** You can also point to a local path containing the model weights instead of the HF repo (e.g., `${local_model_path}`).

 ```bash
-trtllm-serve \
-Note: You can also point to a local path containing the model weights instead of the HF repo (e.g., `${local_model_path}`).
-
 trtllm-serve \
 openai/gpt-oss-120b \
 --host 0.0.0.0 \
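
The hunk above moves the stray note out of the fenced block and drops the duplicated `trtllm-serve \` line, so the repaired low-latency command would look roughly like the sketch below. Only the model name and `--host 0.0.0.0` are confirmed by the diff; `--port`, `--tp_size`, `--ep_size`, and `--extra_llm_api_options` are placeholders based on common `trtllm-serve` usage, not the blog's exact values.

```bash
# Minimal sketch of the repaired low-latency serving command; flags past --host are placeholders.
# Per the note moved into the prose, openai/gpt-oss-120b can be swapped for a local weights path.
trtllm-serve \
  openai/gpt-oss-120b \
  --host 0.0.0.0 \
  --port 8000 \
  --tp_size 4 \
  --ep_size 4 \
  --extra_llm_api_options low_latency.yaml
```
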
@@ -230,7 +228,6 @@ The initialization may take several minutes as it loads and optimizes the models
 For max-throughput configuration, run:

 ```bash
-trtllm-serve \
 trtllm-serve \
 openai/gpt-oss-120b \
 --host 0.0.0.0 \
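
Once either configuration is serving, a quick smoke test can be run against the OpenAI-compatible endpoint that `trtllm-serve` exposes. Port 8000 matches the `-p 8000:8000` mapping in the first hunk; the prompt and token limit below are arbitrary.

```bash
# Smoke-test the running server via its OpenAI-compatible chat completions endpoint.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32
      }'
```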
