Commit efb766e

renamed file to deployment-guide-for-trt-llm-llama3.3-70b.md
1 parent bc25b30 commit efb766e

File tree

1 file changed (+2, -2 lines changed)


examples/models/core/llama3_3/Deployment Guide for TRT-LLM + Llama3.3 70B.md renamed to examples/models/core/llama3_3/deployment-guide-for-trt-llm-llama3.3-70b.md

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ Note:
 
 * You can mount additional directories and paths using the \-v \<local\_path\>:\<path\> flag if needed, such as mounting the downloaded weight paths.
 * The command mounts your user .cache directory to save the downloaded model checkpoints which are saved to \~/.cache/huggingface/hub/ by default. This prevents having to redownload the weights each time you rerun the container. If the \~/.cache directory doesn’t exist please create it using mkdir \~/.cache
-* The command also maps port 8000 from the container to your host so you can access the LLM API endpoint from your host
+* The command also maps port **8000** from the container to your host so you can access the LLM API endpoint from your host
 * See the [https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for all the available containers. The containers published in the main branch weekly have “rcN” suffix, while the monthly release with QA tests has no “rcN” suffix. Use the rc release to get the latest model and feature support.
 
 If you want to use latest main branch, you can choose to build from source to install TensorRT-LLM, the steps refer to [https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html](https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html)
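The `docker run` command these bullets refer to sits above this hunk and is not part of the diff, so the following is only a hedged sketch of the cache mount and port-8000 mapping described here; the image tag, the container-side cache path, and the remaining flags are assumptions rather than the guide's exact command.

```bash
# Sketch only: reuse ~/.cache so Hugging Face checkpoints are not re-downloaded,
# and publish port 8000 so the LLM API endpoint is reachable from the host.
# The image tag is a placeholder; pick a real tag from the NGC catalog page above.
mkdir -p ~/.cache

docker run --rm -it --gpus all \
  -v ~/.cache:/root/.cache \
  -p 8000:8000 \
  nvcr.io/nvidia/tensorrt-llm/release:<tag>
```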
@@ -117,7 +117,7 @@ These options are used directly on the command line when you start the `trtllm-s
 
 &emsp;**Description:** The maximum number of user requests that can be grouped into a single batch for processing.
 
-#### `--max_num_of_tokens`
+#### `--max_num_tokens`
 
 &emsp;**Description:** The maximum total number of tokens (across all requests) allowed inside a single scheduled batch.
 
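As a rough illustration of how the renamed option is passed on the command line, here is a hypothetical `trtllm-serve` invocation; the model id, port, and numeric values below are illustrative placeholders, not settings taken from the guide.

```bash
# Hypothetical sketch: batching limits passed to trtllm-serve.
# --max_batch_size caps how many requests are grouped into one batch;
# --max_num_tokens caps the total tokens scheduled in a single batch.
# The values shown are placeholders, not recommended settings.
trtllm-serve meta-llama/Llama-3.3-70B-Instruct \
  --host 0.0.0.0 --port 8000 \
  --max_batch_size 256 \
  --max_num_tokens 8192
```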

0 commit comments