Commit efb766e

renamed file to deployment-guide-for-trt-llm-llama3.3-70b.md
1 parent bc25b30 commit efb766e

File tree

1 file changed (+2, -2 lines changed)


examples/models/core/llama3_3/Deployment Guide for TRT-LLM + Llama3.3 70B.md renamed to examples/models/core/llama3_3/deployment-guide-for-trt-llm-llama3.3-70b.md

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ Note:
 
 * You can mount additional directories and paths using the \-v \<local\_path\>:\<path\> flag if needed, such as mounting the downloaded weight paths.
 * The command mounts your user .cache directory to save the downloaded model checkpoints which are saved to \~/.cache/huggingface/hub/ by default. This prevents having to redownload the weights each time you rerun the container. If the \~/.cache directory doesn’t exist please create it using mkdir \~/.cache
-* The command also maps port 8000 from the container to your host so you can access the LLM API endpoint from your host
+* The command also maps port **8000** from the container to your host so you can access the LLM API endpoint from your host
 * See the [https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for all the available containers. The containers published in the main branch weekly have “rcN” suffix, while the monthly release with QA tests has no “rcN” suffix. Use the rc release to get the latest model and feature support.
 
 If you want to use latest main branch, you can choose to build from source to install TensorRT-LLM, the steps refer to [https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html](https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html)
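The `docker run` command these bullets refer to sits above this hunk and is not part of the diff, so the following is only a hedged sketch of the cache mount and port-8000 mapping described here; the image tag, the container-side cache path, and the remaining flags are assumptions rather than the guide's exact command.

```bash
# Sketch only: reuse ~/.cache so Hugging Face checkpoints are not re-downloaded,
# and publish port 8000 so the LLM API endpoint is reachable from the host.
# The image tag is a placeholder; pick a real tag from the NGC catalog page above.
mkdir -p ~/.cache

docker run --rm -it --gpus all \
  -v ~/.cache:/root/.cache \
  -p 8000:8000 \
  nvcr.io/nvidia/tensorrt-llm/release:<tag>
```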
@@ -117,7 +117,7 @@ These options are used directly on the command line when you start the `trtllm-s
 
 &emsp;**Description:** The maximum number of user requests that can be grouped into a single batch for processing.
 
-#### `--max_num_of_tokens`
+#### `--max_num_tokens`
 
 &emsp;**Description:** The maximum total number of tokens (across all requests) allowed inside a single scheduled batch.
 
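As a rough illustration of how the renamed option is passed on the command line, here is a hypothetical `trtllm-serve` invocation; the model id, port, and numeric values below are illustrative placeholders, not settings taken from the guide.

```bash
# Hypothetical sketch: batching limits passed to trtllm-serve.
# --max_batch_size caps how many requests are grouped into one batch;
# --max_num_tokens caps the total tokens scheduled in a single batch.
# The values shown are placeholders, not recommended settings.
trtllm-serve meta-llama/Llama-3.3-70B-Instruct \
  --host 0.0.0.0 --port 8000 \
  --max_batch_size 256 \
  --max_num_tokens 8192
```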

0 commit comments