diff --git a/docs/source/quick-start-guide.md b/docs/source/quick-start-guide.md
index 9fd9bb0914d..8cfe47e3a17 100644
--- a/docs/source/quick-start-guide.md
+++ b/docs/source/quick-start-guide.md
@@ -22,6 +22,13 @@ To start the server, you can run a command like the following example inside a D
 trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
 ```
 
+You may also deploy pre-quantized models to improve performance.
+Ensure your GPU supports FP8 quantization before running the following:
+
+```bash
+trtllm-serve "nvidia/Qwen3-8B-FP8"
+```
+
 ```{note}
 If you are running trtllm-server inside a Docker container, you have two options for sending API requests:
 1. Expose a port (e.g., 8000) to allow external access to the server from outside the container.