Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions docs/source/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ TensorRT-LLM supports the latest LLMs. Refer to the {ref}`support-matrix-softwar

TensorRT-LLM consists of pre– and post-processing steps and multi-GPU multi-node communication primitives in a simple, open-source Model Definition API for groundbreaking LLM inference performance on GPUs. Refer to the {ref}`multi-gpu-multi-node` section for more information.

### FP4 Support
[NVIDIA B200 GPUs](https://www.nvidia.com/en-us/data-center/dgx-b200/) , when used with TensorRT-LLM, enable seamless loading of model weights in the new [FP4 format](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/#what_is_nvfp4), allowing you to automatically leverage optimized FP4 kernels for efficient and accurate low-precision inference.

### FP8 Support

[NVIDIA H100 GPUs](https://www.nvidia.com/en-us/data-center/dgx-h100/) with TensorRT-LLM give you the ability to convert model weights into a new FP8 format easily and compile models to take advantage of optimized FP8 kernels automatically. This is made possible through [NVIDIA Hopper](https://blogs.nvidia.com/blog/h100-transformer-engine/) and done without having to change any model code.
Expand Down
1 change: 1 addition & 0 deletions docs/source/reference/support-matrix.md
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,7 @@ The following table shows the supported software for TensorRT-LLM.
- [10.11](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
* - Precision
-
- Blackwell (SM100/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
- Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4
- Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4
- Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[^smgte89]
Expand Down