5 changes: 3 additions & 2 deletions docs/source/features/quantization.md
@@ -96,12 +96,13 @@ The language component decides which quantization methods are supported by a given
| Model | NVFP4 | MXFP4 | FP8(per tensor)| FP8(block scaling) | FP8(rowwise) | FP8 KV Cache |W4A8 AWQ | W4A16 AWQ | W4A8 GPTQ | W4A16 GPTQ |
| :------------- | :---: | :---: | :---: | :---: | :---: | :---: | :-------: | :-------: | :--------: | :--------: |
| Blackwell(sm120) | Y | Y | Y | . | . | Y | . | . | . | . |
-| Blackwell(sm100) | Y | Y | Y | Y | . | Y | . | . | . | . |
+| Blackwell(sm100/103) | Y | Y | Y | Y | . | Y | . | . | . | . |
| Hopper | . | . | Y | Y | Y | Y | Y | Y | Y | Y |
| Ada Lovelace | . | . | Y | . | . | Y | Y | Y | Y | Y |
| Ampere | . | . | . | . | . | Y | . | Y | . | Y |

```{note}
-FP8 block wise scaling GEMM kernels for sm100 are using MXFP8 recipe (E4M3 act/weight and UE8M0 act/weight scale), which is slightly different from SM90 FP8 recipe (E4M3 act/weight and FP32 act/weight scale).
+FP8 block wise scaling GEMM kernels for sm100/103 are using MXFP8 recipe (E4M3 act/weight and UE8M0 act/weight scale), which is slightly different from SM90 FP8 recipe (E4M3 act/weight and FP32 act/weight scale).
```
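
For intuition, the two recipes differ only in how the per-block scale is stored: an FP32 number in the SM90 recipe versus a UE8M0 value (an 8-bit power-of-two exponent) in the MXFP8 recipe. A minimal NumPy sketch of that difference, assuming a block size of 128 and round-up-to-power-of-two scaling — both illustrative choices, not the kernels' exact behavior:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def block_scales(x, block=128, power_of_two=False):
    """Compute per-block scales for E4M3 quantization.

    power_of_two=False -> FP32 scales, as in the SM90 FP8 recipe.
    power_of_two=True  -> scales rounded to a power of two so each fits in a
                          UE8M0 byte, as in the MXFP8 recipe on sm100/103.
    """
    amax = np.abs(x.reshape(-1, block)).max(axis=1)
    scale = amax / E4M3_MAX
    if power_of_two:
        # Round up so the scaled values still fit within E4M3's range.
        scale = 2.0 ** np.ceil(np.log2(scale))
    return scale

x = np.random.randn(1024).astype(np.float32)
print("FP32 scales :", block_scales(x)[:4])
print("UE8M0 scales:", block_scales(x, power_of_two=True)[:4])
```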


3 changes: 2 additions & 1 deletion docs/source/legacy/reference/support-matrix.md
@@ -132,6 +132,7 @@ In addition, older architectures can have limitations for newer software releases.
- TensorRT-LLM requires Linux x86_64 or Linux aarch64.
* - GPU Model Architectures
-
+- [NVIDIA GB300 NVL72](https://www.nvidia.com/en-us/data-center/gb300-nvl72/)
- [NVIDIA GB200 NVL72](https://www.nvidia.com/en-us/data-center/gb200-nvl72/)
- [NVIDIA Blackwell Architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)
- [NVIDIA Grace Hopper Superchip](https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/)
@@ -157,7 +158,7 @@ The following table shows the supported software for TensorRT-LLM.
- [10.13](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
* - Precision
-
-- Blackwell (SM100/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
+- Blackwell (SM100/SM103/SM120) - FP32, FP16, BF16, FP8, FP4, INT8, INT4
- Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4
- Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4
- Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[^smgte89]
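Which row of the precision list applies to a given GPU can be checked at runtime from its compute capability. A small PyTorch sketch — the SM-to-precision mapping in the comments is read off the table above, not queried from the driver:

```python
import torch

# get_device_capability() returns (major, minor), e.g. (9, 0) for SM90,
# (10, 0) for SM100, (10, 3) for SM103, (12, 0) for SM120.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: SM{major}{minor}")
    if (major, minor) >= (8, 9):   # Ada Lovelace and newer
        print("FP8 is listed as supported for this architecture")
    if major >= 10:                # Blackwell (SM100/SM103/SM120)
        print("FP4 is listed as supported for this architecture")
else:
    print("No CUDA device detected")
```
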
4 changes: 2 additions & 2 deletions docs/source/models/supported-models.md
@@ -37,8 +37,8 @@ Note: Support for other models may vary. Features marked "N/A" are not applicable
| Llama4ForConditionalGeneration | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
| GPT-OSS | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |

-[^1]: Chunked Prefill for MLA can only be enabled on SM100.
-[^2]: KV cache reuse for MLA can only be enabled on SM90/SM100 and in BF16/FP8 KV cache dtype.
+[^1]: Chunked Prefill for MLA can only be enabled on SM100/SM103.
+[^2]: KV cache reuse for MLA can only be enabled on SM90/SM100/SM103 and in BF16/FP8 KV cache dtype.
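
For footnote [^2], KV cache reuse and the KV cache dtype are configured through the LLM API. A sketch only: the `KvCacheConfig` fields and the DeepSeek-V3 checkpoint name below are assumptions to verify against the installed TensorRT-LLM version.

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

# Enable KV cache block reuse with an FP8 KV cache — the combination
# footnote [^2] above restricts to SM90/SM100/SM103.
# Field names are assumptions; check them against your installed version.
kv_cache_config = KvCacheConfig(
    enable_block_reuse=True,  # required for KV cache reuse with MLA
    dtype="fp8",              # BF16 or FP8 KV cache dtype per footnote [^2]
)

# DeepSeek-V3 is used here only as an example of an MLA model.
llm = LLM(model="deepseek-ai/DeepSeek-V3", kv_cache_config=kv_cache_config)
```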


# Multimodal Feature Support Matrix (PyTorch Backend)
4 changes: 2 additions & 2 deletions docs/source/overview.md
@@ -49,8 +49,8 @@ TensorRT LLM strives to support the most popular models on **Day 0**.
### 🔧 **Latest GPU Architecture Support**

TensorRT LLM supports the full spectrum of NVIDIA GPU architectures:
-- **NVIDIA Blackwell**: B200, GB200, RTX Pro 6000 SE with FP4 optimization
-- **NVIDIA Hopper**: H100, H200,GH200 with FP8 acceleration
+- **NVIDIA Blackwell**: B200, B300, GB200, GB300, RTX Pro 6000 SE with FP4 optimization
+- **NVIDIA Hopper**: H100, H200, GH200 with FP8 acceleration
- **NVIDIA Ada Lovelace**: L40/L40S, RTX 40 series with FP8 acceleration
- **NVIDIA Ampere**: A100, RTX 30 series for production workloads
