docs/source/release-notes.md
All published functionality in the Release Notes has been fully tested and verified.
## TensorRT-LLM Release 1.0
TensorRT LLM 1.0 brings two major changes: the PyTorch-based architecture is now stable and the default experience, and the LLM API is now stable. For more details on what is new in 1.0, see below.
### Key Features and Enhancements
- **Model Support**
  - Add Mistral3.1 VLM model support
- Add support for TRTLLM CustomDataset
- Make benchmark_serving part of the library
- Documentation:
  - Refactored the doc structure to focus on the PyTorch workflow.
  - Improved the LLM API and API reference documentation. Stable APIs are now protected and will remain consistent in subsequent versions following v1.0.
  - Removed legacy documentation related to the TensorRT workflow.
### Infrastructure Changes
- The base Docker image for TensorRT-LLM is updated to `nvcr.io/nvidia/pytorch:25.06-py3`.
- The base Docker image for TensorRT-LLM Backend is updated to `nvcr.io/nvidia/tritonserver:25.06-py3`.
- The dependent public PyTorch version is updated to 2.8.0.
- The dependent NVIDIA ModelOpt version is updated to 0.33.
- The dependent xgrammar version is updated to 0.1.21.
- The dependent transformers version is updated to 4.53.1.
### API Changes
- **BREAKING CHANGE** Promote PyTorch to be the default LLM backend
- Fix the unexpected keyword argument 'streaming' (#5436)
### Known Issues
- On bare-metal Ubuntu 22.04 or 24.04, install the `cuda-python==12.9.1` package after installing the TensorRT-LLM wheel. This resolves an incompatibility with the default cuda-python 13, which otherwise fails with `ImportError: cannot import name 'cuda' from 'cuda'`.
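  The constraint above can be checked programmatically before hitting the import error. A minimal sketch (the `needs_cuda_python_pin` helper is illustrative, not part of TensorRT-LLM):

  ```python
  from importlib.metadata import PackageNotFoundError, version


  def needs_cuda_python_pin(installed: str) -> bool:
      """Return True when the installed cuda-python major version is 13 or
      newer, i.e. the environment needs `pip install cuda-python==12.9.1`."""
      return int(installed.split(".")[0]) >= 13


  def check_environment() -> None:
      # Illustrative guard: warn before cuda-python imports fail at runtime.
      try:
          installed = version("cuda-python")
      except PackageNotFoundError:
          return  # cuda-python not installed; nothing to check
      if needs_cuda_python_pin(installed):
          print(f"cuda-python {installed} detected; pin cuda-python==12.9.1")
  ```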
- When using disaggregated serving with pipeline parallelism and KV cache reuse, a hang can occur. This will be fixed in a future release; in the meantime, disabling KV cache reuse avoids the issue.
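  As an illustration only, assuming the LLM API's `kv_cache_config` option and its `enable_block_reuse` field (names may differ in your version), KV cache reuse could be disabled via an extra-options YAML file passed to the server:

  ```yaml
  # Hypothetical extra-options file, e.g. extra_llm_options.yaml.
  # Disables KV cache block reuse to work around the disaggregated-serving hang.
  kv_cache_config:
    enable_block_reuse: false
  ```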
- Running multi-node cases where each node has just a single GPU is known to fail. This will be addressed in a future release.