Commit c5d0c92 (parent: 61e9c18)

add documentation section.

Signed-off-by: nv-guomingz <[email protected]>

File tree

1 file changed: +7 −3 lines changed
docs/source/release-notes.md

Lines changed: 7 additions & 3 deletions
@@ -6,6 +6,8 @@ All published functionality in the Release Notes has been fully tested and verif
 
 ## TensorRT-LLM Release 1.0
 
+TensorRT LLM 1.0 brings two major changes: the PyTorch-based architecture is now stable and the default experience, and the LLM API is now stable. For more details on the new developments in 1.0, see below.
+
 ### Key Features and Enhancements
 - **Model Support**
   - Add Mistral3.1 VLM model support
@@ -92,14 +94,17 @@ All published functionality in the Release Notes has been fully tested and verif
   - Add support for TRTLLM CustomDataset
   - Make benchmark_serving part of the library
 
+- Documentation:
+  - Refactored the doc structure to focus on the PyTorch workflow.
+  - Improved the LLM API and API reference documentation. Stable APIs are now protected and will remain consistent in subsequent versions following v1.0.
+  - Removed legacy documentation related to the TensorRT workflow.
 
 ### Infrastructure Changes
 - The base Docker image for TensorRT-LLM is updated to `nvcr.io/nvidia/pytorch:25.06-py3`.
 - The base Docker image for TensorRT-LLM Backend is updated to `nvcr.io/nvidia/tritonserver:25.06-py3`.
-- The dependent public PyTorch version is updated to 2.8.0.
 - The dependent NVIDIA ModelOpt version is updated to 0.33.
 - The dependent xgrammar version is updated to 0.1.21.
-- The dependent transformers version is updated to 4.51.3.
+- The dependent transformers version is updated to 4.53.1.
 
 ### API Changes
 - **BREAKING CHANGE** Promote PyTorch to be the default LLM backend
@@ -171,7 +176,6 @@ All published functionality in the Release Notes has been fully tested and verif
 - Fix the unexpected keyword argument 'streaming' (#5436)
 
 ### Known Issues
-- On bare-metal Ubuntu 22.04 or 24.04, please install the `cuda-python==12.9.1` package after installing the TensorRT-LLM wheel. This resolves an incompatibility issue with the default cuda-python 13 of error `ImportError: cannot import name 'cuda' from 'cuda'`.
 - When using disaggregated serving with pipeline parallelism and KV cache reuse, a hang can occur. This will be fixed in a future release. In the meantime, disabling KV cache reuse will fix this issue.
 - Running multi-node cases where each node has just a single GPU is known to fail. This will be addressed in a future release.
 
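As a sketch of the KV-cache-reuse workaround mentioned in the known issues: assuming the 1.0 LLM API's `KvCacheConfig` exposes an `enable_block_reuse` flag that can be set through an extra-options YAML file (e.g. via `trtllm-serve --extra_llm_api_options`), disabling reuse might look like the following. The field names are assumptions drawn from the TensorRT-LLM LLM API, not from this commit; verify them against your installed version.

```yaml
# Hypothetical extra LLM API options file (e.g. extra_options.yml), passed as
#   trtllm-serve <model> --extra_llm_api_options extra_options.yml
# kv_cache_config.enable_block_reuse is assumed from the LLM API; setting it
# to false disables KV cache reuse, sidestepping the hang seen when combining
# disaggregated serving, pipeline parallelism, and KV cache reuse.
kv_cache_config:
  enable_block_reuse: false
```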
