Releases: NVIDIA/TensorRT
TensorRT OSS v10.5.0
Release 10.5-GA
Key Features and Updates:
- Demo changes
- Added Flux.1-dev pipeline
- Sample changes
- None
- Plugin changes
- Migrated IPluginV2-descendent versions of bertQKVToContextPlugin (1, 2, 3) to newer versions (4, 5, 6 respectively) which implement IPluginV3.
- Note:
- The newer versions preserve the attributes and I/O of the corresponding older plugin version.
- The older plugin versions are deprecated and will be removed in a future release.
- Quickstart guide
- None
- Parser changes
- Added support for real-valued STFT operations.
- Improved error handling in IParser.
Known issues:
- Demos:
- TensorRT engine might not be built successfully when using the --fp8 flag on H100 GPUs.
TensorRT OSS v10.4.0
10.4.0 GA - 2024-09-11
Key Features and Updates:
- Demo changes
- Added Stable Cascade pipeline.
- Enabled INT8 and FP8 quantization for Stable Diffusion v1.5, v2.0 and v2.1 pipelines.
- Enabled FP8 quantization for Stable Diffusion XL pipeline.
- Sample changes
- Add a new python sample aliased_io_plugin which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing.
- Plugin changes
- Migrated IPluginV2-descendent versions (a) of the following plugins to newer versions (b) which implement IPluginV3 (a->b):
- scatterElementsPlugin (1->2)
- skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
- embLayerNormPlugin (2->4, 3->5)
- bertQKVToContextPlugin (1->4, 2->5, 3->6)
- Note
- The newer versions preserve the attributes and I/O of the corresponding older plugin version.
- The older plugin versions are deprecated and will be removed in a future release.
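The (a->b) pairs listed above can be kept as a small lookup table when auditing which plugin versions a model or script still references. The following is a minimal sketch: the dictionary layout and the helper name replacement_version are hypothetical, and the keys are the plugin names exactly as written in the list above, which may differ from the names the plugins register under.

```python
# Old (IPluginV2-based) -> new (IPluginV3-based) plugin versions, copied from
# the migration list above. The table layout and helper are illustrative only.
MIGRATED_VERSIONS = {
    "scatterElementsPlugin": {"1": "2"},
    "skipLayerNormPlugin": {"1": "5", "2": "6", "3": "7", "4": "8"},
    "embLayerNormPlugin": {"2": "4", "3": "5"},
    "bertQKVToContextPlugin": {"1": "4", "2": "5", "3": "6"},
}


def replacement_version(plugin: str, version: str) -> str:
    """Return the version that supersedes `version` for `plugin`.

    Versions that are not listed as migrated are returned unchanged.
    """
    return MIGRATED_VERSIONS.get(plugin, {}).get(version, version)


if __name__ == "__main__":
    for plugin, pairs in MIGRATED_VERSIONS.items():
        for old in pairs:
            print(f"{plugin}: {old} -> {replacement_version(plugin, old)}")
```

Because the newer versions keep the same attributes and I/O, only the version string should need to change in this kind of audit.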
- Quickstart guide
- Updated deploy_to_triton guide and removed legacy APIs.
- Removed legacy TF-TRT code as the project is no longer supported.
- Removed quantization_tutorial as pytorch_quantization has been deprecated. Check out https://github.com/NVIDIA/TensorRT-Model-Optimizer for the latest quantization support. Check Stable Diffusion XL (Base/Turbo) and Stable Diffusion 1.5 Quantization with Model Optimizer for integration with TensorRT.
- Parser changes
- Added support for tensor axes for Pad operations.
- Added support for BlackmanWindow, HammingWindow, and HannWindow operations.
- Improved error handling in IParserRefitter.
- Fixed kernel shape inference in multi-input convolutions.
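For readers unfamiliar with the three window operators named above, here is a pure-Python sketch of what they compute. It assumes the generalized cosine-window formulas and coefficients from the ONNX operator definitions (Hann 0.5/0.5, Hamming 25/46 and 21/46, Blackman 0.42/0.5/0.08) and the ONNX default of a periodic window. It illustrates the operator semantics only and is not the parser's implementation; the function names are hypothetical.

```python
import math


def _cosine_window(size, periodic, coeffs):
    """Generalized cosine window: sum_k (-1)^k * a_k * cos(2*pi*k*n / N)."""
    # A periodic window divides by `size`, a symmetric one by `size - 1`.
    denom = size if periodic else size - 1
    if denom == 0:
        # Assumption: a single-point symmetric window is defined as [1.0].
        return [1.0]
    return [
        sum((-1) ** k * a * math.cos(2 * math.pi * k * n / denom)
            for k, a in enumerate(coeffs))
        for n in range(size)
    ]


def hann_window(size, periodic=True):
    return _cosine_window(size, periodic, (0.5, 0.5))


def hamming_window(size, periodic=True):
    return _cosine_window(size, periodic, (25 / 46, 21 / 46))


def blackman_window(size, periodic=True):
    return _cosine_window(size, periodic, (0.42, 0.5, 0.08))


if __name__ == "__main__":
    print(hann_window(4))
    print(hamming_window(4))
    print(blackman_window(4))
```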
- Updated tooling
- polygraphy-extension-trtexec v0.0.9
TensorRT OSS v10.3.0
10.3.0 GA
Key Features and Updates:
- Demo changes
- Added Stable Video Diffusion (SVD) pipeline.
- Plugin changes
- Deprecated Version 1 of ScatterElements plugin. It is superseded by Version 2, which implements the IPluginV3 interface.
- Quickstart guide
- Updated the SemanticSegmentation guide with latest APIs.
- Parser changes
- Added support for tensor axes inputs for Slice node.
- Updated ScatterElements importer to use Version 2 of ScatterElements plugin, which implements the IPluginV3 interface.
- Updated tooling
- Polygraphy v0.49.13
TensorRT OSS v10.2.0
Key Features and Updates:
- Demo changes
- Added Stable Diffusion 3 demo.
- Plugin changes
- Version 3 of the InstanceNormalization plugin (InstanceNormalization_TRT) has been added. This version is based on the IPluginV3 interface and is used by the TensorRT ONNX parser when native InstanceNormalization is disabled.
- Tooling changes
- Pytorch Quantization development has transitioned to TensorRT Model Optimizer. All developers are encouraged to use TensorRT Model Optimizer to benefit from the latest advancements on quantization and compression.
- Build containers
- Updated default cuda versions to 12.5.0.
TensorRT OSS v10.1.0
Key Features and Updates:
- Parser changes
- Added supportsModelV2 API.
- Added support for DeformConv operation.
- Added support for PluginV3 TensorRT Plugins.
- Marked all IParser and IParserRefitter APIs as noexcept.
- Plugin changes
- Added version 2 of ROIAlign_TRT plugin, which implements the IPluginV3 plugin interface. When importing an ONNX model with the RoiAlign op, this new version of the plugin will be inserted to the TRT network.
- Samples changes
- Added a new sample non_zero_plugin, which is a Python version of the C++ sample sampleNonZeroPlugin.
- Updated tooling
- Polygraphy v0.49.12
- ONNX-GraphSurgeon v0.5.3
TensorRT OSS v10.0.1
Key Features and Updates:
- Parser changes
- Added support for building with protobuf-lite.
- Fixed issue when parsing and refitting models with nested BatchNormalization nodes.
- Added support for empty inputs in custom plugin nodes.
- Demo changes
- The following demos have been removed: Jasper, Tacotron2, HuggingFace Diffusers notebook
- Updated tooling
- Polygraphy v0.49.10
- ONNX-GraphSurgeon v0.5.2
- Build Containers
- Updated default cuda versions to 12.4.0.
- Added Rocky Linux 8 and Rocky Linux 9 build containers.
TensorRT v10.0.0
Key Features and Updates:
- Samples changes
- Parser changes
- Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
- kNATIVE_INSTANCENORM is now set to ON by default.
- Added support for IPluginV3 interfaces from TensorRT.
- Added support for INT4 quantization.
- Added support for the reduction attribute in ScatterElements.
- Added support for wrap padding mode in Pad
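As a reminder of what the ScatterElements reduction attribute changes, the sketch below applies updates to a 1-D list in pure Python. It assumes the reduction modes defined by the ONNX operator (none, add, mul, max, min) and says nothing about which modes a given parser version accepts. The function name scatter_elements_1d is hypothetical, and the real operator works on N-D tensors along an axis.

```python
def scatter_elements_1d(data, indices, updates, reduction="none"):
    """Write `updates` into a copy of `data` at `indices`.

    With reduction="none" an update overwrites the old value; otherwise it is
    combined with the value already present, so duplicate indices accumulate.
    """
    combine = {
        "none": lambda old, new: new,
        "add": lambda old, new: old + new,
        "mul": lambda old, new: old * new,
        "max": max,
        "min": min,
    }[reduction]

    out = list(data)
    for idx, upd in zip(indices, updates):
        out[idx] = combine(out[idx], upd)
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    # Index 1 appears twice, so both updates are added to the original 2.
    print(scatter_elements_1d(data, [1, 1, 3], [10, 20, 30], reduction="add"))
```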
- Plugin changes
- A new plugin has been added in compliance with ONNX ScatterElements.
- The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
- All plugins which relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() have moved to use cuBLAS/cuDNN resources initialized by the plugin library itself. This works by dynamically loading the required cuBLAS/cuDNN library. Additionally, plugins which independently initialized their cuBLAS/cuDNN resources have also moved to dynamically loading the required library. If the respective library is not discoverable through the library path(s), these plugins will not work.
- bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
- reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
- disentangledAttentionPlugin: Fixed a kernel bug.
- Demo changes
- HuggingFace demos have been removed. For all users using TensorRT to accelerate Large Language Model inference, please use TensorRT-LLM.
- Updated tooling
- Polygraphy v0.49.9
- ONNX-GraphSurgeon v0.5.1
- TensorRT Engine Explorer v0.1.8
- Build Containers
- RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.
TensorRT OSS v9.3.0
TensorRT OSS release corresponding to TensorRT 9.3.0.1 release.
Updates since TensorRT 9.2.0 release.
Key Features and Updates:
- Faster Text-to-image using SDXL & INT8 quantization using AMMO
- Updated Polygraphy v0.49.7
TensorRT OSS v9.2.0
TensorRT OSS release corresponding to TensorRT 9.2.0.5 release.
Updates since TensorRT 9.1.0 release.
Key Features and Updates:
- trtexec enhancement: Added --weightless flag to mark the engine as weightless.
- Parser changes
- Added support for Hardmax operator.
- Changes to a few operator importers to ensure that TensorRT preserves the precision of operations when using strongly typed mode.
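For context on the Hardmax operator mentioned above: per the ONNX definition, it outputs 1 at the position of the first maximum along the chosen axis and 0 everywhere else. The pure-Python sketch below shows that behavior for a single row and for the last axis of a 2-D list. The function names are hypothetical, and this is not the importer's implementation.

```python
def hardmax_row(row):
    """Return a one-hot list marking the first maximum of `row`."""
    first_max = row.index(max(row))
    return [1 if i == first_max else 0 for i in range(len(row))]


def hardmax_last_axis(rows):
    """Apply hardmax independently to each row of a 2-D list."""
    return [hardmax_row(r) for r in rows]


if __name__ == "__main__":
    # The tie between the two 3s is resolved in favor of the first one.
    print(hardmax_row([1, 3, 3, 2]))
    print(hardmax_last_axis([[0.1, 0.9], [0.7, 0.3]]))
```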
- Plugin changes
- Explicit INT8 support added to bertQKVToContextPlugin.
- Various bug fixes.
- Updated HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.
TensorRT OSS v9.1.0
TensorRT OSS release corresponding to TensorRT 9.1.0.4 GA release.
Updates since TensorRT 8.6.1 GA release.
Key Features and Updates:
- Update the trt_python_plugin sample.
- Python plugins API reference is part of the official TRT Python API.
- Added samples demonstrating the usage of the progress monitor API.
- Check sampleProgressMonitor for the C++ sample.
- Check simple_progress_monitor for the Python sample.
- Remove dependencies related to python<3.8 in python samples as we no longer support python<3.8 for python samples.
- Demo changes
- Added LAMBADA dataset accuracy checks in the HuggingFace demo.
- Enabled structured sparsity and FP8 quantized batch matrix multiplications (BMMs) in attention in the NeMo demo.
- Replaced deprecated APIs in the BERT demo.
- Updated tooling
- Polygraphy v0.49.1