TensorRT 10.0 Release #3766

Merged
merged 1 commit into from
Apr 3, 2024

asfiyab-nvidia
Collaborator

10.0.0 EA - 2024-04-02

Key Features and Updates:

  • Samples changes
    • Added a sample showcasing weight-stripped engines.
    • Added a sample demonstrating the use of custom tactics with IPluginV3.
    • Added a sample to showcase plugins with data-dependent output shapes, using IPluginV3.
  • Parser changes
    • Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
    • kNATIVE_INSTANCENORM is now set to ON by default.
    • Added support for IPluginV3 interfaces from TensorRT.
    • Added support for INT4 quantization.
    • Added support for the reduction attribute in ScatterElements.
    • Added support for wrap padding mode in Pad.
  • Plugin changes
    • A new plugin has been added in compliance with ONNX ScatterElements.
    • The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
    • Plugins that relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() now use cuBLAS/cuDNN resources initialized by the plugin library itself, which dynamically loads the required cuBLAS/cuDNN library. Plugins that independently initialized their own cuBLAS/cuDNN resources also now load the required library dynamically. If the required library is not discoverable through the library path(s), these plugins will not work.
    • bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
    • reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
    • disentangledAttentionPlugin: Fixed a kernel bug.
  • Demo changes
    • HuggingFace demos have been removed. Users who accelerate Large Language Model inference with TensorRT should use TensorRT-LLM instead.
  • Updated tooling
    • Polygraphy v0.49.9
    • ONNX-GraphSurgeon v0.5.1
    • TensorRT Engine Explorer v0.1.8
  • Build Containers
    • RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.
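
Because the plugin library now loads cuBLAS/cuDNN at run time instead of linking against them, a missing library only shows up when an affected plugin is used. The following is a minimal sketch of a pre-flight check, not part of TensorRT: the helper names `try_load_library` and `check_plugin_dependencies` are hypothetical, the library names "cublas" and "cudnn" are assumed, and Python's `ctypes.util.find_library` search is only an approximation of whatever lookup the plugin library actually performs (it may look for specific versioned file names).

```python
import ctypes
import ctypes.util


def try_load_library(name):
    """Return a ctypes handle for `name`, or None if it cannot be found or loaded.

    Hypothetical helper: this uses Python's own library search, which is an
    assumption and may differ from the lookup the TensorRT plugin library does.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The file was located but could not be loaded (e.g. missing dependencies).
        return None


def check_plugin_dependencies(names=("cublas", "cudnn")):
    """Map each library name to True if it could be loaded, else False."""
    return {name: try_load_library(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_plugin_dependencies().items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Running the script on a deployment machine before loading engines gives an early hint that cuBLAS/cuDNN-dependent plugins may fail; a "found" result here does not guarantee that the plugin library will locate the same file.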

Signed-off-by: Asfiya Baig <[email protected]>
@asfiyab-nvidia
Collaborator Author

@rajeevsrao can you please review?

@rajeevsrao rajeevsrao merged commit 147005f into NVIDIA:main Apr 3, 2024
1 of 2 checks passed