Update TensorRT-LLM #1530

kaiyux · 2024-04-30T08:07:18Z

Model Support
- [Experimental] Support RecurrentGemma
Features
- Support paged KV cache for enc-dec models; support is currently limited to beam width 1
Bug fixes
- Fix a requirement specification on Windows for nvidia-cudnn-cu12 (see "ImportError: DLL load failed while importing tensorrt", #1446)
- Fix MMHA relative position calculation error in gpt_attention_plugin for enc-dec models (see "Flan t5 xxl result large difference", #1343)
Benchmark
- [BREAKING CHANGE] Move request rate generation arguments and logic from prepare dataset script to gptManagerBenchmark
Performance
- Improve the performance of pipeline parallelism when enabling in-flight batching

Update TensorRT-LLM

8d3a920

Shixiaowei02 approved these changes Apr 30, 2024

kaiyux merged commit 06c0e9b into main Apr 30, 2024

kaiyux deleted the kaiyu/update branch April 30, 2024 09:19
