Update TensorRT-LLM #1530

Merged 1 commit on Apr 30, 2024

kaiyux (Member) commented Apr 30, 2024

  • Model Support
    • [Experimental] Support RecurrentGemma
  • Features
    • Support paged KV cache for enc-dec models (limited to beam width 1)
  • Bug fixes
  • Benchmark
    • [BREAKING CHANGE] Move request rate generation arguments and logic from the prepare dataset script to gptManagerBenchmark
  • Performance
    • Improve the performance of pipeline parallelism when in-flight batching is enabled

kaiyux merged commit 06c0e9b into main Apr 30, 2024
kaiyux deleted the kaiyu/update branch April 30, 2024 09:19