
Update TensorRT-LLM #2333

Merged 1 commit on Oct 15, 2024

Conversation

@kaiyux (Member) commented Oct 15, 2024

  • Model Support
  • Features
    • Supported the Python LLM API for Mamba2.
    • Supported INT8 GPTQ quantization.
    • Supported FP8 for Deepseek-v1, see the “FP8 Quantization” section in examples/deepseek_v1/README.md.
    • Added TensorRT OOTB support for INT8 Smooth Quantization.
    • Supported quantization for Exaone model, see examples/exaone/README.md.
    • Enabled Medusa for Qwen2 models, see the “Medusa with Qwen2” section in examples/medusa/README.md.
    • Added request queueing time to gptManagerBenchmark tool.
    • Optimized pipeline parallelism with ReduceScatter and AllGather for Mixtral models.
  • API
    • [BREAKING CHANGE] Moved the builder_force_num_profiles flag of the trtllm-build command to an environment variable.
  • Bug fixes
  • Infra
    • Updated the ModelOpt dependency to version 0.17.0.
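As a rough illustration of the Python LLM API mentioned above, here is a minimal, hedged sketch of loading a Mamba2 checkpoint and generating text. The class names follow the TensorRT-LLM high-level API documentation; the checkpoint identifier and parameter names are assumptions and may differ by release, and running this requires a supported GPU with TensorRT-LLM installed.

```python
# Sketch only: assumes TensorRT-LLM's high-level Python API (tensorrt_llm.LLM)
# and a Mamba2 checkpoint path; names may vary across releases.
from tensorrt_llm import LLM, SamplingParams

# Hypothetical Hugging Face model id for a Mamba2 checkpoint.
llm = LLM(model="state-spaces/mamba2-2.7b")

# Sampling configuration; max_tokens bounds the generated length.
params = SamplingParams(max_tokens=32)

for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```

The same LLM entry point is intended to abstract over model families, so the Mamba2 support added here should not require model-specific build steps from the user.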
