
Update TensorRT-LLM #1763

Merged: 1 commit, Jun 11, 2024

Conversation

kaiyux
Member

@kaiyux kaiyux commented Jun 11, 2024

  • Model Support
    • Support Phi-3-medium models, see examples/phi/README.md
  • Features
    • Added support for quantized base model and FP16/BF16 LoRA.
  • API
    • [BREAKING CHANGE] max_batch_size in trtllm-build command is 256 by default now.
    • [BREAKING CHANGE] max_num_tokens in trtllm-build command is 8192 by default now.
    • [BREAKING CHANGE] api in gptManagerBenchmark command is executor by default now.
    • [BREAKING CHANGE] Added a bias argument to the LayerNorm module and added support for layer normalization without bias.
    • [BREAKING CHANGE] Refactored LLM.generate() API.
      • Removed SamplingConfig
      • Added SamplingParams with some sampling parameters, see tensorrt_llm/hlapi/utils.py
      • Use SamplingParams instead of SamplingConfig in the LLM.generate() API, see examples/high-level-api/README.md
    • [BREAKING CHANGE] Refactored GptManager API
      • Moved maxBeamWidth into TrtGptModelOptionalParams
      • Moved schedulerConfig into TrtGptModelOptionalParams
  • Bug fixes
  • Performance
    • Low latency optimization
      • Added a reduce-norm feature that fuses the ResidualAdd and LayerNorm kernels after AllReduce into a single kernel; enabling it is recommended when the batch size is small and the generation phase dominates runtime.
      • Added FP8 support to the GEMM plugin, which benefits the cases when batch size is smaller than 4.
  • Documentation
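To illustrate the refactored LLM.generate() API described above, here is a minimal migration sketch. It assumes the hlapi import path and the parameter names shown (max_new_tokens, temperature, top_p) as well as a placeholder model path; the exact names should be verified against examples/high-level-api/README.md and tensorrt_llm/hlapi/utils.py.

```python
# Hypothetical sketch of the refactored LLM.generate() API, where
# SamplingParams replaces the removed SamplingConfig. Import path and
# parameter names are assumptions; check examples/high-level-api/README.md.
from tensorrt_llm.hlapi import LLM, SamplingParams

llm = LLM(model="path/to/model")  # placeholder model path

# Before this release: llm.generate(prompts, sampling_config=SamplingConfig(...))
# After this release: pass a SamplingParams object instead.
params = SamplingParams(max_new_tokens=64, temperature=0.8, top_p=0.95)
for output in llm.generate(["Hello, my name is"], params):
    print(output)
```

This sketch requires a GPU and a built engine or checkpoint to run, so it is intended only as a shape-of-the-API reference, not a drop-in script.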

@pfk-beta

Hi, thanks for your hard work, by the way. I have spotted a large removal in examples/run.py: db4edea#diff-299cb0140ad8f9d286c86ecc32b793b048531e27570675b94e54b57b66b3d7d5. Is it intended?

@pfk-beta

Sorry for the false alarm; those arguments were moved to utils. I didn't spot it.
