
Intel® Extension for Transformers v1.4.2 Release

Released by @kevinintel on 24 May · commit 0e13607

  • Highlights
  • Improvements
  • Examples
  • Bug Fixing

Highlights

  • Support vLLM CPU and IPEX CPU WOQ with the Transformers-like API (see the sketch after this list)
  • Support StreamingLLM on Habana Gaudi
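
The snippet below is a minimal sketch of reaching CPU weight-only quantization (WOQ) through the Transformers-like API. The model name and the RtnConfig fields shown are illustrative assumptions, and the options that route execution to the vLLM CPU or IPEX CPU backend vary by version, so treat this as a starting point rather than the definitive API.

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    RtnConfig,
)

model_name = "facebook/opt-125m"  # illustrative; any causal LM checkpoint

# 4-bit round-to-nearest weight-only quantization on CPU
# (field names are assumptions; check the ITREX docs for your version).
woq_config = RtnConfig(bits=4, compute_dtype="int8")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=woq_config,
)

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```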

Improvements

  • Add true_sequential for WOQ GPTQ (091f564)
  • Refine GPU scripts to support OOB mode (f4b3a7b)
  • Adapt QBits to the latest BesTLA (c169bec)
  • Improve CPU WOQ scheme setting (fd3ee5cf1)
  • Enhance the voicechat API with multilingual TTS streaming support (98daf37d)
  • Add internal bias conversion to QBits (7c29f6f1)
  • Add DynamicQuantConfig, QuantAwareTrainingConfig, and StaticQuantConfig (6a15b48, e1f4666d); see the sketch after this list
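
As a sketch of the three new config classes, assuming they are exported from intel_extension_for_transformers.transformers alongside the existing quantization configs and plug into the same from_pretrained path; the empty constructor calls are placeholders, since the exact fields are version-specific.

```python
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    DynamicQuantConfig,
    StaticQuantConfig,
    QuantAwareTrainingConfig,
)

# Dynamic quantization: activation ranges are computed on the fly at inference.
dyn_cfg = DynamicQuantConfig()

# Static quantization: ranges are calibrated ahead of time on sample data.
static_cfg = StaticQuantConfig()

# Quantization-aware training: fake-quant ops are inserted during fine-tuning.
qat_cfg = QuantAwareTrainingConfig()

# The new configs feed into the same Transformers-like loading path
# as the WOQ configs (assumed usage; consult the ITREX docs).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # small model, purely illustrative
    quantization_config=dyn_cfg,
)
```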

Examples

  • Integrate EAGLE speculative decoding with ITREX (e559929d)

Bug Fixing

  • Fix token latency (ae7a4ae)
  • Fix Phi-3 quantization scripts (2af19c7a6)
  • Fix is_intel_gpu_available (47d5024)
  • Fix QLoRA CPU issue caused by an internal API change (699ffca)
  • Add scale and weight dtype checks for quantization configs (307c1a8b)
  • Fix TF autodistillation bug with transformers>=4.37 (8116fbb2)

Validated Configurations

  • Python 3.10
  • Ubuntu 22.04
  • PyTorch 2.2.0+cpu
  • Intel® Extension for PyTorch 2.2.0+cpu