Highlights
- Support vLLM CPU and IPEX CPU WOQ with a Transformers-like API
- Support StreamingLLM on Habana Gaudi
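The WOQ (weight-only quantization) highlight above can be illustrated independently of the ITREX API. Below is a minimal, hypothetical sketch of what WOQ does at its core: weights are rounded to a low-bit integer grid (here symmetric int4, round-to-nearest) while activations stay in floating point. The function names are illustrative, not part of ITREX.

```python
# Hypothetical sketch of symmetric int4 round-to-nearest weight-only
# quantization (WOQ). Not the ITREX API; names are illustrative.

def quantize_woq_int4(row):
    """Quantize one weight row to int4 with a per-row scale."""
    amax = max(abs(w) for w in row) or 1.0
    scale = amax / 7.0                      # symmetric int4 range: [-7, 7]
    q = [max(-7, min(7, round(w / scale))) for w in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp weights from int4 values and the scale."""
    return [v * scale for v in q]

row = [0.12, -0.53, 0.98, -0.07]
q, scale = quantize_woq_int4(row)
approx = dequantize(q, scale)
# round-to-nearest bounds the per-weight error by scale / 2
```

Per-row (or per-group) scales like this are what keep memory-bound LLM inference fast: only the int4 weights and a handful of scales are read from memory, and dequantization happens on the fly.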
Improvements
- Add true_sequential for WOQ GPTQ (091f564)
- Refine GPU scripts to support OOB mode (f4b3a7b)
- Adapt QBits to the latest BesTLA (c169bec)
- Improve CPU WOQ scheme setting (fd3ee5cf1)
- Enhance voicechat API with multilingual TTS streaming support (98daf37d)
- Add internal bias conversion in QBits (7c29f6f1)
- Add DynamicQuantConfig, QuantAwareTrainingConfig and StaticQuantConfig (6a15b48, e1f4666d)
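The three new config classes differ mainly in when quantization ranges are determined: at runtime per batch (dynamic), ahead of time from a calibration set (static), or learned during fine-tuning (quantization-aware training). A hypothetical sketch of that distinction, with illustrative field names that are not the actual ITREX signatures:

```python
# Hypothetical sketch of the conceptual difference between the three
# quantization configs. Field names are illustrative, not the ITREX API.
from dataclasses import dataclass

@dataclass
class DynamicQuantConfig:
    # activation ranges computed on the fly for each batch
    weight_dtype: str = "int8"

@dataclass
class StaticQuantConfig:
    # activation ranges fixed in advance from calibration data
    weight_dtype: str = "int8"
    calibration_samples: int = 128

@dataclass
class QuantAwareTrainingConfig:
    # fake-quant ops inserted during fine-tuning so weights adapt
    weight_dtype: str = "int8"
    train_epochs: int = 3
```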
Examples
- Integrate EAGLE with ITREX (e559929d)
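EAGLE belongs to the speculative-decoding family: a cheap draft head proposes several tokens, and the target model verifies them, accepting the longest matching prefix. A toy sketch of that draft-and-verify loop, with stub "models" that stand in for real networks (everything here is hypothetical, not the EAGLE or ITREX implementation):

```python
# Toy sketch of the draft-and-verify loop behind speculative decoding
# (the family EAGLE belongs to). Both "models" are hypothetical stubs.

def draft_model(prefix):
    # stand-in draft head: guesses right twice, then wrong
    last = prefix[-1]
    return [last + 1, last + 2, last + 99]

def target_model_next(prefix):
    # stand-in target model: the "true" next token
    return prefix[-1] + 1

def speculative_step(prefix):
    """Accept draft tokens while they match the target model's choice."""
    draft = draft_model(prefix)
    accepted = []
    for tok in draft:
        if tok == target_model_next(prefix + accepted):
            accepted.append(tok)
        else:
            break
    # on a mismatch, still emit one correct token from the target model
    if len(accepted) < len(draft):
        accepted.append(target_model_next(prefix + accepted))
    return accepted
```

Each verification pass costs one target-model forward but can emit several tokens, which is where the speedup comes from.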
Bug Fixing
- Fix token latency measurement (ae7a4ae)
- Fix phi3 quantization scripts (2af19c7a6)
- Fix is_intel_gpu_available (47d5024)
- Fix QLoRA CPU issue due to internal API change (699ffca)
- Add scale and weight dtype check for quantization config (307c1a8b)
- Fix TF autodistill bug with transformers>=4.37 (8116fbb2)
Validated Configurations
- Python 3.10
- Ubuntu 22.04
- PyTorch 2.2.0+cpu
- Intel® Extension for PyTorch 2.2.0+cpu