First release of Intel Labs' Model Compression Research Package. This version includes implementations of model compression methods from previously published papers as well as from our own research:
- Pruning, quantization, and knowledge distillation methods and schedulers that can be applied to a wide range of PyTorch models out-of-the-box
- Integration with the HuggingFace/transformers library for most of the available methods
- Various examples showing how to use the library
- A reproduction guide and scripts for Prune Once for All: Sparse Pre-Trained Language Models
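To give a flavor of the kind of compression the package automates, here is a minimal sketch of unstructured magnitude pruning using plain PyTorch's `torch.nn.utils.prune` utilities. Note this is an illustration of the general technique only, not this package's own API; the library's methods and schedulers wrap this kind of operation with scheduling and configuration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy layer standing in for any PyTorch model to be compressed.
layer = nn.Linear(16, 16)

# Unstructured L1 (magnitude) pruning: zero out the 50% of weights
# with the smallest absolute value.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Make the pruning permanent (removes the mask and reparametrization).
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2f}")
```

In practice, a pruning scheduler would gradually increase the sparsity target over the course of training rather than pruning in one shot, which typically preserves more of the model's accuracy.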