Awesome ML Model Compression

An awesome style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools and more. PRs are welcome!

Papers

General

A Survey of Model Compression and Acceleration for Deep Neural Networks
Model compression as constrained optimization, with application to neural nets. Part I: general framework
Model compression as constrained optimization, with application to neural nets. Part II: quantization
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Architecture

Quantization

Binarization

Pruning

Distillation

Low Rank Approximation

Articles

Content published on the Web.

Howtos

How to Quantize Neural Networks with TensorFlow
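The linked how-to stores each layer's floats as 8-bit integers, together with that layer's minimum and maximum values. Below is a minimal pure-Python sketch of this kind of linear (affine) min/max quantization. The function names are illustrative, and this is not TensorFlow's code.

```python
def quantize_uint8(values):
    """Affine 8-bit quantization: map the range [min, max] linearly onto 0..255."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # avoid dividing by zero for constant tensors
    # Clamp defensively in case float error pushes a code just outside 0..255.
    codes = [min(255, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Recover approximate float values from the 8-bit codes plus the stored min and scale."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_uint8(weights)
restored = dequantize_uint8(codes, lo, scale)
```

Each value is off by at most half a quantization step (`scale / 2`). This is why a wide min/max range, for example one caused by a single outlier, costs precision for every other value in the tensor.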

Assorted

Why the Future of Machine Learning is Tiny
Deep Learning Model Compression for Image Analysis: Methods and Architectures
A foolproof way to shrink deep learning models by MIT (Alex Renda et al.) - A pruning algorithm: train to completion, globally prune the 20% of weights with the lowest magnitudes (the weakest connections), retrain with learning rate rewinding (reusing the original learning-rate schedule from early training), and repeat until the desired sparsity is reached, so the model can be made as small as you want.
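The pruning recipe above can be sketched in pure Python. Weights are plain nested lists, one list per layer. `train` is a hypothetical stand-in for a real training loop, and learning rate rewinding is assumed to happen inside it: each retraining run starts from the current pruned weights but replays the original learning-rate schedule. All names here are illustrative and are not code from the paper.

```python
def global_magnitude_prune(weights, mask, fraction):
    """Prune `fraction` of the still-unpruned weights with the smallest |w|,
    ranked globally across all layers rather than per layer."""
    surviving = sorted(
        (abs(w), layer, i)
        for layer, (ws, ms) in enumerate(zip(weights, mask))
        for i, (w, m) in enumerate(zip(ws, ms))
        if m
    )
    # Prune at least one weight per round so the loop always makes progress.
    n_prune = max(1, int(len(surviving) * fraction)) if surviving else 0
    for _, layer, i in surviving[:n_prune]:
        mask[layer][i] = 0
        weights[layer][i] = 0.0
    return weights, mask


def sparsity(mask):
    """Fraction of all weights that have been pruned."""
    total = sum(len(m) for m in mask)
    return sum(m.count(0) for m in mask) / total


def prune_with_lr_rewinding(weights, train, target_sparsity, fraction=0.2):
    """Iterative magnitude pruning with learning rate rewinding (sketch).

    `train(weights, mask)` is hypothetical: it should train to completion from the
    given weights using the original learning-rate schedule, keep masked weights
    at zero, and return the new weights.
    """
    mask = [[1] * len(layer) for layer in weights]
    weights = train(weights, mask)  # train the dense network to completion
    while sparsity(mask) < target_sparsity:  # may overshoot the target slightly
        weights, mask = global_magnitude_prune(weights, mask, fraction)  # drop the weakest 20%
        weights = train(weights, mask)  # retrain with the rewound learning-rate schedule
    return weights, mask


# Toy usage: a stand-in "trainer" that only re-applies the mask (no real training).
def fake_train(weights, mask):
    return [[w * m for w, m in zip(ws, ms)] for ws, ms in zip(weights, mask)]


toy_weights = [[0.1, -2.0, 0.5, 3.0, -0.05], [1.0, -0.2, 0.3, 4.0, 0.01]]
pruned, final_mask = prune_with_lr_rewinding(toy_weights, fake_train, target_sparsity=0.5)
```

Ranking globally, rather than per layer, lets layers with many small weights absorb more of the pruning.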

Reference

Blogs

TensorFlow Model Optimization Toolkit — Pruning API
Compressing neural networks for image classification and detection - Facebook AI researchers have developed a new method that reduces the memory footprint of neural networks by quantizing their weights while keeping inference time short. They obtain a ResNet-50 with 76.1% top-1 accuracy that fits in 5 MB, and they compress a Mask R-CNN to within 6 MB.
All The Ways You Can Compress BERT - An overview that groups compression methods for large NLP models (BERT) by their characteristics and compares their results.
Deep Learning Model Compression methods.
Do We Really Need Model Compression in the future?
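To put the Facebook AI ResNet-50 result listed above in perspective, here is a back-of-the-envelope calculation. It makes three assumptions: the standard ResNet-50 with 25,557,032 parameters (the torchvision count), a 32-bit float baseline, and 1 MB = 10^6 bytes. The 5 MB presumably also has to hold codebooks and any layers left uncompressed, so the bits-per-weight figure is only a rough average.

```python
# Rough storage arithmetic; every constant except the 5 MB is an assumption.
RESNET50_PARAMS = 25_557_032  # assumption: standard torchvision ResNet-50 parameter count
BYTES_PER_FP32 = 4
MB = 1_000_000  # assumption: 1 MB = 10**6 bytes

fp32_mb = RESNET50_PARAMS * BYTES_PER_FP32 / MB  # uncompressed float32 size in MB
compressed_mb = 5  # size reported in the blog post
ratio = fp32_mb / compressed_mb  # overall compression factor
bits_per_weight = compressed_mb * MB * 8 / RESNET50_PARAMS  # average storage budget per weight

print(f"fp32: {fp32_mb:.1f} MB, ratio: {ratio:.1f}x, {bits_per_weight:.2f} bits/weight")
```

Under these assumptions the model shrinks roughly 20x, leaving well under 2 bits per weight on average.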

Tools

Libraries

TensorFlow Model Optimization Toolkit. Accompanying blog post: TensorFlow Model Optimization Toolkit — Pruning API
XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 (SSE2 level) platforms. It is based on the QNNPACK library; however, unlike QNNPACK, XNNPACK focuses entirely on floating-point operators.
bitsandbytes is a lightweight wrapper around custom CUDA functions, in particular 8-bit optimizers and quantization functions.
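The 8-bit optimizers in bitsandbytes rely on block-wise quantization. A tensor is split into blocks and each block gets its own scale, so a single outlier only degrades its own block. Below is a minimal pure-Python sketch of linear block-wise absmax int8 quantization. It illustrates the idea only and is not the library's CUDA kernels, which also use non-linear (dynamic) quantization maps.

```python
def quantize_blockwise(values, block_size=4):
    """Linear int8 quantization with one absmax scale per block of `block_size` values."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero on all-zero blocks
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)  # codes lie in [-127, 127]
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Map int8 codes back to floats using each block's stored absmax scale."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


# One large outlier (10.0) would crush the small values if a single scale were used.
values = [0.1, -0.2, 0.05, 0.2, 10.0, -5.0, 2.5, 0.0]
codes, scales = quantize_blockwise(values, block_size=4)
restored = dequantize_blockwise(codes, scales, block_size=4)
```

Using `block_size=len(values)` reduces the sketch to a single per-tensor scale, which makes it easy to see the extra error that the outlier introduces for the small values.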

Frameworks

Paper Implementations

facebookresearch/kill-the-bits - code and compressed models for the paper, "And the bit goes down: Revisiting the quantization of neural networks" by Facebook AI Research.
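That paper compresses weights with product quantization. Each weight matrix is cut into small sub-vectors, a k-means codebook is learned, and only the codebook and the per-sub-vector centroid indices are stored. Below is a toy pure-Python sketch of plain weight-space product quantization. The paper itself learns codebooks that reconstruct each layer's outputs on in-domain inputs rather than the weights. The function names are illustrative, and row lengths are assumed to be divisible by the sub-vector length.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: dist2(p, centroids[j]))].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [[sum(col) / len(m) for col in zip(*m)] if m else centroids[j]
                     for j, m in enumerate(clusters)]
    return centroids


def product_quantize(weights, sub_len, k):
    """Split every row into sub-vectors of length `sub_len`, learn one shared codebook
    of k centroids, and store each sub-vector as the index of its nearest centroid."""
    subvectors = [row[i:i + sub_len] for row in weights for i in range(0, len(row), sub_len)]
    codebook = kmeans(subvectors, k)
    codes = [min(range(k), key=lambda j: dist2(s, codebook[j])) for s in subvectors]
    return codebook, codes


def reconstruct(codebook, codes, row_len):
    """Rebuild an approximate weight matrix from the codebook and the indices."""
    flat = [x for c in codes for x in codebook[c]]
    return [flat[i:i + row_len] for i in range(0, len(flat), row_len)]


weights = [[1.0, 1.0, -1.0, -1.0],
           [1.0, 1.0, -1.0, -1.0],
           [0.9, 1.1, -1.1, -0.9]]
codebook, codes = product_quantize(weights, sub_len=2, k=2)
approx = reconstruct(codebook, codes, row_len=4)
```

With k = 2, each pair of floats is replaced by a 1-bit index. The storage cost then moves almost entirely into the codebook, which is shared across the whole matrix.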

Videos

Talks

Training & tutorials

License

To the extent possible under law, Cedric Chee has waived all copyright and related or neighboring rights to this work.
