pull #119

Merged 3 commits on Oct 10, 2020
4 changes: 2 additions & 2 deletions docs/en_US/CommunitySharings/ModelCompressionComparison.md
@@ -23,7 +23,7 @@ The experiments are performed with the following pruners/datasets/models:

For the pruners with scheduling, `L1Filter Pruner` is used as the base algorithm. That is to say, after the sparsity distribution is decided by the scheduling algorithm, `L1Filter Pruner` is used to perform the real pruning.
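
For illustration, a minimal sketch of this pattern with the NNI v1.x PyTorch API (the model and the stand-in evaluator are placeholders; exact signatures vary by version):

```python
import torchvision
from nni.compression.torch import SimulatedAnnealingPruner

model = torchvision.models.resnet18()

def evaluate_accuracy(model):
    # Stand-in evaluator: in practice, return validation accuracy.
    return 0.0

# The scheduling algorithm searches per-layer sparsities; base_algo='l1'
# makes L1Filter Pruner perform the actual pruning once they are decided.
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = SimulatedAnnealingPruner(model, config_list,
                                  evaluator=evaluate_accuracy, base_algo='l1')
model = pruner.compress()
```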

-- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/Overview.md).
+- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/Overview.md).

## Experiment Result

@@ -60,7 +60,7 @@ From the experiment results, we draw the following conclusions:

* The experiment results are all collected with the default configuration of the pruners in nni, which means that when we call a pruner class in nni, we don't change any default class arguments.

-* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/ModelSpeedup.md).
+* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/ModelSpeedup.md).
This avoids the potential issue of counting them on masked models (a usage sketch of the counter follows this list).

* The experiment code can be found [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py).
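
As a usage sketch for the counter mentioned above (assuming the NNI v1.x utility `count_flops_params`; the input shape is an arbitrary example):

```python
import torchvision
from nni.compression.torch.utils.counter import count_flops_params

model = torchvision.models.resnet18()
# Count FLOPs/parameters on the speed-up (compacted) model,
# not on a model that merely carries pruning masks.
flops, params = count_flops_params(model, (1, 3, 224, 224))
print(f'FLOPs: {flops}, Params: {params}')
```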
@@ -1,6 +1,6 @@
-# Automatic Model Compression on NNI
+# Automatic Model Pruning using NNI Tuners

-It's convenient to implement auto model compression with NNI compression and NNI tuners
+It's convenient to implement auto model pruning with NNI compression and NNI tuners.

## First, model compression with NNI

@@ -6,7 +6,7 @@

In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface that covers both pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then how to customize a new quantization algorithm.

-**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. Reference [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compressor/Framework.html)
+**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. See [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compression/Framework.html).


## Customize a new pruning algorithm
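
For illustration, a minimal custom pruner under this interface might look like the sketch below; `ThresholdPruner` and its masking rule are hypothetical, and the exact `calc_mask` signature varies across NNI versions:

```python
import torch
from nni.compression.torch import Pruner

class ThresholdPruner(Pruner):
    """Hypothetical pruner: zero out weights whose magnitude is below a threshold."""

    def calc_mask(self, wrapper, **kwargs):
        weight = wrapper.module.weight.data
        threshold = wrapper.config.get('threshold', 1e-3)  # assumed config key
        mask = torch.gt(weight.abs(), threshold).type_as(weight)
        return {'weight_mask': mask}
```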
File renamed without changes.
@@ -27,20 +27,20 @@ Pruning algorithms compress the original network by removing redundant weights o

|Name|Brief Introduction of Algorithm|
|---|---|
-| [Level Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
-| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
-| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
-| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
-| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
-| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
-| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
-| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
-| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
-| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first order taylor expansion on weights(Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
-| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
-| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
-| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
-| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively call SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
+| [Level Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
+| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878) |
+| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635) |
+| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf) |
+| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
+| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
+| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
+| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
+| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
+| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
+| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#admm-pruner) | Pruning based on the ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
+| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
+| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, the Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
+| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |

You can refer to this [benchmark](https://github.com/microsoft/nni/tree/master/docs/en_US/CommunitySharings/ModelCompressionComparison.md) for the performance of these pruners on some benchmark problems.

@@ -50,14 +50,14 @@ Quantization algorithms compress the original network by reducing the number of

|Name|Brief Introduction of Algorithm|
|---|---|
-| [Naive Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#naive-quantizer) | Quantize weights to default 8 bits |
-| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
-| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
-| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compressor/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|
+| [Naive Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#naive-quantizer) | Quantize weights to default 8 bits |
+| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf) |
+| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160) |
+| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830) |

## Automatic Model Compression

-Given targeted compression ratio, it is pretty hard to obtain the best compressed ratio in a one shot manner. An automatic model compression algorithm usually need to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying sparsity of each layer in a model. Moreover, users could leverage NNI's auto tuning power to automatically compress a model. Detailed document can be found [here](./AutoCompression.md).
+Given a targeted compression ratio, it is hard to obtain the best compressed model in one shot. An automatic model compression algorithm usually needs to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying the sparsity of each layer in a model. Moreover, users can leverage NNI's auto-tuning power to automatically compress a model. The detailed document can be found [here](./AutoPruningUsingTuners.md).

## Model Speedup

File renamed without changes.
File renamed without changes.
@@ -8,7 +8,7 @@ In this tutorial, we use the [first section](#quick-start-to-compress-a-model) t

## Quick Start to Compress a Model

-NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) as an example to show the usage.
+NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms, and their usage is the same; here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) as an example to show the usage.
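
End to end, the flow looks roughly like the sketch below (NNI v1.x PyTorch API assumed; the sparsity value is an arbitrary example), with each step detailed in the following subsections:

```python
import torchvision
from nni.compression.torch import SlimPruner

model = torchvision.models.vgg19_bn()  # any model with BatchNorm2d layers

# Slim Pruner prunes channels via the scaling factors of BatchNorm layers.
config_list = [{'sparsity': 0.7, 'op_types': ['BatchNorm2d']}]
pruner = SlimPruner(model, config_list)
model = pruner.compress()
```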

### Write configuration

@@ -175,7 +175,7 @@ In this example, 'op_names' is the name of layer and four layers will be quantiz

### APIs for Updating Fine-Tuning Status

-Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.
+Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.

`update_epoch` should be invoked in every epoch, while `step` should be invoked after each minibatch. Note that most algorithms do not require calling these two APIs. Please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
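
A sketch of where the two calls sit in a typical training loop (`model`, `optimizer`, `criterion`, `train_loader`, and `pruner` are assumed to be defined already):

```python
num_epochs = 10  # example value

for epoch in range(num_epochs):
    pruner.update_epoch(epoch)           # once per epoch
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        pruner.step()                    # once per minibatch
```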

26 changes: 26 additions & 0 deletions docs/en_US/Compression/pruning.rst
@@ -0,0 +1,26 @@
#################
Pruning
#################

Pruning is a common technique to compress neural network models.
Pruning methods explore the redundancy in the model weights (parameters) and try to remove/prune the redundant and uncritical weights.
The redundant elements are pruned from the model: their values are zeroed, and we make sure they don't take part in the back-propagation process.

From the perspective of pruning granularity, fine-grained (unstructured) pruning refers to pruning each individual weight separately.
Coarse-grained (structured) pruning removes an entire group of weights, such as a convolutional filter.

NNI provides multiple unstructured and structured pruning algorithms.
It supports TensorFlow and PyTorch with a unified interface, and users only need to add several lines to their code to prune their models.
For structured filter pruning, NNI also provides a dependency-aware mode, in which the
filter pruner achieves a better speed gain after the speedup step.
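
For illustration, the granularity difference shows up mainly in the pruner choice and its configuration.
A minimal sketch with the NNI v1.x PyTorch API (the sparsity values are arbitrary examples; exact class names vary by version):

.. code-block:: python

    import torchvision
    from nni.compression.torch import LevelPruner, L1FilterPruner

    model = torchvision.models.resnet18()

    # Fine-grained / unstructured: mask individual weights by magnitude.
    pruner = LevelPruner(model, [{'sparsity': 0.8, 'op_types': ['default']}])

    # Coarse-grained / structured: remove whole convolutional filters instead.
    # pruner = L1FilterPruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}])

    pruner.compress()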

For details, please refer to the following tutorials:

.. toctree::
:maxdepth: 2

Pruners <Pruner>
Dependency Aware Mode <DependencyAware>
Model Speedup <ModelSpeedup>
Automatic Model Pruning with NNI Tuners <AutoPruningUsingTuners>
17 changes: 17 additions & 0 deletions docs/en_US/Compression/quantization.rst
@@ -0,0 +1,17 @@
#################
Quantization
#################

Quantization refers to compressing models by reducing the number of bits required to represent weights or activations,
which can reduce computation and inference time. In the context of deep neural networks, the major numerical
format for model weights is 32-bit float, or FP32. Many research works have demonstrated that weights and activations
can be represented using 8-bit integers without significant loss in accuracy. Even lower bit-widths, such as 4/2/1 bits,
are an active field of research.

A quantizer is the implementation of a quantization algorithm in NNI. NNI provides multiple quantizers, listed below. You can also
create your own quantizer using the NNI model compression interface.
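
For illustration, a minimal sketch of applying a quantizer with the NNI v1.x PyTorch API
(the bit-width and op types below are arbitrary examples; exact signatures vary by version):

.. code-block:: python

    import torchvision
    from nni.compression.torch import QAT_Quantizer

    model = torchvision.models.resnet18()
    config_list = [{
        'quant_types': ['weight'],
        'quant_bits': {'weight': 8},
        'op_types': ['Conv2d', 'Linear']
    }]
    quantizer = QAT_Quantizer(model, config_list)
    quantizer.compress()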

.. toctree::
:maxdepth: 2

Quantizers <Quantizer>
2 changes: 1 addition & 1 deletion docs/en_US/FeatureEngineering/Overview.md
@@ -267,5 +267,5 @@ The code could be referenced `/examples/feature_engineering/gradient_feature_sel
* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
* To know more about [Neural Architecture Search with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/Overview.md);
-* To know more about [Model Compression with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/Overview.md);
+* To know more about [Model Compression with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Compression/Overview.md);
* To know more about [Hyperparameter Tuning with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Tuner/BuiltinTuner.md);