- The sparsity of each layer is set to the same value as the overall sparsity in this experiment.
- Only **filter pruning** performance is compared here.
- For the pruners with scheduling, `L1Filter Pruner` is used as the base algorithm. That is to say, after the sparsity distribution is decided by the scheduling algorithm, `L1Filter Pruner` is used to perform the actual pruning (see the sketch after this list).
- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/Overview.md).
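As a reference, here is a minimal sketch of invoking `L1Filter Pruner` through nni's compression API (the `nni.compression.torch` module path also appears later in this document); the VGG16 model and the 0.5 sparsity are placeholders, not values taken from the experiments:

```python
from torchvision.models import vgg16
from nni.compression.torch import L1FilterPruner

model = vgg16(num_classes=10)  # placeholder model; CIFAR-10 has 10 classes
# One shared target sparsity for every Conv2d layer, mirroring the
# "same sparsity for each layer" setting described above.
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = L1FilterPruner(model, config_list)
model = pruner.compress()  # applies masks; weights are zeroed, not yet removed
```
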
## Experiment Result
For each dataset/model/pruner combination, we prune the model to different levels by setting a series of target sparsities for the pruner.
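For illustration, such a sweep could be driven like the sketch below, reusing the `L1FilterPruner` setup from the earlier snippet; the sparsity values and the `evaluate` helper are placeholders:

```python
# Placeholder sweep over target sparsities; `evaluate` stands for a
# user-provided function that returns validation accuracy.
results = {}
for sparsity in [0.1, 0.3, 0.5, 0.7, 0.9]:
    model = vgg16(num_classes=10)  # fresh model for each target sparsity
    pruner = L1FilterPruner(model, [{'sparsity': sparsity, 'op_types': ['Conv2d']}])
    model = pruner.compress()
    results[sparsity] = evaluate(model)  # fine-tuning would normally come first
```
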
Here we plot both the **Number of Weights - Performance** curve and the **FLOPs - Performance** curve.

As a reference, we also plot the result declared in the paper [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](http://arxiv.org/abs/1907.03141) for models VGG16 and ResNet18 on CIFAR-10.

The experiment results are shown in the following figures:

From the experiment results, we draw the following conclusions:

* Given the constraint on the number of parameters, the pruners with scheduling (`AutoCompress Pruner`, `SimulatedAnnealing Pruner`) perform better than the others when the constraint is strict. However, they have no such advantage in the FLOPs/Performance comparison, since only the number-of-parameters constraint is considered in the optimization process;
* The basic algorithms `L1Filter Pruner`, `L2Filter Pruner`, and `FPGM Pruner` perform very similarly in these experiments;
* `NetAdapt Pruner` cannot achieve a very high compression rate. This is caused by its mechanism of pruning only one layer in each pruning iteration, which leads to unacceptable complexity if the sparsity per iteration is much lower than the overall sparsity constraint.

## Experiment Reproduction
### Implementation Details
* The experiment results are all collected with the default configuration of the pruners in nni, which means that when we call a pruner class in nni, we don't change any default class arguments.
* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/ModelSpeedup.md). This avoids the potential issue of counting on masked models, whose pruned weights are still present as zeros; a sketch of this step follows this list.
* The experiment code can be found [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py).
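The counting step mentioned above might look like the following sketch, assuming the pruned `model` from the earlier snippet and a mask file exported with `pruner.export_model('model.pth', 'mask.pth')`:

```python
import torch
from nni.compression.torch import ModelSpeedup
from nni.compression.torch.utils.counter import count_flops_params

# Physically remove the masked filters first, so zeroed weights are not counted.
dummy_input = torch.randn(1, 3, 32, 32)  # CIFAR-10-shaped input (placeholder)
ModelSpeedup(model, dummy_input, 'mask.pth').speedup_model()

# Count FLOPs and parameters of the compacted model.
flops, params = count_flops_params(model, (1, 3, 32, 32))
print(f'FLOPs: {flops}, #Params: {params}')
```
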
### Experiment Result Rendering
* If you follow the practice in the [example](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py), the result of every single pruning experiment will be saved in JSON format.
* The experiment results are saved [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/experiment_data).

You can refer to [analyze](https://github.com/microsoft/nni/tree/master/examples/model_compress/experiment_data/analyze.py) to plot new performance comparison figures.
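To take a quick look at a single result file without the full analysis script, a sketch like this works; the path is hypothetical here, so check the linked `experiment_data` directory for the real layout and field names:

```python
import json

# Hypothetical path; no particular schema is assumed beyond it being JSON.
with open('experiment_data/result.json') as f:
    result = json.load(f)
print(json.dumps(result, indent=2))  # pretty-print whatever was recorded
```
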
## Contribution
### TODO Items
* Pruners constrained by FLOPs/latency
* More pruning algorithms/datasets/models
### Issues
For algorithm implementation & experiment issues, please [create an issue](https://github.com/microsoft/nni/issues/new/).

docs/en_US/Compressor/Overview.md:

Pruning algorithms compress the original network by removing redundant weights or channels of layers.

|Name|Brief Introduction of Algorithm|
|---|---|
|[SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner)| Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141)|
|[AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner)| Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141)|

You can refer to this [benchmark](https://github.com/microsoft/nni/tree/master/docs/en_US/Benchmark.md) for the performance of these pruners on some benchmark problems.

This is a one-shot pruner proposed by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710).

You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/amc/) for more information.
#### User configuration for AMC Pruner
##### PyTorch
```eval_rst
.. autoclass:: nni.compression.torch.AMCPruner
```
## ADMM Pruner
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique that decomposes a constrained optimization problem into smaller subproblems which can be solved alternately.

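A usage sketch, assuming the `nni.compression.torch` API used elsewhere in this document; the `trainer` callable, the sparsity, and the iteration counts are placeholders:

```python
from nni.compression.torch import ADMMPruner

# `trainer` stands for a user-provided function that trains `model` for one
# epoch, e.g. trainer(model, optimizer, criterion, epoch).
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = ADMMPruner(model, config_list, trainer=trainer,
                    num_iterations=2, training_epochs=2)
model = pruner.compress()
```
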
We try to reproduce the experiment result of the fully connected network on MNIST.

The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% obtains performance similar to non-pruning and converges a little faster. Pruning too much, e.g., more than 94%, lowers the accuracy and slows convergence slightly. One small difference from the paper is that the trend of its data is clearer.

## Sensitivity Pruner
In each round, SensitivityPruner prunes the model based on each layer's sensitivity to accuracy, until the final configured sparsity of the whole model is met:
1. Analyze the sensitivity of each layer in the current state of the model.
2. Prune each layer according to the sensitivity.

For more details, please refer to [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626).

#### Usage
PyTorch code
```python
from nni.compression.torch import SensitivityPruner

# `finetuner` and `evaluator` are user-provided callables: a short fine-tuning
# routine and a function returning validation accuracy. The 0.5 sparsity is a
# placeholder value.
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = SensitivityPruner(model, config_list, finetuner=finetuner, evaluator=evaluator)
model = pruner.compress(eval_args=[model], finetune_args=[model])
```

docs/en_US/TrainingService/AMLMode.md:

```yaml
trial:
  command: python3 mnist.py
  codeDir: .
  image: msranni/nni
  gpuNum: 1
amlConfig:
  subscriptionId: ${replace_to_your_subscriptionId}
  resourceGroup: ${replace_to_your_resourceGroup}
  workspaceName: ${replace_to_your_workspaceName}
  computeTarget: ${replace_to_your_computeTarget}
```

Note: You should set `trainingServicePlatform: aml` in the NNI config YAML file if you want to start the experiment in aml mode.

Compared with [LocalMode](LocalMode.md), the trial configuration in aml mode has these additional keys:

* image
  * required key. The docker image name used in the job. The image `msranni/nni` in this example only supports GPU computeTargets.

amlConfig:

* subscriptionId
  * required key, the subscriptionId of your account
* resourceGroup
  * required key, the resourceGroup of your account
* workspaceName
  * required key, the workspaceName of your account
* computeTarget
  * required key, the compute cluster name you want to use in your AML workspace. See Step 6.
* maxTrialNumPerGpu
  * optional key, used to specify the maximum number of concurrent trials on one GPU device
* useActiveGpu
  * optional key, used to specify whether to use a GPU on which other processes are running. By default, NNI uses a GPU only if there is no other active process on it.

The required information for `amlConfig` can be found in the `config.json` downloaded in Step 5.