
Commit 88f8c1b

Merge pull request #273 from microsoft/master

merge master

2 parents c4f6e66 + ec5af41


44 files changed: +1285 -158 lines

docs/en_US/Compressor/DependencyAware.md

+55
@@ -0,0 +1,55 @@
# Dependency-aware Mode for Filter Pruning

Currently, we have several filter pruning algorithms for convolutional layers: FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner. These algorithms prune each convolutional layer separately: while pruning a convolutional layer, the algorithm quantifies the importance of each filter based on some specific rule (such as the L1 norm) and prunes the less important filters.

As the [dependency analysis utils](./CompressionUtils.md) show, if the output channels of two convolutional layers (conv1, conv2) are added together, then the two layers have a channel dependency on each other (see [Compression Utils](./CompressionUtils.md) for more details). Take the following figure as an example.
![](../../img/mask_conflict.jpg)

Suppose we prune the first 50% of the output channels (filters) of conv1 and the last 50% of the output channels of conv2. Although both layers have 50% of their filters pruned, the speedup module still needs to pad zeros to align the output channels before the add. In this case, we cannot harvest the speed benefit of the pruning.
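
As a toy illustration of this conflict (our own sketch, not NNI code; the 4-channel width and shapes are made up), the two pruned outputs must be zero-filled back to the full width before the element-wise add, so the addition still runs at the original number of channels:

```python
import torch

# conv1 output with its first 50% of channels pruned (zero-filled),
# conv2 output with its last 50% pruned: disjoint masks.
out1 = torch.randn(1, 4, 8, 8)
out1[:, :2] = 0.0
out2 = torch.randn(1, 4, 8, 8)
out2[:, 2:] = 0.0
# The residual add still needs all 4 channels, so no channel can be
# physically removed from either layer's output.
residual = out1 + out2
```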

To better gain the speed benefit of model pruning, we added a dependency-aware mode for the filter pruners. In dependency-aware mode, the pruner prunes the model based not only on the L1 norm of each filter but also on the topology of the whole network architecture.

In dependency-aware mode (`dependency_aware` is set to `True`), the pruner tries to prune the same output channels for the layers that have channel dependencies with each other, as shown in the following figure.
![](../../img/dependency-aware.jpg)

Take the dependency-aware mode of the L1Filter Pruner as an example. For each channel, the pruner calculates the sum of the L1 norms of that channel across all the layers in the dependency set. The number of channels that can actually be pruned from the dependency set is determined by the minimum sparsity among the layers in this set (denoted by `min_sparsity`). According to the per-channel L1 norm sums, the pruner prunes the same `min_sparsity` fraction of channels for all the layers. Next, the pruner additionally prunes `sparsity` - `min_sparsity` channels for each convolutional layer based on that layer's own per-channel L1 norms. For example, suppose the output channels of `conv1` and `conv2` are added together, and the configured sparsities of `conv1` and `conv2` are 0.3 and 0.2 respectively. In this case, the dependency-aware pruner will

- First, prune the same 20% of channels for `conv1` and `conv2` according to the per-channel L1 norm sums of `conv1` and `conv2`.
- Second, additionally prune 10% of the channels of `conv1` according to `conv1`'s own per-channel L1 norms (see the sketch after this list).
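
The two steps can be sketched as follows. This is our own simplified illustration, not NNI's implementation: `dep_set` is assumed to be a list of `torch.nn.Conv2d` layers whose outputs are added together (so they share the same number of output channels), and `sparsities` holds each layer's configured sparsity.

```python
import torch

def select_pruned_channels(dep_set, sparsities):
    """Return a 0/1 output-channel mask for each conv layer in the dependency set."""
    # Per-channel L1 norm of each layer's filters: shape (out_channels,)
    norms = [conv.weight.detach().abs().sum(dim=(1, 2, 3)) for conv in dep_set]
    norm_sum = torch.stack(norms).sum(dim=0)
    num_channels = norm_sum.numel()

    # Step 1: all layers prune the same `min_sparsity` fraction, picking the
    # channels with the smallest L1 norm sums across the dependency set.
    min_sparsity = min(sparsities)
    shared = set(torch.argsort(norm_sum)[:int(num_channels * min_sparsity)].tolist())

    masks = []
    for layer_norms, sparsity in zip(norms, sparsities):
        # Step 2: each layer additionally prunes (sparsity - min_sparsity) of
        # its channels, ranked by its own per-channel L1 norms.
        pruned = set(shared)
        for idx in torch.argsort(layer_norms).tolist():
            if len(pruned) >= int(num_channels * sparsity):
                break
            pruned.add(idx)
        mask = torch.ones(num_channels)
        mask[list(pruned)] = 0.0
        masks.append(mask)
    return masks
```

For the example above, `select_pruned_channels([conv1, conv2], [0.3, 0.2])` zeroes the same 20% of channels in both masks and a further 10% only in `conv1`'s mask.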

In addition, for convolutional layers that have more than one filter group, the dependency-aware pruner will also try to prune the same number of channels in each filter group. Overall, the pruner prunes the model according to the L1 norm of each filter while trying to satisfy the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.

In dependency-aware mode, the pruner therefore provides a better speed gain from model pruning.
## Usage

In this section, we show how to enable the dependency-aware mode for a filter pruner. Currently, only the one-shot pruners (FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner) support the dependency-aware mode.

To enable the dependency-aware mode for `L1FilterPruner`:
```python
import torch
from nni.compression.torch import L1FilterPruner

# `model` is the PyTorch model to be pruned
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
# dummy_input is necessary for the dependency_aware mode
dummy_input = torch.ones(1, 3, 224, 224).cuda()
pruner = L1FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
# for L2FilterPruner
# pruner = L2FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
# for FPGMPruner
# pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
# for ActivationAPoZRankFilterPruner
# pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
# for ActivationMeanRankFilterPruner
# pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
# for TaylorFOWeightFilterPruner
# pruner = TaylorFOWeightFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)

pruner.compress()
```
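
Here, `dummy_input` is required because the pruner traces the model with it to detect the network topology and the channel dependencies between layers (see [dependency analysis utils](./CompressionUtils.md)).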
## Evaluation

To compare the performance of the pruner with and without the dependency-aware mode, we use the L1FilterPruner to prune Mobilenet_v2 with the dependency-aware mode turned on and off. To simplify the experiment, we use uniform pruning, which means we allocate the same sparsity to all convolutional layers in the model.
We trained a Mobilenet_v2 model on the CIFAR-10 dataset and pruned the model from this pretrained checkpoint. The following figure shows the accuracy and FLOPs of the model pruned by the different pruners.
![](../../img/mobilev2_l1_cifar.jpg)

In the figure, `Dependency-aware` represents the L1FilterPruner with the dependency-aware mode enabled, `L1 Filter` is the normal `L1FilterPruner` without the dependency-aware mode, and `No-Dependency` means the pruner only prunes the layers that have no channel dependency with other layers. As the figure shows, with the dependency-aware mode enabled, the pruner achieves higher accuracy under the same FLOPs.

docs/en_US/Compressor/Pruner.md

+19 -1
@@ -114,7 +114,9 @@ FPGMPruner prune filters with the smallest geometric median.
![](../../img/fpgm_fig1.png)
>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.

We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
### Usage
@@ -154,6 +156,8 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https:
> 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
> weights are copied to the new model.

In addition, we also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please refer to [dependency-aware mode](./DependencyAware.md).
### Usage
PyTorch code
@@ -189,6 +193,8 @@ The experiments code can be found at [examples/model_compress]( https://github.c
This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.
We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
### Usage
PyTorch code
@@ -200,6 +206,7 @@ pruner = L2FilterPruner(model, config_list)
pruner.compress()
```
### User configuration for L2Filter Pruner
##### PyTorch
@@ -208,6 +215,7 @@ pruner.compress()
```
***
## ActivationAPoZRankFilter Pruner
ActivationAPoZRankFilter Pruner prunes the filters with the smallest importance criterion `APoZ`, which is calculated from the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion `APoZ` is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
@@ -216,6 +224,8 @@ The APoZ is defined as:
![](../../img/apoz.png)
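
For convenience, the definition shown in the image can be transcribed as follows (our transcription, using the paper's notation: `f(condition)` is 1 if the condition holds and 0 otherwise, `N` is the number of validation examples, and `M` is the size of the output feature map of channel `c` in layer `i`):

```latex
APoZ_{c}^{(i)} = \frac{\sum_{k=1}^{N} \sum_{j=1}^{M} f\left(O_{c,j}^{(i)}(k) = 0\right)}{N \times M}
```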
We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
### Usage
PyTorch code
@@ -234,6 +244,8 @@ Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers withi
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information.
### User configuration for ActivationAPoZRankFilter Pruner
##### PyTorch
@@ -247,6 +259,8 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod
ActivationMeanRankFilterPruner prunes the filters with the smallest importance criterion `mean activation`, which is calculated from the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion `mean activation` is explained in section 2.2 of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in a future release.
We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
### Usage
PyTorch code
@@ -265,6 +279,7 @@ Note: ActivationMeanRankFilterPruner is used to prune convolutional layers withi
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information.
### User configuration for ActivationMeanRankFilterPruner
##### PyTorch
@@ -273,6 +288,7 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod
```
***
## TaylorFOWeightFilter Pruner
TaylorFOWeightFilter Pruner prunes convolutional layers based on an estimated importance calculated from the first-order Taylor expansion on the weights, to achieve a preset level of network sparsity. The estimated importance of filters is defined in the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in this paper will be supported in a future release.
@@ -281,6 +297,8 @@ TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based
![](../../img/importance_estimation_sum.png)
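
As a reading aid, the importance estimate in the image can be written out as below (our transcription of the paper's squared first-order criterion, where `S` is the set of weights belonging to one filter, `w_s` is a weight, and `g_s` is the gradient of the loss with respect to `w_s`):

```latex
\hat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) = \Big( \sum_{s \in \mathcal{S}} g_{s} w_{s} \Big)^{2}
```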
We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
### Usage
PyTorch code

docs/en_US/TrainingService/RemoteMachineMode.md

+76
@@ -107,3 +107,79 @@ Files in `codeDir` will be uploaded to remote machines automatically. You can ru
```bash
nnictl create --config examples/trials/mnist-annotation/config_remote.yml
```

### Configure the Python environment

By default, commands and scripts are executed in the default environment of the remote machine. If there are multiple Python virtual environments on your remote machine and you want to run experiments in a specific environment, you can use __preCommand__ to specify a Python environment on your remote machine.

Take `examples/trials/mnist-tfv2` as an example. Below is the content of `examples/trials/mnist-tfv2/config_remote.yml`:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: ${replace_to_your_remote_machine_ip}
    username: ${replace_to_your_remote_machine_username}
    sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath}
    # Pre-command will be executed before the remote machine executes other commands.
    # Below is an example of specifying a python environment.
    # If you want to execute multiple commands, please use "&&" to connect them.
    # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
    # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
    preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
```

__preCommand__ will be executed before the remote machine executes other commands, so you can configure the Python environment path like this:

```yaml
# Linux remote machine
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
# Windows remote machine
preCommand: set path=${replace_to_python_environment_path_in_your_remote_machine};%path%
```
Or if you want to activate the `virtualenv` environment:
162+
163+
```yaml
164+
# Linux remote machine
165+
preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
166+
# Windows remote machine
167+
preCommand: ${replace_to_absolute_path_recommended_here}\\scripts\\activate
168+
```
Or if you want to activate the `conda` environment:
171+
172+
```yaml
173+
# Linux remote machine
174+
preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
175+
# Windows remote machine
176+
preCommand: call activate ${replace_to_conda_env_name}
177+
```

If you want to execute multiple commands, you can use `&&` to connect them:
180+
181+
```yaml
182+
preCommand: command1 && command2 && command3
183+
```

__Note__: Because __preCommand__ will be executed before other commands each time, it is strongly recommended not to set a __preCommand__ that makes changes to the system, e.g. `mkdir` or `touch`.

docs/en_US/TrialExample/SklearnExamples.md

+1 -1
@@ -67,7 +67,7 @@ It is easy to use NNI in your scikit-learn code, there are only a few steps.
"kernel": {"_type":"choice","_value":["linear", "rbf", "poly", "sigmoid"]},
"degree": {"_type":"choice","_value":[1, 2, 3, 4]},
"gamma": {"_type":"uniform","_value":[0.01, 0.1]},
- "coef0 ": {"_type":"uniform","_value":[0.01, 0.1]}
+ "coef0": {"_type":"uniform","_value":[0.01, 0.1]}
}
```

docs/en_US/Tutorial/ExperimentConfig.md

+15
@@ -58,6 +58,7 @@ This document describes the rules to write the config file, and provides some ex
- [gpuIndices](#gpuindices-3)
- [maxTrialNumPerGpu](#maxtrialnumpergpu-1)
- [useActiveGpu](#useactivegpu-1)
- [preCommand](#preCommand)
+ [kubeflowConfig](#kubeflowconfig)
- [operator](#operator)
- [storage](#storage)
@@ -583,6 +584,14 @@ Optional. Bool. Default: false.
Used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no other active process on it. If __useActiveGpu__ is set to true, NNI will use the GPU regardless of other processes. This field is not applicable for NNI on Windows.
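
A minimal sketch of how these GPU-related fields might appear in a remote machine entry (the nesting and all values are our assumptions based on the option list above, not taken from this page):

```yaml
machineList:
  - ip: 10.1.1.1          # placeholder
    username: bob         # placeholder
    passwd: bob123        # placeholder
    useActiveGpu: true    # use the GPU even if another process is active on it
    maxTrialNumPerGpu: 2  # at most 2 concurrent trials per GPU
```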
#### preCommand
Optional. String.
Specifies the pre-command that will be executed before the remote machine executes other commands. Users can configure the experiment environment on the remote machine by setting __preCommand__. If multiple commands need to be executed, use `&&` to connect them, such as `preCommand: command1 && command2 && ...`.
__Note__: Because __preCommand__ will be executed before other commands each time, it is strongly recommended not to set a __preCommand__ that makes changes to the system, e.g. `mkdir` or `touch`.
### kubeflowConfig
#### operator
@@ -795,6 +804,12 @@ If run trial jobs in remote machine, users could specify the remote machine info
    username: test
    sshKeyPath: /nni/sshkey
    passphrase: qwert
    # Pre-command will be executed before the remote machine executes other commands.
    # Below is an example of specifying a python environment.
    # If you want to execute multiple commands, please use "&&" to connect them.
    # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
    # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
    preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
```
### PAI mode

docs/en_US/Tutorial/Nnictl.md

+1
@@ -578,6 +578,7 @@ Debug mode will disable version check function in Trialkeeper.
|--path, -p| True| |the file path of nni package|
|--codeDir, -c| True| |the path of codeDir for loaded experiment, this path will also put the code in the loaded experiment package|
|--logDir, -l| False| |the path of logDir for loaded experiment|
|--searchSpacePath, -s| True| |the path of the search space file for the loaded experiment; this path contains the file name. Default: $codeDir/search_space.json|
* Examples
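
For instance, assuming this option table belongs to `nnictl experiment load` (the command name is not shown in this hunk, and the paths are placeholders), an invocation with the new option might look like:

```bash
nnictl experiment load --path ./experiment.tar.gz --codeDir ./code \
    --searchSpacePath ./code/search_space.json
```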
docs/en_US/model_compression.rst

+1 -1
@@ -17,7 +17,7 @@ For details, please refer to the following tutorials:
   Overview <Compressor/Overview>
   Quick Start <Compressor/QuickStart>
-  Pruners <Compressor/Pruner>
+  Pruning <pruning>
   Quantizers <Compressor/Quantizer>
   Automatic Model Compression <Compressor/AutoCompression>
   Model Speedup <Compressor/ModelSpeedup>

docs/en_US/pruning.rst

+17
@@ -0,0 +1,17 @@
#################
Pruning
#################

NNI provides several pruning algorithms that support fine-grained weight pruning and structural filter pruning.
It supports TensorFlow and PyTorch with a unified interface.
To prune their models, users only need to add several lines to their code.
For structural filter pruning, NNI also provides a dependency-aware mode, in which the
filter pruner achieves a better speed gain after the speedup.

For details, please refer to the following tutorials:

.. toctree::
   :maxdepth: 2

   Pruners <Compressor/Pruner>
   Dependency Aware Mode <Compressor/DependencyAware>

docs/img/dependency-aware.jpg (82.1 KB)

docs/img/mask_conflict.jpg (88 KB)

docs/img/mobilev2_l1_cifar.jpg (966 KB)
