
Commit 2f39b96 (1 parent: 8d3c45c)

add depthwise conv op (#24)

* add depthwise conv op
* typo
* add figure

File tree: 6 files changed, +430 -15 lines changed

chapter_common_operators/conv.md
+3 -5
@@ -124,7 +124,6 @@ def conv(oc, ic, nh, nw, kh, kw, ph=0, pw=0, sh=1, sw=1):
 Just as what we created `get_abc` in :numref:`ch_vector_add`, we define a method to get the input and output tensors. Again, we fix the random seed so it returns the same results if calling multiple times.
 
 ```{.python .input}
-# Save to the d2ltvm package.
 def get_conv_data(oc, ic, n, k, p=0, s=1, constructor=None):
     """Return random 3-D data tensor, 3-D kernel tenor and empty 3-D output
     tensor with the shapes specified by input arguments.
@@ -159,14 +158,14 @@ print(tvm.lower(sch, [X, K, Y], simple_mode=True))
 data, weight, out = get_conv_data(oc, ic, n, k, p, s, tvm.nd.array)
 mod(data, weight, out)
 ```
+
 In the last code block we also print out the pseudo code of a 2-D convolution, which is a naive 6-level nested for loop.
 
-Since NumPy only has a convolution for vectors, we use MXNet's convolution operator to as the ground truth. The following code block defines the data generating function and a wrap function to call the convolution operator. Then we can feed the same tensors to compute the results in MXNet.
+Since NumPy only has a convolution for vectors, we use MXNet's convolution operator as the ground truth. The following code block defines the data generating function and a wrap function to call the convolution operator. Then we can feed the same tensors to compute the results in MXNet.
 
 ```{.python .input}
 import mxnet as mx
 
-# Save to the d2ltvm package.
 def get_conv_data_mxnet(oc, ic, n, k, p, s, ctx='cpu'):
     ctx = getattr(mx, ctx)()
     data, weight, out = get_conv_data(oc, ic, n, k, p, s,
@@ -194,5 +193,4 @@ np.testing.assert_allclose(out_mx[0].asnumpy(), out.asnumpy(), atol=1e-5)
 
 - We can express the computation of 2-D convolution in TVM in a fairly easy way.
 - Deep learning workloads normally operate 2-D convolution on 4-D data tensors and kernel tensors.
-- The naive matrix multiplication is a 6-level nested for loop.
-
+- The naive 2-D convolution is a 6-level nested for loop.
chapter_common_operators/depthwise_conv.md
+151

@@ -0,0 +1,151 @@
# Depthwise Convolution
:label:`ch_depthwise_conv`

Depthwise convolution is a special kind of convolution commonly used in convolutional neural networks designed for mobile and embedded applications, e.g. MobileNet :cite:`Howard.Zhu.Chen.ea.2017`.

```{.python .input n=1}
import d2ltvm
import numpy as np
import tvm
```
11+
12+
## Compute definition
13+
14+
Let's revisit the 2-D convolution described in :numref:`ch_conv` first. The 2-D convolution basically takes a 3-D data (note that for simplicity we set the batch size to be 1) in size `(ic, ih, iw)`, convolves it with a 4-D kernel in size `(oc, ic, kh, kw)`, and produces an output data in size `(oc, oh, ow)`. During the convolution, some padding and stride may be applied.
15+
16+
For depthwise convolution, the convolution computation itself stays the same, as illustrated in :numref:`fig_conv_strides`. It differs from the normal 2-D convolution in the way of organizing the convolution. In order to generate an output data in size `(oc, oh, ow)` from input data in size `(ic, ih, iw)`, a two-stage computation is needed. First, we process the input data with `ic` kernels, each of which convolves with the corresponding channel, to produce an intermediate data in size `(ic, oh, ow)`; then we perform the normal, but pointwise, 2-D convolution on the intermediate data in size `(ic, oh, ow)` using a 4-D kernel in size `(oc, ic, 1, 1)` to produce the output data in size `(oc, oh, ow)`, where `padding=0` and `stride=1`.
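The shape flow of the two stages above can be sketched in plain NumPy (a hypothetical illustration, not the book's code; the sizes are made up, and `sliding_window_view` requires NumPy 1.20+):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ic, oc, ih, iw, k = 3, 8, 6, 6, 3  # assumed small example sizes
data = np.random.normal(size=(ic, ih, iw)).astype('float32')
w_depthwise = np.random.normal(size=(ic, k, k)).astype('float32')
w_pointwise = np.random.normal(size=(oc, ic)).astype('float32')  # (oc, ic, 1, 1) squeezed

# Stage 1: depthwise - each channel convolves with its own k x k kernel
windows = sliding_window_view(data, (k, k), axis=(1, 2))  # (ic, oh, ow, k, k)
mid = np.einsum('cijhw,chw->cij', windows, w_depthwise)   # (ic, oh, ow)

# Stage 2: pointwise 1x1 convolution mixes the channels
out = np.einsum('oc,cij->oij', w_pointwise, mid)          # (oc, oh, ow)

print(mid.shape, out.shape)
```

With no padding and stride 1, `oh = ow = ih - k + 1`, so the intermediate data keeps `ic` channels and the pointwise stage expands them to `oc`.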
The computation of the second stage was covered in :numref:`ch_conv`. This section focuses only on the computation of the first stage, which is what we refer to as depthwise convolution. :numref:`fig_depthwise_conv` illustrates its computation procedure.

![Illustration of a depthwise convolution. Each channel of the input data convolves with a dedicated kernel.](../img/depthwise-conv.svg)
:label:`fig_depthwise_conv`

From the figure we can see that the shape of the weight is a bit different from 2-D convolution: the weight of a depthwise convolution is essentially 3-D, while it is 4-D for 2-D convolution. Therefore, we modify `get_conv_data` slightly to handle data generation for depthwise convolution, and save it for future use.
```{.python .input n=8}
# Save to the d2ltvm package.
def get_conv_data(oc, ic, n, k, p=0, s=1, constructor=None, conv_type='direct'):
    """Return random 3-D data tensor, 3-D kernel tensor and empty 3-D output
    tensor with the shapes specified by input arguments.

    oc, ic : output and input channels
    n : input width and height
    k : kernel width and height
    p : padding size, default 0
    s : stride, default 1
    conv_type : either direct 2-D or depthwise, default direct
    constructor : user-defined tensor constructor
    """
    np.random.seed(0)
    data = np.random.normal(size=(ic, n, n)).astype('float32')
    ic_weight = ic
    if conv_type == 'depthwise':
        ic_weight = 1
    weight = np.random.normal(size=(oc, ic_weight, k, k)).astype('float32')
    on = d2ltvm.conv_out_size(n, k, p, s)
    out = np.empty((oc, on, on), dtype='float32')
    if constructor:
        data, weight, out = (constructor(x) for x in [data, weight, out])
    return data, weight, out
```

Compared to the version in :numref:`ch_conv`, we add one argument to describe the convolution type, and set the input channel of the weight to 1 when it is a depthwise convolution. You may wonder why we choose this dimension. The reason is to match the convention used by the deep learning frameworks.
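As a quick shape check, the branching logic can be reproduced standalone (a sketch that inlines `conv_out_size` instead of calling `d2ltvm`, with hypothetical sizes):

```python
import numpy as np

def conv_out_size(n, k, p, s):
    # output size of a convolution: (n - k + 2*p) // s + 1
    return (n - k + 2 * p) // s + 1

oc, ic, n, k, p, s = 4, 3, 12, 3, 1, 1
on = conv_out_size(n, k, p, s)

# direct 2-D convolution: the weight has a full input-channel dimension
w_direct = np.empty((oc, ic, k, k))
# depthwise convolution: the input-channel dimension of the weight is 1
w_depthwise = np.empty((ic, 1, k, k))

print(on, w_direct.shape, w_depthwise.shape)
```

Keeping the weight 4-D with a singleton second dimension, rather than squeezing it to 3-D, is what lets the same framework API handle both cases.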
53+
54+
Then we define the depthwise convolution via TVM. Here, we reuse the `padding` and `conv_out_size` methods defined in :numref:`ch_conv`.
55+
56+
```{.python .input n=56}
57+
from d2ltvm import padding, conv_out_size
58+
59+
# Save to the d2ltvm package.
60+
def depthwise_conv(ic, nh, nw, kh, kw, ph=0, pw=0, sh=1, sw=1):
61+
"""Convolution
62+
63+
ic : number of channels for both input and output
64+
nh, nw : input width and height
65+
kh, kw : kernel width and height
66+
ph, pw : height and width padding sizes, default 0
67+
sh, sw : height and width strides, default 1
68+
"""
69+
# reduction axes
70+
rkh = tvm.reduce_axis((0, kh), name='rkh')
71+
rkw = tvm.reduce_axis((0, kw), name='rkw')
72+
# output height and weights
73+
oh = conv_out_size(nh, kh, ph, sh)
74+
ow = conv_out_size(nw, kw, pw, sw)
75+
# pad X and then compute Y
76+
X = tvm.placeholder((ic, nh, nw), name='X')
77+
K = tvm.placeholder((ic, 1, kh, kw), name='K')
78+
PaddedX = padding(X, ph, pw) if ph * pw != 0 else X
79+
Y = tvm.compute(
80+
(ic, oh, ow),
81+
lambda c, i, j: tvm.sum(
82+
(PaddedX[c, i*sh+rkh, j*sw+rkw] * K[c, 0, rkh, rkw]),
83+
axis=[rkh, rkw]), name='Y')
84+
85+
return X, K, Y, PaddedX
86+
```
After defining the computation of depthwise convolution, we can use the default schedule to compile and execute it as follows. We also print out its pseudo-code.

```{.python .input}
ic, n, k, p, s = 256, 12, 3, 1, 1

X, K, Y, _ = depthwise_conv(ic, n, n, k, k, p, p, s, s)
sch = tvm.create_schedule(Y.op)
mod = tvm.build(sch, [X, K, Y])
print(tvm.lower(sch, [X, K, Y], simple_mode=True))

data, weight, out = get_conv_data(ic, ic, n, k, p, s,
                                  constructor=tvm.nd.array,
                                  conv_type='depthwise')
mod(data, weight, out)
```
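For intuition, here is a naive NumPy version of the same depthwise computation (a hypothetical reference implementation, not the book's code; the scalar `p` and `s` mirror the padding and stride used above):

```python
import numpy as np

def depthwise_conv_np(data, weight, p=0, s=1):
    """Naive depthwise convolution.

    data   : (ic, n, n) input
    weight : (ic, 1, k, k) kernels, one k x k filter per channel
    """
    ic, n, _ = data.shape
    k = weight.shape[2]
    on = (n - k + 2 * p) // s + 1
    padded = np.pad(data, ((0, 0), (p, p), (p, p)))
    out = np.zeros((ic, on, on), dtype=data.dtype)
    for c in range(ic):              # each channel uses its own kernel
        for i in range(on):
            for j in range(on):
                patch = padded[c, i*s:i*s+k, j*s:j*s+k]
                out[c, i, j] = (patch * weight[c, 0]).sum()
    return out

# tiny check: all-ones 3x3 data with a 2x2 all-ones kernel -> every output is 4
data = np.ones((1, 3, 3), dtype='float32')
weight = np.ones((1, 1, 2, 2), dtype='float32')
print(depthwise_conv_np(data, weight))  # 2x2 output filled with 4.0
```

Compared to the 2-D convolution in :numref:`ch_conv`, there is no reduction over input channels, which is why only four loop levels (and two reduction axes) appear in the pseudo-code printed above.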
## Depthwise Convolution in General

You may wonder why we want to replace a typical 2-D convolution with the more complicated, two-stage depthwise plus pointwise 2-D convolution. This book doesn't discuss the choice of algorithms, but from the computational perspective, the main reason is to reduce the amount of computation required. Assuming that the input data is of size `[ic, ih, iw]`, the kernel is of size `[kh, kw]`, and the output data is of size `[oc, oh, ow]`, a 2-D convolution takes $2 \times ic \times oh \times ow \times kh \times kw \times oc$ FLOPs, while a depthwise plus pointwise 2-D convolution takes $2 \times ic \times oh \times ow \times (kh \times kw + oc)$ FLOPs. It is easy to see that a 2-D convolution normally takes more FLOPs than a depthwise plus pointwise 2-D convolution, especially when the kernel size and/or the number of output channels are large. Taking the above example where $ic=256, oh=ow=12, kh=kw=3$, if we set $oc=512$, the total FLOPs of a 2-D convolution is $339,738,624$, while that of the depthwise plus pointwise convolution is $38,412,288$, almost one order of magnitude smaller, making it much more suitable for mobile and embedded applications.
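The FLOP counts above can be verified with a few lines of arithmetic, directly from the two formulas:

```python
ic, oh, ow, kh, kw, oc = 256, 12, 12, 3, 3, 512

# direct 2-D convolution: reduce over ic and the k x k window, for each of oc outputs
flops_conv2d = 2 * ic * oh * ow * kh * kw * oc
# depthwise (k x k per channel) plus pointwise (1 x 1 across channels)
flops_separable = 2 * ic * oh * ow * (kh * kw + oc)

print(flops_conv2d)     # 339738624
print(flops_separable)  # 38412288
print(flops_conv2d / flops_separable)
```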
In the MobileNet paper :cite:`Howard.Zhu.Chen.ea.2017`, depthwise convolution was described as part of a separable convolution which separates the channels for convolution. From another perspective, a depthwise convolution can be treated as a special kind of grouped convolution: a `G`-grouped convolution divides the channels into `G` groups and performs the convolution group by group independently. Grouped convolution was first introduced in AlexNet to save memory. It is easy to see that when the number of groups equals the number of channels, a grouped convolution reduces to a depthwise convolution.
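This reduction can also be seen from FLOP counting (a hypothetical helper, assuming `ic` and `oc` are both divisible by `G`): setting `G=1` recovers ordinary 2-D convolution, while `G=ic` recovers depthwise convolution.

```python
def grouped_conv_flops(ic, oc, oh, ow, kh, kw, G):
    # each of the G groups convolves ic/G input channels into oc/G output channels
    return G * (2 * (ic // G) * oh * ow * kh * kw * (oc // G))

ic = oc = 256
oh = ow = 12
kh = kw = 3

full = grouped_conv_flops(ic, oc, oh, ow, kh, kw, G=1)       # ordinary 2-D conv
depthwise = grouped_conv_flops(ic, oc, oh, ow, kh, kw, G=ic) # one group per channel

print(full, depthwise)
```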
In fact, MXNet uses the same API, `mx.nd.Convolution`, to process depthwise convolution by specifying the number of groups, as we will show in the next code block.

In addition, a depthwise convolution can be generalized in other ways. For example, we can specify a `multiplier` to increase the number of channels in the output of the depthwise convolution, which we do not cover in this section for simplicity.

## Comparing to Baseline

We use MXNet's convolution operator as the ground truth to verify the correctness of our depthwise convolution. Before that, we need to create the data. As we did for TVM, we modify the `get_conv_data_mxnet` defined in :numref:`ch_conv` to take `conv_type`.
118+
119+
```{.python .input n=10}
120+
import mxnet as mx
121+
122+
# Save to the d2ltvm package.
123+
def get_conv_data_mxnet(oc, ic, n, k, p, s, ctx='cpu', conv_type='direct'):
124+
ctx = getattr(mx, ctx)()
125+
data, weight, out = get_conv_data(oc, ic, n, k, p, s,
126+
constructor=lambda x: mx.nd.array(x, ctx=ctx),
127+
conv_type=conv_type)
128+
data, out = data.expand_dims(axis=0), out.expand_dims(axis=0)
129+
bias = mx.nd.zeros(out.shape[1], ctx=ctx)
130+
return data, weight, bias, out
131+
```
Then we do the computation and compare it with the TVM result. Note that the weight size is `[oc, 1, kh, kw]`, as the number of groups equals the number of channels, i.e. each kernel only corresponds to one channel of the data, just as we do in TVM.

```{.python .input}
# Save to the d2ltvm package.
def depthwise_conv_mxnet(data, weight, bias, out, k, p, s):
    mx.nd.Convolution(data, weight, bias, kernel=(k,k), stride=(s,s),
                      pad=(p,p), num_filter=out.shape[1],
                      out=out, num_group=weight.shape[0])

data, weight, bias, out_mx = get_conv_data_mxnet(ic, ic, n, k, p, s, conv_type='depthwise')
depthwise_conv_mxnet(data, weight, bias, out_mx, k, p, s)

np.testing.assert_allclose(out_mx[0].asnumpy(), out.asnumpy(), atol=1e-5)
```
## Summary

- Depthwise convolution, together with pointwise convolution, can save a lot of computation and memory compared to normal 2-D convolution.
- Depthwise convolution takes kernels in 3-D, while normal 2-D convolution takes kernels in 4-D.

chapter_common_operators/index.md
+1
@@ -8,4 +8,5 @@ In last chapter, we went over how to implement the basic expressions/operators u
 broadcast_add
 matmul
 conv
+depthwise_conv
 ```

d2ltvm.bib
+10
@@ -99,3 +99,13 @@ @InProceedings{ Ragan-Kelley.Barnes.Adams.ea.2013
   numpages = {12},
   publisher = {ACM}
 }
+
+@Article{ Howard.Zhu.Chen.ea.2017,
+  title = {Mobilenets: Efficient convolutional neural networks for
+           mobile vision applications},
+  author = {Howard, Andrew G and Zhu, Menglong and Chen, Bo and
+            Kalenichenko, Dmitry and Wang, Weijun and Weyand, Tobias
+            and Andreetto, Marco and Adam, Hartwig},
+  journal = {arXiv preprint arXiv:1704.04861},
+  year = {2017}
+}

d2ltvm/d2ltvm.py
+51 -10
@@ -160,7 +160,13 @@ def conv(oc, ic, nh, nw, kh, kw, ph=0, pw=0, sh=1, sw=1):
 
 
 # Defined in file: ./chapter_common_operators/conv.md
-def get_conv_data(oc, ic, n, k, p=0, s=1, constructor=None):
+def conv_mxnet(data, weight, bias, out, k, p, s):
+    mx.nd.Convolution(data, weight, bias, kernel=(k,k), stride=(s,s),
+                      pad=(p,p), num_filter=out.shape[1], out=out)
+
+
+# Defined in file: ./chapter_common_operators/depthwise_conv.md
+def get_conv_data(oc, ic, n, k, p=0, s=1, constructor=None, conv_type='direct'):
     """Return random 3-D data tensor, 3-D kernel tenor and empty 3-D output
     tensor with the shapes specified by input arguments.
 
@@ -169,32 +175,67 @@ def get_conv_data(oc, ic, n, k, p=0, s=1, constructor=None):
     k : kernel width and height
     p : padding size, default 0
     s : stride, default 1
+    conv_type: either direct 2D or depthwise, default direct
     constructor : user-defined tensor constructor
     """
     np.random.seed(0)
     data = np.random.normal(size=(ic, n, n)).astype('float32')
-    weight = np.random.normal(size=(oc, ic, k, k)).astype('float32')
-    on = conv_out_size(n, k, p, s)
+    ic_weight = ic
+    if conv_type == 'depthwise':
+        ic_weight = 1
+    weight = np.random.normal(size=(oc, ic_weight, k, k)).astype('float32')
+    on = d2ltvm.conv_out_size(n, k, p, s)
     out = np.empty((oc, on, on), dtype='float32')
     if constructor:
         data, weight, out = (constructor(x) for x in [data, weight, out])
     return data, weight, out
 
 
-# Defined in file: ./chapter_common_operators/conv.md
-def get_conv_data_mxnet(oc, ic, n, k, p, s, ctx='cpu'):
+# Defined in file: ./chapter_common_operators/depthwise_conv.md
+def depthwise_conv(ic, nh, nw, kh, kw, ph=0, pw=0, sh=1, sw=1):
+    """Convolution
+
+    ic : number of channels for both input and output
+    nh, nw : input width and height
+    kh, kw : kernel width and height
+    ph, pw : height and width padding sizes, default 0
+    sh, sw : height and width strides, default 1
+    """
+    # reduction axes
+    rkh = tvm.reduce_axis((0, kh), name='rkh')
+    rkw = tvm.reduce_axis((0, kw), name='rkw')
+    # output height and weights
+    oh = conv_out_size(nh, kh, ph, sh)
+    ow = conv_out_size(nw, kw, pw, sw)
+    # pad X and then compute Y
+    X = tvm.placeholder((ic, nh, nw), name='X')
+    K = tvm.placeholder((ic, 1, kh, kw), name='K')
+    PaddedX = padding(X, ph, pw) if ph * pw != 0 else X
+    Y = tvm.compute(
+        (ic, oh, ow),
+        lambda c, i, j: tvm.sum(
+            (PaddedX[c, i*sh+rkh, j*sw+rkw] * K[c, 0, rkh, rkw]),
+            axis=[rkh, rkw]), name='Y')
+
+    return X, K, Y, PaddedX
+
+
+# Defined in file: ./chapter_common_operators/depthwise_conv.md
+def get_conv_data_mxnet(oc, ic, n, k, p, s, ctx='cpu', conv_type='direct'):
     ctx = getattr(mx, ctx)()
-    data, weight, out = get_conv_data(oc, ic, n, k, p, s,
-                                      lambda x: mx.nd.array(x, ctx=ctx))
+    data, weight, out = get_conv_data(oc, ic, n, k, p, s,
+                                      constructor=lambda x: mx.nd.array(x, ctx=ctx),
+                                      conv_type=conv_type)
     data, out = data.expand_dims(axis=0), out.expand_dims(axis=0)
     bias = mx.nd.zeros(out.shape[1], ctx=ctx)
    return data, weight, bias, out
 
 
-# Defined in file: ./chapter_common_operators/conv.md
-def conv_mxnet(data, weight, bias, out, k, p, s):
+# Defined in file: ./chapter_common_operators/depthwise_conv.md
+def depthwise_conv_mxnet(data, weight, bias, out, k, p, s):
     mx.nd.Convolution(data, weight, bias, kernel=(k,k), stride=(s,s),
-                      pad=(p,p), num_filter=out.shape[1], out=out)
+                      pad=(p,p), num_filter=out.shape[1],
+                      out=out, num_group=weight.shape[0])
 
 
 # Defined in file: ./chapter_cpu_schedules/call_overhead.md
