WIP, conv_bn fuse example using paddlefx #33

Draft · wants to merge 2 commits into main

Conversation

jzhang533
Contributor

This is ported from https://pytorch.org/tutorials/intermediate/fx_conv_bn_fuser.html, but there are still some critical issues to solve:

  • the fused model is slower than the unfused model, which is unexpected; paddlefx's Python code generation needs to be refactored/reimplemented
  • the result of the fused resnet18 differs slightly from the unfused resnet18
  • iterating over fx_model.graph.nodes causes an endless loop

See TODO in examples/conv_bn_fuse.py for details.
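The fusion itself boils down to the per-channel folding described in the PyTorch tutorial linked above: BN's scale and shift get absorbed into the preceding conv's weight and bias. A minimal stand-alone sketch of that math in NumPy (the name `fold_bn` is illustrative, not paddlefx API):

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into a preceding affine op y = w*x + b."""
    scale = gamma / np.sqrt(var + eps)  # per-channel rescale coming from BN
    return w * scale, (b - mean) * scale + beta

# Check per channel: folded weights reproduce conv-then-BN exactly.
rng = np.random.default_rng(0)
w, b = rng.normal(size=4), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=4)

y_ref = gamma * ((w * x + b) - mean) / np.sqrt(var + 1e-5) + beta
wf, bf = fold_bn(w, b, gamma, beta, mean, var)
assert np.allclose(wf * x + bf, y_ref)
```

Note that even when the original conv has no bias (`b = 0`), the folded bias `(0 - mean) * scale + beta` is generally nonzero, which is relevant to the performance discussion below.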

@SigureMo
Collaborator

SigureMo commented Apr 1, 2023

iterating over fx_model.graph.nodes causes an endless loop.

This is fixed now; the earlier code didn't account for erasing the current node while iterating 😂
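The failure mode is the classic mutate-while-iterating one. A minimal stand-alone sketch of the fix, with a plain Python list standing in for the graph's node list (the real paddlefx structure is more involved):

```python
# A stand-in for fx_model.graph.nodes.
nodes = ["conv1", "bn1", "relu", "conv2", "bn2"]

# Iterating over a snapshot (list(nodes)) keeps the iteration stable even
# when the loop body erases the current node from the underlying container.
for node in list(nodes):
    if node.startswith("bn"):
        nodes.remove(node)  # analogous to graph.erase_node(node)

print(nodes)  # -> ['conv1', 'relu', 'conv2']
```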

@jzhang533 jzhang533 mentioned this pull request Apr 6, 2023
@Asthestarsfalll
Collaborator

It doesn't seem to be a codegen issue. The conv layers in the original rn18 carry no bias, while fusion has to add a bias to each conv, and that is what causes the slowdown. Adding a bias to every conv in the original rn18 with the code below makes all three timings roughly the same:

# assumes: import paddle; import paddle.nn as nn; rn18 is the resnet18 model
for name, module in rn18.named_sublayers():
    if isinstance(module, nn.Conv2D):
        module.bias = paddle.zeros([module.weight.shape[0]])


@SigureMo
Collaborator

SigureMo commented Apr 9, 2023

It doesn't seem to be a codegen issue. The conv layers in the original rn18 carry no bias, while fusion has to add a bias to each conv, and that is what causes the slowdown.

Right, we found this earlier as well. We can add a special case like the following as an "optimization":

if paddle.allclose(conv_b_param, paddle.zeros_like(conv_b_param)):
    conv_b_param = None

For an untrained resnet18 this does speed things up, since the parameters taken from the BN layer are zeros and the added bias can be optimized away to None:

Fused time:  2.448992967605591
Unfused time:  2.5072388648986816
Traced time:  2.4950180053710938

But for a trained resnet18 (e.g. with pretrained=True), the added bias is essentially never all zeros, so this slows things down badly:

Fused time:  3.230083703994751
Unfused time:  2.48759388923645
Traced time:  2.4984631538391113

So this hardly counts as an optimization; we should find out why adding a bias slows things down so much.

@jzhang533
Contributor Author

jzhang533 commented Apr 10, 2023

So this hardly counts as an optimization; we should find out why adding a bias slows things down so much.

I think the cause is Paddle's Conv2D implementation: its performance differs a lot depending on whether a bias is present. Paddle's Conv2D first runs one C++ kernel for the pre-bias part, then a second kernel to add the bias. PyTorch's Conv2D doesn't have this problem because it dispatches to the same C++ kernel whether or not there is a bias.

For no-bias conv + batch_norm, the small gain fusion could bring is wiped out because the fused conv needs a bias, so it actually ends up slower.

Running the code below makes the difference easy to see:

import paddle
import paddle.nn.functional as F
import time

class MyNet(paddle.nn.Layer):
    def __init__(self, bias=False):
        super(MyNet, self).__init__()

        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), bias_attr=bias)

    def forward(self, x):
        x = self.conv1(x)
        return x

bias_model = MyNet(bias=True)
no_bias_model = MyNet(bias=False)

inp = paddle.rand((128, 3, 224, 224))

def benchmark(model, iters=1000):
    # warm up
    for _ in range(50):
        model(inp)

    # paddle.device.cuda.synchronize()  # uncomment when timing on GPU
    begin = time.time()
    for _ in range(iters):
        model(inp)

    # paddle.device.cuda.synchronize()  # uncomment when timing on GPU
    return time.time() - begin

print("no bias time: ", benchmark(no_bias_model))
print("bias time: ", benchmark(bias_model))

On a V100:

no bias time: 1.829711675643921
bias time: 3.8552277088165283

@jzhang533
Contributor Author

Given this issue in Paddle's Conv2D implementation, in the short term, to demonstrate paddlefx's fuse capability, maybe we can construct a network of conv (with bias) + bn, so that this PR can be merged first.

@Asthestarsfalll
Collaborator

Generally the conv before a bn isn't given a bias; maybe we could try fusing a network like RepVGG instead.
