This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MKLDNN not used for 3d tensors #11909

Closed
safrooze opened this issue Jul 27, 2018 · 2 comments

Comments

@safrooze
Contributor

Description

When using an MKLDNN build, operators fall back to the default CPU implementation if the input tensor is not 2D or 4D. In some cases the fallback is far slower than the MKLDNN path (roughly 20x for the concat operator). Affected operators include convolution and concat.

Environment info (Required)

----------Python Info----------
Version      : 3.4.5
Compiler     : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)
Build        : ('default', 'Jul  2 2016 17:47:47')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 18.0
Directory    : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.0
Directory    : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/mxnet
Commit Hash   : f5b95b090815e879b57dca233604dcb3f1df967a
----------System Info----------
Platform     : Linux-4.9.93-41.60.amzn1.x86_64-x86_64-with-glibc2.2.5
system       : Linux
node         : ip-172-31-73-235
release      : 4.9.93-41.60.amzn1.x86_64
version      : #1 SMP Fri Apr 13 21:58:27 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2698.120
BogoMIPS:              4600.11
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-7
----------Network Test----------
Setting timeout: 10
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0150 sec, LOAD: 0.3634 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0028 sec, LOAD: 0.0405 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0768 sec, LOAD: 0.5932 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0466 sec, LOAD: 0.3405 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0031 sec, LOAD: 0.1442 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0032 sec, LOAD: 0.4106 sec.

I'm using the Python package.

Minimum reproducible example

from time import time

import mxnet as mx
from mxnet import nd


def test(make_4d):
    ctx = mx.cpu()
    num_iter = 1000
    start = time()
    for i in range(num_iter):
        extra_dim = (1, ) if make_4d else tuple()
        cdim = 3 if make_4d else 2

        a_shape = extra_dim + (1, 512, 120 * 120)
        b_shape = extra_dim + (1, 512, 1)

        a = nd.empty(a_shape, ctx=ctx)
        b = nd.empty(b_shape, ctx=ctx)
        c = nd.concat(a, b, dim=cdim)
        if make_4d:
            c = c.reshape(c.shape[1:])
    nd.waitall()
    print('\telapsed: {:.2f}'.format(time() - start))

if __name__ == '__main__':
    print("4D Test")
    test(True)
    print("3D Test")
    test(False)

Output:

4D Test
	elapsed: 2.18
3D Test
	elapsed: 39.02

What have you tried to solve it?

Looking at the implementation, the reason is that SupportMKLDNNConcat() returns false if the input tensor is not 2d or 4d.
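
A possible workaround, following the 4D branch of the example above, is to prepend a singleton axis to each 3D input, concatenate along the shifted axis, and then drop the extra axis again. The sketch below only computes the lifted shapes and the adjusted axis, so it runs without MXNet. The helper name `lift_concat_to_4d` is hypothetical and not part of MXNet, and whether the 4D call actually reaches the MKLDNN kernel depends on the build.

```python
def lift_concat_to_4d(shapes, dim):
    """Map 3-D concat inputs to equivalent 4-D ones.

    Hypothetical helper (not an MXNet API): returns the input shapes with
    a leading singleton axis added, plus the concat axis shifted by one.
    """
    if any(len(s) != 3 for s in shapes):
        raise ValueError("expected only 3-D shapes")
    if not -3 <= dim < 3:
        raise ValueError("dim out of range for 3-D input")
    dim = dim % 3  # normalise a negative axis to 0..2
    lifted = [(1,) + tuple(s) for s in shapes]
    return lifted, dim + 1  # every axis moves right by one


# Shapes taken from the 3D case of the reproducible example.
shapes_4d, dim_4d = lift_concat_to_4d([(1, 512, 120 * 120), (1, 512, 1)], dim=2)
print(shapes_4d, dim_4d)
```

With MXNet, these values would be used as `c = nd.concat(a.reshape(shapes_4d[0]), b.reshape(shapes_4d[1]), dim=dim_4d)` followed by `c = c.reshape(c.shape[1:])`, which mirrors what the 4D test does.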

@TaoLv
Member

TaoLv commented Jul 27, 2018

Similar issue here: #11906
Besides unit tests, could you elaborate on which models use these 3D operators and what the inputs look like?

@sandeep-krishnamurthy
Contributor

@safrooze - Closing this issue in favor of the feature tracking issue, #11906.

Please add more details on examples/models to that issue.

Thanks
