This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
When using an MKL build, operators fall back to the default CPU implementation if the input tensor is not 2D or 4D. In some cases that fallback is extremely inefficient compared to MKLDNN (for example, about 20x slower for the concat operator). Convolution and concat are two affected operators.
Environment info (Required)
----------Python Info----------
Version : 3.4.5
Compiler : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)
Build : ('default', 'Jul 2 2016 17:47:47')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 18.0
Directory : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/pip
----------MXNet Info-----------
Version : 1.3.0
Directory : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/mxnet
Commit Hash : f5b95b090815e879b57dca233604dcb3f1df967a
----------System Info----------
Platform : Linux-4.9.93-41.60.amzn1.x86_64-x86_64-with-glibc2.2.5
system : Linux
node : ip-172-31-73-235
release : 4.9.93-41.60.amzn1.x86_64
version : #1 SMP Fri Apr 13 21:58:27 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2698.120
BogoMIPS: 4600.11
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-7
----------Network Test----------
Setting timeout: 10
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0150 sec, LOAD: 0.3634 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0028 sec, LOAD: 0.0405 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0768 sec, LOAD: 0.5932 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0466 sec, LOAD: 0.3405 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0031 sec, LOAD: 0.1442 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0032 sec, LOAD: 0.4106 sec.
I'm using the Python package.
Minimum reproducible example
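from time import time

import mxnet as mx
from mxnet import nd


def test(make_4d):
    ctx = mx.cpu()
    num_iter = 1000
    start = time()
    for i in range(num_iter):
        # Optionally prepend a size-1 axis so the inputs are 4D instead of 3D.
        extra_dim = (1, ) if make_4d else tuple()
        cdim = 3 if make_4d else 2
        a_shape = extra_dim + (1, 512, 120 * 120)
        b_shape = extra_dim + (1, 512, 1)
        a = nd.empty(a_shape, ctx=ctx)
        b = nd.empty(b_shape, ctx=ctx)
        c = nd.concat(a, b, dim=cdim)
        if make_4d:
            # Drop the extra leading axis to get back the 3D result.
            c = c.reshape(c.shape[1:])
    # Wait for all asynchronous operations to finish before reading the clock.
    nd.waitall()
    print('\telapsed: {:.2f}'.format(time() - start))


if __name__ == '__main__':
    print("4D Test")
    test(True)
    print("3D Test")
    test(False)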
Output:
4D Test
elapsed: 2.18
3D Test
elapsed: 39.02
What have you tried to solve it?
Looking at the implementation, the cause is that SupportMKLDNNConcat() returns false if the input tensor is not 2D or 4D, so concat falls back to the default CPU implementation.
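As a possible stopgap until the check is relaxed, the 4D trick from the reproduction script can be wrapped in a small helper. The sketch below is only an illustration: `lift_concat_to_4d` and `concat_output_shape` are hypothetical names, not part of MXNet, and whether the lifted call actually takes the MKLDNN path depends on the build. It only computes the shapes and axis to use, so it runs without MXNet installed.

```python
def lift_concat_to_4d(shapes, dim):
    """Hypothetical helper: pad shapes with leading size-1 axes up to 4D.

    Returns (lifted_shapes, lifted_dim). 2D and 4D shapes are returned
    unchanged, since those are the ranks the issue reports as supported.
    """
    shapes = [tuple(s) for s in shapes]
    if not shapes:
        raise ValueError("at least one input shape is required")
    ndim = len(shapes[0])
    if any(len(s) != ndim for s in shapes):
        raise ValueError("all inputs must have the same number of dimensions")
    if ndim == 0 or ndim > 4:
        raise ValueError("only 1D to 4D inputs can be lifted to 4D")
    if not -ndim <= dim < ndim:
        raise ValueError("dim is out of range for the input rank")
    dim %= ndim  # normalize negative axes
    pad = 0 if ndim in (2, 4) else 4 - ndim
    lifted = [(1,) * pad + s for s in shapes]
    return lifted, dim + pad


def concat_output_shape(shapes, dim):
    """Shape of the concatenated result in the original (un-lifted) rank."""
    out = list(shapes[0])
    out[dim] = sum(s[dim] for s in shapes)
    return tuple(out)


if __name__ == "__main__":
    # Shapes taken from the 3D case in the reproduction script.
    shapes = [(1, 512, 120 * 120), (1, 512, 1)]
    lifted, lifted_dim = lift_concat_to_4d(shapes, dim=2)
    print(lifted, lifted_dim)
    print(concat_output_shape(shapes, 2))
```

With MXNet, the result would be used roughly as `nd.concat(a.reshape(lifted[0]), b.reshape(lifted[1]), dim=lifted_dim).reshape(concat_output_shape(shapes, 2))`, which mirrors what the 4D branch of the test does by hand.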