Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Bug] set_np_compat(shape) crash the shape inference #15096

Closed
jermainewang opened this issue May 29, 2019 · 5 comments
Closed

[Bug] set_np_compat(shape) crash the shape inference #15096

jermainewang opened this issue May 29, 2019 · 5 comments
Assignees

Comments

@jermainewang
Copy link
Contributor

Description

We try to find a latest MX nightly build that works with DGL (v1.4 is not an option due to the bug in DLPack). Currently, we are using 1.5.0b20190523. The shape inference will fail if mx.set_np_compat is used. We also tried latest nightly build (requires to change mx.set_np_compat to mx.set_np_shape). The bug still exists.

Environment info (Required)

----------Python Info----------
Version      : 3.7.2
Compiler     : GCC 8.2.1 20181127
Build        : ('default', 'Jan 10 2019 23:51:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.1
Directory    : /usr/lib/python3.7/site-packages/pip
----------MXNet Info-----------
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
Version      : 1.5.0
Directory    : /home/jermaine/.local/lib/python3.7/site-packages/mxnet
Commit Hash   : 66aa983e4dff90c28c356d66423565c69417d3c1
----------System Info----------
Platform     : Linux-4.20.7-arch1-1-ARCH-x86_64-with-arch
system       : Linux
node         : wmj-broadway
release      : 4.20.7-arch1-1-ARCH
version      : #1 SMP PREEMPT Wed Feb 6 18:42:40 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
Stepping:            4
CPU MHz:             1506.883
CPU max MHz:         3900.0000
CPU min MHz:         1200.0000
BogoMIPS:            7384.58
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            10240K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0009 sec, LOAD: 0.7213 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0437 sec, LOAD: 0.3932 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0398 sec, LOAD: 0.3987 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0142 sec, LOAD: 0.6286 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0027 sec, LOAD: 0.1556 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0053 sec, LOAD: 0.1012 sec.

Package used (Python/R/Scala/Julia): python

Error Message:

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
(2, 2016)
infer_shape error. Arguments:
  data: (2, 2016)
Traceback (most recent call last):
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 910, in forward
    params = {i: j.data(ctx) for i, j in self._reg_params.items()}
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 910, in <dictcomp>
    params = {i: j.data(ctx) for i, j in self._reg_params.items()}
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 494, in data
    return self._check_and_get(self._data, ctx)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 208, in _check_and_get
    "num_features, etc., for network layers."%(self.name))
mxnet.gluon.parameter.DeferredInitializationError: Parameter 'dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 789, in _deferred_infer_shape
    self.infer_shape(*args)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 862, in infer_shape
    self._infer_attrs('infer_shape', 'shape', *args)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 851, in _infer_attrs
    **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1075, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1209, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/base.py", line 254, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator dense0_fwd: Shape inconsistent, Provided = [16,0], inferred shape=(16,2016)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "t.py", line 16, in <module>
    print(foo(z).shape)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
    out = self.forward(*args)
  File "t.py", line 11, in forward
    return self.dense(x)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
    out = self.forward(*args)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 912, in forward
    self._deferred_infer_shape(x, *args)
  File "/home/jermaine/.local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 793, in _deferred_infer_shape
    raise ValueError(error_msg)
ValueError: Deferred initialization failed because shape cannot be inferred. Error in operator dense0_fwd: Shape inconsistent, Provided = [16,0], inferred shape=(16,2016)

Minimum reproducible example

import mxnet as mx
import mxnet.gluon as gluon

mx.set_np_compat(True)

class Foo(gluon.Block):
    def __init__(self, **kwargs):
        super(Foo, self).__init__(**kwargs)
        self.dense = gluon.nn.Dense(16)
    def forward(self, x):
        return self.dense(x)
z = mx.nd.zeros((2,2016))
print(z.shape)
foo = Foo()
foo.initialize()
print(foo(z).shape)

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. Run the example

What have you tried to solve it?

  1. No solution yet.
@mxnet-label-bot
Copy link
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@jermainewang
Copy link
Contributor Author

@zheng-da @szha

@jermainewang
Copy link
Contributor Author

jermainewang commented May 29, 2019

Removing mx.set_np_compat fix the bug. Here is the correct output:

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
(2, 2016)
(2, 16)

However, in DGL, set_np_compat is required to enable zero-length tensor.

@frankfliu
Copy link
Contributor

@mxnet-label-bot add [bug, Numpy]

@reminisce
Copy link
Contributor

Fixed by #15097

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants