This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Model build error: Check failed: i >= 0 && i < ndim(): index = -2 must be in range [0, -1) #14751

Open
atopion opened this issue Apr 20, 2019 · 12 comments

@atopion

atopion commented Apr 20, 2019

Description

On Windows, building Keras models fails when adding Dense layers, with the error message shown below. This happens regardless of whether the package is built from source or installed via pip, and with or without CUDA.

I first encountered this error when I tried to run https://github.com/roatienza/Deep-Learning-Experiments/blob/master/Experiments/Tensorflow/GAN/dcgan_mnist.py with the MXNet backend instead of the TensorFlow backend. Afterwards I tried several examples from https://github.com/awslabs/keras-apache-mxnet/tree/master/examples, all of which failed with this error.

I did not find this error in any other issue or forum, so I opened a new issue.

Environment info (Required)

Version      : 3.7.2
Compiler     : MSC v.1916 64 bit (AMD64)
Build        : ('tags/v3.7.2:9a3ffc0492', 'Dec 23 2018 23:09:28')
Arch         : ('64bit', 'WindowsPE')
------------Pip Info-----------
Version      : 19.0.3
Directory    : E:\data\program\python\lib\site-packages\pip
----------MXNet Info-----------
Version      : 1.4.0
Directory    : E:\data\program\python\lib\site-packages\mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Windows-10-10.0.17763-SP0
system       : Windows
node         : W7
release      : 10
version      : 10.0.17763
----------Hardware Info----------
machine      : AMD64
processor    : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Name
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0312 sec, LOAD: 0.7029 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0625 sec, LOAD: 1.0311 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0625 sec, LOAD: 0.9529 sec.
Error open FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>, DNS finished in 0.04686379432678223 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0312 sec, LOAD: 4.7601 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0469 sec, LOAD: 0.2656 sec.

Package used (Python/R/Scala/Julia): Python 3.7.2

Build info

Compiler (gcc/clang/mingw/visual studio): visual studio

MXNet commit hash: dc48cd2

Build config: Build command:

msbuild mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount

Error Message:

Using MXNet backend
corpus length: 600893
total chars: 57
nb sequences: 200285
Vectorization...
Build model...
Traceback (most recent call last):
  File ".\lstm_text_generation.py", line 60, in <module>
    model.add(Dense(len(chars), activation='softmax'))
  File "E:\data\program\python\lib\site-packages\keras\engine\sequential.py", line 181, in add
    output_tensor = layer(self.outputs[0])
  File "E:\data\program\python\lib\site-packages\keras\engine\base_layer.py", line 470, in __call__
    output = self.call(inputs, **kwargs)
  File "E:\data\program\python\lib\site-packages\keras\layers\core.py", line 893, in call
    output = K.bias_add(output, self.bias, data_format='channels_last')
  File "E:\data\program\python\lib\site-packages\keras\backend\mxnet_backend.py", line 96, in func_wrapper
    train_symbol = func(*args, **kwargs)
  File "E:\data\program\python\lib\site-packages\keras\backend\mxnet_backend.py", line 3986, in bias_add
    x_dim = ndim(x)
  File "E:\data\program\python\lib\site-packages\keras\backend\mxnet_backend.py", line 537, in ndim
    shape = x.shape
  File "E:\data\program\python\lib\site-packages\keras\backend\mxnet_backend.py", line 4399, in shape
    return self._get_shape()
  File "E:\data\program\python\lib\site-packages\keras\backend\mxnet_backend.py", line 4408, in _get_shape
    _, out_shape, _ = self.symbol.infer_shape_partial()
  File "E:\data\program\python\lib\site-packages\mxnet\symbol\symbol.py", line 1068, in infer_shape_partial
    return self._infer_shape_impl(True, *args, **kwargs)
  File "E:\data\program\python\lib\site-packages\mxnet\symbol\symbol.py", line 1126, in _infer_shape_impl
    ctypes.byref(complete)))
  File "E:\data\program\python\lib\site-packages\mxnet\base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator _foreach0: Error in operator dot7: [12:53:29] e:\data\program\mxnet\incubator-mxnet\include\mxnet\tuple.h:202: Check failed: i >= 0 && i < ndim(): index = -2 must be in range [0, -1)
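
A note on reading the failed check: the range it prints is [0, ndim()), so ndim() evaluated to -1 when the check ran, and every index is rejected for such a shape. The sketch below mimics this in plain Python. It is not MXNet code: ShapeSketch and last_dim are hypothetical names, and treating ndim() == -1 as "rank not known yet" is an assumption made for illustration.

```python
class ShapeSketch:
    """Hypothetical stand-in for a shape tuple with a bounds-checked index."""

    def __init__(self, dims=None):
        # Assumption: dims=None models a shape whose rank is not known yet,
        # which this sketch reports as ndim() == -1.
        self._dims = None if dims is None else tuple(dims)

    def ndim(self):
        return -1 if self._dims is None else len(self._dims)

    def __getitem__(self, i):
        # Same condition as in the message: i >= 0 && i < ndim()
        if not (0 <= i < self.ndim()):
            raise IndexError(
                "Check failed: i >= 0 && i < ndim(): "
                f"index = {i} must be in range [0, {self.ndim()})"
            )
        return self._dims[i]


def last_dim(shape):
    # Rank arithmetic that is fine for known shapes but asks for
    # index -2 when ndim() is -1.
    return shape[shape.ndim() - 1]
```

Under that assumption, computing the last axis as ndim() - 1 yields -2, which matches the index in the message, while a shape with a known rank passes the same check.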

Steps to reproduce

git clone https://github.com/awslabs/keras-apache-mxnet.git
cd keras-apache-mxnet/examples
python ./lstm_text_generation.py

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Build

@frankfliu
Contributor

@mxnet-label-bot add [keras]

@lanking520
Member

@roywei

@roywei
Member

roywei commented Apr 23, 2019

Hi @atopion, thanks for the issue. Could you share the following to help us debug?

  1. Which keras-mxnet version are you using?
  2. Which example is failing?
  3. Your keras.json config (located at ~/.keras/keras.json)

Running examples/lstm_text_generation.py works fine for me with the following setup:

keras-mxnet                        2.2.4.1       
mxnet-cu90                         1.5.0b20190412
python lstm_text_generation.py 
Using MXNet backend
Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
606208/600901 [==============================] - 0s 1us/step
614400/600901 [==============================] - 0s 1us/step
corpus length: 600893
total chars: 57
nb sequences: 200285
Vectorization...
Build model...
Epoch 1/60
/usr/local/lib/python2.7/dist-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.0078125). Is this intended?
  force_init=force_init)
131456/200285 [==================>...........] - ETA: 33s - loss: 2.2159

@atopion
Author

atopion commented Apr 23, 2019

Hello,
of course, I'm using keras-mxnet 2.2.4.1 and my keras.json looks like this:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "mxnet",
    "image_data_format": "channels_first"
}
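
For anyone who wants to check the same settings on their own machine, here is a minimal sketch that reads the file from the default location mentioned above (~/.keras/keras.json). The function name read_keras_config is hypothetical, and only the standard library is used.

```python
import json
from pathlib import Path


def read_keras_config(path=None):
    """Return the parsed Keras config, or None if the file does not exist."""
    # Default location mentioned in this thread: ~/.keras/keras.json
    if path is None:
        config_path = Path.home() / ".keras" / "keras.json"
    else:
        config_path = Path(path)
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    config = read_keras_config()
    if config is None:
        print("No keras.json found")
    else:
        print("backend:", config.get("backend"))
        print("image_data_format:", config.get("image_data_format"))
```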

So far, I've tried the following examples:

  • lstm_text_generation.py
  • deep_dream.py
  • mnist_mlp.py
  • cifar10_cnn.py
  • mnist_acgan.py

which all failed.

@roywei: I see from your output that you are using Python 2.7. Could that be the reason?
Please let me know if you need anything else.

atopion

@roywei
Member

roywei commented Apr 23, 2019

Hi @atopion, on my side both Python 2.7 and 3.6 work, tested on macOS and Linux. I will find a Windows machine and test there; the problem may be OS-specific.

@roywei
Member

roywei commented Apr 24, 2019

Hi @atopion, I'm still trying to find a Windows machine to test on. In the meantime, could you try completely removing and reinstalling MXNet?
I see you reported MXNet version 1.4.0

----------MXNet Info-----------
Version      : 1.4.0

but judging from the commit hash you reported and the error message, you are running the latest master.
To uninstall, run pip uninstall to remove the pip package and make clean to remove the MXNet library built from source.

I have verified that the following works fine on Linux:

  • mxnet 1.4.0 (pip install mxnet-cu90) or mxnet nightly, i.e. latest master (pip install mxnet-cu90 --pre)
  • keras-mxnet 2.2.4.1 (pip install keras-mxnet)

pip3 and python3 also work.

If you still face this error, please let me know which MXNet version you used. If you installed a pip package, which package (mxnet-mkl, mxnet-cu90, mxnet-cu90mkl, etc.)? If you built from source, what were the build flags? Refer to the install guide.
Thanks!
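
To report exactly which MXNet pip package is installed, something like the following could be used. It is a sketch that relies only on the standard library (importlib.metadata, Python 3.8+). The function name is hypothetical, and it cannot see a library built from source that was never installed through pip.

```python
from importlib import metadata


def installed_mxnet_packages():
    """Map each installed distribution whose name starts with 'mxnet' to its version."""
    found = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        # Some broken installs carry no usable metadata; skip those.
        name = meta.get("Name") if meta is not None else None
        if name and name.lower().startswith("mxnet"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    packages = installed_mxnet_packages()
    if not packages:
        print("No mxnet pip package found")
    for name, version in sorted(packages.items()):
        print(f"{name}=={version}")
```

Seeing more than one entry (for example both mxnet and mxnet-mkl) would suggest that an old package was not fully removed before another was installed.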

@atopion
Author

atopion commented Apr 25, 2019

Hello @roywei,
I'm very sorry for the mix-up; I was actually building version 1.5.0 from source. My build flags are:

cmake -G "Visual Studio 15 2017 Win64" -T cuda=9.2,host=x64 -DUSE_CUDA=1 -DUSE_CUDNN=1 -DUSE_NVRTC=1 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_BLAS=open -DUSE_LAPACK=1 -DUSE_DIST_KVSTORE=0 -DCUDA_ARCH_LIST=All -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -DCUDA_TOOLSET='9.2' -D CUDNN_INCLUDE='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include' -D CUDNN_LIBRARY='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\lib\x64\cudnn.lib' -D OpenCV_DIR="E:\data\program\opencv\build" E:\data\program\mxnet

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\MSBuild.exe" mxnet.sln /p:Configuration=Release;Platform=x64 /maxcpucount

Following your suggestion, I uninstalled it and tried installing different versions via pip. Here are the results:

mxnet-1.4.0 :                  works
mxnet-1.5.0b20190425:          Same error as above
mxnet-cu92-1.4.0.post0:        works
mxnet-cu92-1.5.0b20190425:     Same error as above
mxnet-mkl-1.4.0.post0:         works
mxnet-mkl-1.5.0b20190409:      works
mxnet-cu92mkl-1.4.0.post0:     works
mxnet-cu92mkl-1.5.0b20190409:  works

So it looks to me like an error introduced in one of the latest commits that does not occur when MKL is used (perhaps a wrong API call to OpenBLAS or similar).
To verify this, I will clone release 1.4.0 (hash a03d59e) and build it from source.
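
One caveat when reading this list: the failing nightlies are all dated 20190425, while the working MKL nightlies are dated 20190409, so the results cannot by themselves separate "MKL avoids the error" from "older nightlies predate the error". A small sketch makes the split explicit. The tuples are copied from the list above, and the helper names are hypothetical.

```python
# Results copied from the list above: (package, version, works)
RESULTS = [
    ("mxnet", "1.4.0", True),
    ("mxnet", "1.5.0b20190425", False),
    ("mxnet-cu92", "1.4.0.post0", True),
    ("mxnet-cu92", "1.5.0b20190425", False),
    ("mxnet-mkl", "1.4.0.post0", True),
    ("mxnet-mkl", "1.5.0b20190409", True),
    ("mxnet-cu92mkl", "1.4.0.post0", True),
    ("mxnet-cu92mkl", "1.5.0b20190409", True),
]


def nightly_date(version):
    """Return the YYYYMMDD suffix of a nightly such as 1.5.0b20190425, else None."""
    _, sep, suffix = version.partition("b")
    return suffix if sep and suffix.isdigit() else None


def nightly_dates(results, works):
    """Collect the nightly build dates among results with the given outcome."""
    dates = {nightly_date(v) for _, v, ok in results if ok == works}
    return sorted(d for d in dates if d is not None)


def uses_mkl(results, works):
    """Collect whether the nightly packages with the given outcome use MKL."""
    return sorted({"mkl" in pkg for pkg, v, ok in results
                   if ok == works and nightly_date(v) is not None})


if __name__ == "__main__":
    print("failing nightly dates:", nightly_dates(RESULTS, works=False))
    print("working nightly dates:", nightly_dates(RESULTS, works=True))
    print("failing nightlies use MKL:", uses_mkl(RESULTS, works=False))
    print("working nightlies use MKL:", uses_mkl(RESULTS, works=True))
```

Both the build date and the MKL flag split the nightlies perfectly, so testing an MKL nightly dated 20190425 (or a non-MKL one dated 20190409) would be needed to tell the two explanations apart.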

atopion

@roywei
Member

roywei commented Apr 25, 2019

@atopion thanks a lot for the detailed info. I am able to reproduce it now. The error was introduced in PR #14661 by changes to include/mxnet/tuple.h. Please use a pip package or a source checkout from before this PR while we fix it.

Thanks!

@atopion
Author

atopion commented May 1, 2019

Hello,
sorry about the delay. I have now successfully compiled release 1.4.0 from source and it works. I will use this until the problem is fixed.

Should I close the issue?
atopion

@feixiangdekaka

raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:37:35] include/mxnet/tuple.h:202: Check failed: i >= 0 && i < ndim(): index = 0 must be in range [0, -1)

@KellenSunderland
Contributor

We noticed that we are getting this error when binding a regress output that isn't explicitly labelled. We added regress_label during bind and it fixed the error for us.
