This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

changed constructor args for TShape #15601

Merged: 1 commit merged on Jul 20, 2019


samskalicky
Contributor

@samskalicky samskalicky commented Jul 19, 2019

Description

Fix a bug in the TShape constructor call in warpctc, related to the TShape upgrades.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Fixes #15612

@junrushao1994

@samskalicky
Contributor Author

Got the following failures:

======================================================================
FAIL: test_operator_gpu.test_convolution_independent_gradients
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Python37\lib\site-packages\nose\util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\test_operator.py", line 1989, in test_convolution_independent_gradients
    grad2[var_name].asnumpy(), rtol=rtol, atol=atol)
  File "C:\Python37\lib\site-packages\numpy\testing\_private\utils.py", line 1501, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "C:\Python37\lib\site-packages\numpy\testing\_private\utils.py", line 827, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.01, atol=0.1

Mismatch: 0.319%
Max absolute difference: 0.90769005
Max relative difference: 0.08535597
 x: array([[[[ 6.115280e+01, -5.233296e+01,  2.049723e+02, ...,
          -9.650239e+01, -7.207094e+01, -1.415717e+01],
         [ 3.148866e+01,  3.586515e+02,  8.097604e+01, ...,...
 y: array([[[[ 6.080400e+01, -5.231017e+01,  2.049781e+02, ...,
          -9.649588e+01, -7.206804e+01, -1.417635e+01],
         [ 3.168588e+01,  3.585852e+02,  8.099965e+01, ...,...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=136899186 to reproduce.
--------------------- >> end captured logging << ---------------------
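
For context on the assertion above: `numpy.testing.assert_allclose` treats an element as matching when `|actual - desired| <= atol + rtol * |desired|`. The sketch below restates that check in plain Python. `within_tolerance` is a hypothetical helper, not part of the test suite, and the second pair of values is made up for illustration.

```python
def within_tolerance(actual, desired, rtol=0.01, atol=0.1):
    """Scalar form of the comparison used by numpy.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# First elements of x and y from the log above:
# the gap is about 0.35, the allowed gap is 0.1 + 0.01 * 60.804, about 0.71.
print(within_tolerance(61.15280, 60.80400))

# Hypothetical pair (not from the log): a 0.9 gap on a value near 10
# exceeds the allowed 0.1 + 0.01 * 10 = 0.2.
print(within_tolerance(10.9, 10.0))
```

Because the allowed gap scales with the magnitude of the expected value, a maximum absolute difference of about 0.91 only fails for elements whose expected magnitude is small enough. The log's own hint, rerunning with `MXNET_TEST_SEED=136899186`, is the way to reproduce the exact arrays.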


test_operator_gpu.test_kernel_error_checking ...
[06:19:48] C:\jenkins_slave\workspace\build-gpu\src\base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7402, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.

Process SpawnProcess-7:
Traceback (most recent call last):
  File "C:\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\test_operator_gpu.py", line 2032, in kernel_error_check_imperative
    c = (a / b).asnumpy()
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\ndarray\ndarray.py", line 287, in __truediv__
    return divide(self, other)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\ndarray\ndarray.py", line 2952, in divide
    _internal._rdiv_scalar)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\ndarray\ndarray.py", line 2708, in _ufunc_helper
    return fn_array(lhs, rhs)
  File "<string>", line 50, in broadcast_div
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\_ctypes\ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [06:19:49] c:\jenkins_slave\workspace\build-gpu\src\operator\tensor\./elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [3] [0]
[06:19:51] C:\jenkins_slave\workspace\build-gpu\src\base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7402, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.

Process SpawnProcess-8:
Traceback (most recent call last):
  File "C:\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\test_operator_gpu.py", line 2041, in kernel_error_check_symbolic
    'b':mx.nd.array([],ctx=mx.gpu(0))})
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\symbol\symbol.py", line 1804, in bind
    ctypes.byref(handle)))
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\../../../python\mxnet\base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator _div0: [06:19:52] c:\jenkins_slave\workspace\build-gpu\src\operator\tensor\../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node _div0 at 1-th input: expected [3], got [0]
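
The `l == 1 || r == 1` check in the first error reflects the usual broadcasting rule: two dimensions that differ are compatible only if one of them is 1, so shapes [3] and [0] cannot be combined. A minimal plain-Python sketch of that rule follows. `broadcast_shape` is a hypothetical helper for illustration and is not MXNet's implementation.

```python
def broadcast_shape(lhs, rhs):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Simplified sketch: shapes are right-aligned, and two differing
    dimensions are compatible only if one of them is 1.
    """
    ndim = max(len(lhs), len(rhs))
    # Pad the shorter shape with leading 1s so both have the same rank.
    lhs = (1,) * (ndim - len(lhs)) + tuple(lhs)
    rhs = (1,) * (ndim - len(rhs)) + tuple(rhs)
    out = []
    for l, r in zip(lhs, rhs):
        if l != r and l != 1 and r != 1:
            raise ValueError(
                "operands could not be broadcast together with shapes "
                f"{list(lhs)} {list(rhs)}")
        out.append(r if l == 1 else l)
    return tuple(out)


print(broadcast_shape((2, 3), (3,)))  # compatible shapes

try:
    broadcast_shape((3,), (0,))       # the shapes from the log above
except ValueError as err:
    print(err)
```

Under this rule an empty array of shape [0] only broadcasts against a dimension of size 0 or 1, which is why dividing a length-3 array by an empty one is rejected.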

@samskalicky
Contributor Author

Also getting some random failures in the CI:

Ran 834 tests in 10136.178s

OK (SKIP=39)
+ nosetests-2.7 --with-coverage --cover-inclusive --cover-xml --cover-branches --cover-package=mxnet --with-timer --timer-ok 1 --timer-warning 15 --timer-filter warning,error --with-xunit --xunit-file nosetests_train.xml --verbose tests/python/train
[07:32:28] src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
[07:32:28] src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
Sending interrupt signal to process
build.py: 2019-07-19 07:42:31,626Z WARNING Signal 15 received, cleaning up...
build.py: 2019-07-19 07:42:31,626Z WARNING Cleaning up containers
build.py: 2019-07-19 07:42:35,518Z INFO ☠: stopped container cfeaa1cecc83
build.py: 2019-07-19 07:42:35,575Z INFO 🚽: removed container cfeaa1cecc83
build.py: 2019-07-19 07:42:35,575Z INFO Cleaning up containers finished.
build.py: 2019-07-19 07:42:35,575Z WARNING done. Exiting with error.
script returned exit code 1

@perdasilva @lebeg

Member

@junrushao junrushao left a comment


Looks good :-)

@junrushao
Member

Probably the CI flakiness can be reported in a separate issue?

@samskalicky samskalicky requested a review from szha as a code owner July 19, 2019 16:52
Member

@szha szha left a comment


Looks like the group norm PR was mixed into this PR.

@karan6181
Contributor

@mxnet-label-bot add [Build, pr-awaiting-response]

@samskalicky
Contributor Author

@szha removed the incorrect merge

@junrushao
Member

Should be good to merge @szha

@samskalicky
Contributor Author

@mxnet-label-bot add [pr-awaiting-merge]
@mxnet-label-bot remove [pr-awaiting-response]

@marcoabreu marcoabreu added the pr-awaiting-merge (Review and CI is complete. Ready to Merge) label on Jul 20, 2019
@samskalicky
Contributor Author

@mxnet-label-bot remove [pr-awaiting-response]

@marcoabreu marcoabreu removed the pr-awaiting-response (PR is reviewed and waiting for contributor to respond) label on Jul 20, 2019
@szha szha merged commit 5086ff0 into apache:master Jul 20, 2019
TaoLv pushed a commit to TaoLv/incubator-mxnet that referenced this pull request Aug 9, 2019
TaoLv added a commit that referenced this pull request Aug 11, 2019
anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
Successfully merging this pull request may close these issues.

WarpCTC build failure in 1.5.0