This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

backport fixes in master branch #19356

Merged
merged 1 commit into from
Oct 21, 2020

Conversation

Neutron3529
Contributor

Description

In the current version of KLDivLoss, the return value differs from the value calculated by SoftmaxCrossEntropyLoss, and this difference is not documented. It is probably caused by an incorrect reduction that uses the mean rather than the sum when computing the return value.
This PR fixes that reduction, so that KLDivLoss (with from_logits=False) and SoftmaxCrossEntropyLoss (with sparse_label=False) return almost the same value.
With this fix, KLDivLoss behaves exactly as the documentation describes.

To reproduce the misbehavior on the current master branch:

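import mxnet as mx

# Logits for two samples over two classes, with dense one-hot labels.
a = mx.nd.array([[-1, 1], [1, -1]])
b = mx.nd.array([1, 0]).one_hot(2)
TrueLoss = mx.gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
FalseLoss = mx.gluon.loss.KLDivLoss(from_logits=False)
c = TrueLoss(a, b)
# On the unfixed branch, KLDivLoss only matches the cross-entropy after
# being rescaled by the number of classes (a.shape[-1] == 2).
d = FalseLoss(a, b) * a.shape[-1]
assert (c - d).abs().sum().asscalar() == 0 and a.shape[-1] == 2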
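
The scaling that the reproduction relies on can be sketched in plain Python, without MXNet. This is only an illustration under stated assumptions: the helper names (`log_softmax`, `cross_entropy`, `kl_div`) are hypothetical, and the sketch assumes the KL divergence is computed per class as label * (log(label) - log_softmax(pred)) and then reduced over the class axis by either a sum or a mean. It is not the library's implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one sample's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits, label):
    """Cross-entropy with a dense label: -sum(label * log p)."""
    return -sum(l * lp for l, lp in zip(label, log_softmax(logits)))

def kl_div(logits, label, reduce="sum"):
    """KL(label || softmax(logits)), reduced over classes by sum or mean."""
    log_p = log_softmax(logits)
    # Classes with label == 0 contribute nothing (0 * log 0 is taken as 0).
    terms = [l * (math.log(l) - lp) if l > 0 else 0.0
             for l, lp in zip(label, log_p)]
    total = sum(terms)
    return total / len(terms) if reduce == "mean" else total

logits = [-1.0, 1.0]   # first sample of the reproduction above
label = [0.0, 1.0]     # one-hot label for class 1

ce = cross_entropy(logits, label)
kl_sum = kl_div(logits, label, reduce="sum")
kl_mean = kl_div(logits, label, reduce="mean")
```

For a one-hot label the entropy term of the KL divergence is zero, so the sum-reduced value coincides with the cross-entropy, while the mean-reduced value is smaller by a factor equal to the number of classes.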

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • KLDivLoss now behaves exactly as the documentation describes.

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot

Hey @Neutron3529, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, unix-cpu, website, unix-gpu, miscellaneous, windows-gpu, sanity, centos-cpu, edge, clang, windows-cpu]


Note:
Only the following three categories of users can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
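
The trigger syntax in the bot message above can be sketched as a small helper that formats the comment text. This is illustrative only: the function name `ci_trigger_comment` is hypothetical, the job list is copied from the bot message, and nothing here talks to GitHub or Jenkins.

```python
# Job names as listed in the bot's "CI supported jobs" message.
SUPPORTED_JOBS = [
    "centos-gpu", "unix-cpu", "website", "unix-gpu", "miscellaneous",
    "windows-gpu", "sanity", "centos-cpu", "edge", "clang", "windows-cpu",
]

def ci_trigger_comment(jobs=None):
    """Return the comment text asking mxnet-bot to run CI jobs.

    With no jobs given, request every job via the `all` keyword.
    """
    if not jobs:
        return "@mxnet-bot run ci [all]"
    unknown = [j for j in jobs if j not in SUPPORTED_JOBS]
    if unknown:
        raise ValueError(f"unsupported CI jobs: {unknown}")
    return "@mxnet-bot run ci [" + ", ".join(jobs) + "]"
```

For example, `ci_trigger_comment(["unix-cpu", "unix-gpu"])` produces the same text as the re-trigger comments posted later in this thread.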

@lanking520 added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) Oct 15, 2020
@lanking520 added the pr-work-in-progress label (PR is still work in progress) and removed the pr-awaiting-testing label Oct 16, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [centos-gpu, unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, centos-gpu, unix-gpu]

@Neutron3529
Contributor Author

@mxnet-bot run ci [centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 16, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [centos-gpu,unix-cpu]

It seems something is wrong with the network connection.

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu, unix-cpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 16, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [centos-cpu,unix-cpu,unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, centos-cpu, unix-gpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 16, 2020
@Neutron3529
Contributor Author

Errors in unix-gpu:

[2020-10-16T12:31:48.968Z] Unpacking libobjc4:amd64 (5.4.0-6ubuntu1~16.04.12) ...

[2020-10-16T12:31:49.892Z] The command '/bin/sh -c /work/ubuntu_clang.sh' returned a non-zero code: 137

[2020-10-16T12:31:49.892Z] 2020-10-16 12:31:48,462 - root - WARNING - Exception: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.ubuntu_gpu_cu101', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', '--cache-from', 'mxnetci/build.ubuntu_gpu_cu101', '-t', 'mxnetci/build.ubuntu_gpu_cu101', 'docker']' returned non-zero exit status 137., Retrying in 1 seconds...

[2020-10-16T12:31:50.921Z] 2020-10-16 12:31:49,464 - root - INFO - Running command: 'docker build -f docker/Dockerfile.build.ubuntu_gpu_cu101 --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 --cache-from mxnetci/build.ubuntu_gpu_cu101 -t mxnetci/build.ubuntu_gpu_cu101 docker'

[2020-10-16T12:31:50.921Z] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

[2020-10-16T12:31:50.921Z] 2020-10-16 12:31:49,506 - root - WARNING - Exception: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.ubuntu_gpu_cu101', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', '--cache-from', 'mxnetci/build.ubuntu_gpu_cu101', '-t', 'mxnetci/build.ubuntu_gpu_cu101', 'docker']' returned non-zero exit status 1., Retrying in 2 seconds...

[2020-10-16T12:31:52.802Z] 2020-10-16 12:31:51,508 - root - INFO - Running command: 'docker build -f docker/Dockerfile.build.ubuntu_gpu_cu101 --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 --cache-from mxnetci/build.ubuntu_gpu_cu101 -t mxnetci/build.ubuntu_gpu_cu101 docker'

[2020-10-16T12:31:52.802Z] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

[2020-10-16T12:31:53.056Z] Traceback (most recent call last):

[2020-10-16T12:31:53.056Z]   File "ci/build.py", line 456, in <module>

[2020-10-16T12:31:53.056Z]     sys.exit(main())

[2020-10-16T12:31:53.056Z]   File "ci/build.py", line 366, in main

[2020-10-16T12:31:53.056Z]     cache_intermediate=args.cache_intermediate)

[2020-10-16T12:31:53.056Z]   File "ci/build.py", line 114, in build_docker

[2020-10-16T12:31:53.056Z]     run_cmd()

[2020-10-16T12:31:53.056Z]   File "/home/jenkins_slave/workspace/build-cmake-gpu/ci/util.py", line 84, in f_retry

[2020-10-16T12:31:53.056Z]     return f(*args, **kwargs)

[2020-10-16T12:31:53.056Z]   File "ci/build.py", line 112, in run_cmd

[2020-10-16T12:31:53.056Z]     check_call(cmd)

[2020-10-16T12:31:53.056Z]   File "/usr/lib/python3.6/subprocess.py", line 311, in check_call

[2020-10-16T12:31:53.056Z]     raise CalledProcessError(retcode, cmd)

[2020-10-16T12:31:53.056Z] subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.ubuntu_gpu_cu101', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', '--cache-from', 'mxnetci/build.ubuntu_gpu_cu101', '-t', 'mxnetci/build.ubuntu_gpu_cu101', 'docker']' returned non-zero exit status 1.

script returned exit code 1

Errors in unix-cpu:

[2020-10-16T12:29:07.672Z] Platform Info:

[2020-10-16T12:29:07.672Z]   OS: Linux (x86_64-pc-linux-gnu)

[2020-10-16T12:29:07.672Z]   CPU: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz

[2020-10-16T12:29:07.672Z]   WORD_SIZE: 64

[2020-10-16T12:29:07.672Z]   LIBM: libopenlibm

[2020-10-16T12:29:07.672Z]   LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

[2020-10-16T12:29:07.672Z] + install_julia 1.0 1.0.4

[2020-10-16T12:29:07.672Z] ++ echo 1.0

[2020-10-16T12:29:07.672Z] ++ sed 's/\.//'

[2020-10-16T12:29:07.672Z] + local suffix=10

[2020-10-16T12:29:07.672Z] + local JLBINARY=julia-1.0.tar.gz

[2020-10-16T12:29:07.672Z] + local JULIADIR=/work/julia10

[2020-10-16T12:29:07.672Z] + local JULIA=/work/julia10/bin/julia

[2020-10-16T12:29:07.672Z] + mkdir -p /work/julia10

[2020-10-16T12:29:07.672Z] + wget -qO julia-1.0.tar.gz https://julialang-s3.julialang.org/bin/linux/x64/1.0/julia-1.0.4-linux-x86_64.tar.gz

[2020-10-16T12:29:08.230Z] + tar xzf julia-1.0.tar.gz -C /work/julia10 --strip 1

[2020-10-16T12:29:10.736Z] + rm julia-1.0.tar.gz

[2020-10-16T12:29:10.736Z] + /work/julia10/bin/julia -e 'using InteractiveUtils; versioninfo()'

[2020-10-16T12:29:11.659Z] Julia Version 1.0.4

[2020-10-16T12:29:11.659Z] Commit 38e9fb7f80 (2019-05-16 03:38 UTC)

[2020-10-16T12:29:11.659Z] Platform Info:

[2020-10-16T12:29:11.659Z]   OS: Linux (x86_64-pc-linux-gnu)

[2020-10-16T12:29:11.659Z]   CPU: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz

[2020-10-16T12:29:11.659Z]   WORD_SIZE: 64

[2020-10-16T12:29:11.659Z]   LIBM: libopenlibm

[2020-10-16T12:29:11.659Z]   LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

[2020-10-16T12:29:15.821Z] unexpected EOF

[2020-10-16T12:29:15.821Z] 2020-10-16 12:29:14,098 - root - WARNING - Exception: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.ubuntu_cpu', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', '--cache-from', 'mxnetci/build.ubuntu_cpu', '-t', 'mxnetci/build.ubuntu_cpu', 'docker']' returned non-zero exit status 1., Retrying in 1 seconds...

[2020-10-16T12:29:16.382Z] 2020-10-16 12:29:15,099 - root - INFO - Running command: 'docker build -f docker/Dockerfile.build.ubuntu_cpu --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 --cache-from mxnetci/build.ubuntu_cpu -t mxnetci/build.ubuntu_cpu docker'

[2020-10-16T12:29:16.382Z] time="2020-10-16T12:29:15Z" level=error msg="failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: connection refused"

[2020-10-16T12:29:16.382Z] error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/build?buildargs=%7B%22GROUP_ID%22%3A%221001%22%2C%22USER_ID%22%3A%221001%22%7D&cachefrom=%5B%22mxnetci%2Fbuild.ubuntu_cpu%22%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile.build.ubuntu_cpu&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&session=qkdrmkwrs29ewyxl09o13yyx5&shmsize=0&t=mxnetci%2Fbuild.ubuntu_cpu&target=&ulimits=null&version=1: context canceled

[2020-10-16T12:29:16.382Z] 2020-10-16 12:29:15,136 - root - WARNING - Exception: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.ubuntu_cpu', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', '--cache-from', 'mxnetci/build.ubuntu_cpu', '-t', 'mxnetci/build.ubuntu_cpu', 'docker']' returned non-zero exit status 1., Retrying in 2 seconds...

[2020-10-16T12:29:18.269Z] 2020-10-16 12:29:17,139 - root - INFO - Running command: 'docker build -f docker/Dockerfile.build.ubuntu_cpu --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 --cache-from mxnetci/build.ubuntu_cpu -t mxnetci/build.ubuntu_cpu docker'

[2020-10-16T12:29:18.269Z] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

[2020-10-16T12:29:18.269Z] Traceback (most recent call last):

[2020-10-16T12:29:18.269Z]   File "ci/build.py", line 456, in <module>

[2020-10-16T12:29:18.269Z]     sys.exit(main())

[2020-10-16T12:29:18.269Z]   File "ci/build.py", line 366, in main

[2020-10-16T12:29:18.269Z]     cache_intermediate=args.cache_intermediate)

[2020-10-16T12:29:18.269Z]   File "ci/build.py", line 114, in build_docker

[2020-10-16T12:29:18.269Z]     run_cmd()

[2020-10-16T12:29:18.269Z]   File "/home/jenkins_slave/workspace/build-cpu-mkl/ci/util.py", line 84, in f_retry

[2020-10-16T12:29:18.269Z]     return f(*args, **kwargs)

[2020-10-16T12:29:18.269Z]   File "ci/build.py", line 112, in run_cmd

[2020-10-16T12:29:18.269Z]     check_call(cmd)

[2020-10-16T12:29:18.269Z]   File "/usr/lib/python3.6/subprocess.py", line 311, in check_call

[2020-10-16T12:29:18.269Z]     raise CalledProcessError(retcode, cmd)

[2020-10-16T12:29:18.269Z] subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.ubuntu_cpu', '--build-arg', 'USER_ID=1001', '--build-arg', 'GROUP_ID=1001', '--cache-from', 'mxnetci/build.ubuntu_cpu', '-t', 'mxnetci/build.ubuntu_cpu', 'docker']' returned non-zero exit status 1.

script returned exit code 1

Errors in centos-cpu:

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,429 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.NDArray was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,430 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.Symbol was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,430 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.Executor was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,436 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.io.MXDataIter was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,437 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.FeedForward was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,437 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.SparseNDArray was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:17.737Z] 2020-10-16 12:44:16,442 [Finalizer] [org.apache.mxnet.WarnIfNotDisposed] [WARN] - LEAK: [one-time warning] An instance of org.apache.mxnet.KVStore was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing

[2020-10-16T12:44:19.095Z] *** RUN ABORTED ***

[2020-10-16T12:44:19.095Z]   java.lang.Exception: https://s3.amazonaws.com/model-server/inputs/Pug-Cookie.jpg Download failed!

[2020-10-16T12:44:19.095Z]   at org.apache.mxnet.ImageSuite.downloadUrl(ImageSuite.scala:48)

[2020-10-16T12:44:19.095Z]   at org.apache.mxnet.ImageSuite.beforeAll(ImageSuite.scala:54)

[2020-10-16T12:44:19.095Z]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)

[2020-10-16T12:44:19.095Z]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)

[2020-10-16T12:44:19.095Z]   at org.apache.mxnet.ImageSuite.run(ImageSuite.scala:28)

[2020-10-16T12:44:19.095Z]   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1210)

[2020-10-16T12:44:19.095Z]   at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1257)

[2020-10-16T12:44:19.095Z]   at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1255)

[2020-10-16T12:44:19.095Z]   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

[2020-10-16T12:44:19.095Z]   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)

[2020-10-16T12:44:19.095Z]   at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1255)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:30)

[2020-10-16T12:44:19.095Z]   at org.scalatest.Suite$class.run(Suite.scala:1144)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:30)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)

[2020-10-16T12:44:19.095Z]   at scala.collection.immutable.List.foreach(List.scala:381)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner$.main(Runner.scala:827)

[2020-10-16T12:44:19.095Z]   at org.scalatest.tools.Runner.main(Runner.scala)

[2020-10-16T12:44:19.095Z] [INFO] ------------------------------------------------------------------------

[2020-10-16T12:44:19.095Z] [INFO] Reactor Summary:

[2020-10-16T12:44:19.095Z] [INFO] 

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Parent ....................... SUCCESS [ 14.965 s]

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Initializer .................. SUCCESS [  5.813 s]

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Initializer Native ........... SUCCESS [  1.168 s]

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Macros ....................... SUCCESS [  9.933 s]

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Native ....................... SUCCESS [  3.369 s]

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Core ......................... FAILURE [ 59.737 s]

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Inference .................... SKIPPED

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Examples ..................... SKIPPED

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Spark ML ..................... SKIPPED

[2020-10-16T12:44:19.095Z] [INFO] Assembly Scala Package ............................. SKIPPED

[2020-10-16T12:44:19.095Z] [INFO] MXNet Scala Package - Full linux-x86_64-only ....... SKIPPED

[2020-10-16T12:44:19.096Z] [INFO] MXNet Scala Package - Full linux-x86_64-only ....... SKIPPED

[2020-10-16T12:44:19.096Z] [INFO] ------------------------------------------------------------------------

[2020-10-16T12:44:19.096Z] [INFO] BUILD FAILURE

[2020-10-16T12:44:19.096Z] [INFO] ------------------------------------------------------------------------

[2020-10-16T12:44:19.096Z] [INFO] Total time: 01:36 min

[2020-10-16T12:44:19.096Z] [INFO] Finished at: 2020-10-16T12:44:17+00:00

[2020-10-16T12:44:19.096Z] [INFO] Final Memory: 42M/2642M

[2020-10-16T12:44:19.096Z] [INFO] ------------------------------------------------------------------------

[2020-10-16T12:44:19.096Z] [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project mxnet-core: There are test failures -> [Help 1]

[2020-10-16T12:44:19.096Z] [ERROR] 

[2020-10-16T12:44:19.096Z] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[2020-10-16T12:44:19.096Z] [ERROR] Re-run Maven using the -X switch to enable full debug logging.

[2020-10-16T12:44:19.096Z] [ERROR] 

[2020-10-16T12:44:19.096Z] [ERROR] For more information about the errors and possible solutions, please read the following articles:

[2020-10-16T12:44:19.096Z] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

[2020-10-16T12:44:19.096Z] [ERROR] 

[2020-10-16T12:44:19.096Z] [ERROR] After correcting the problems, you can resume the build with the command

[2020-10-16T12:44:19.096Z] [ERROR]   mvn <goals> -rf :mxnet-core

[2020-10-16T12:44:19.350Z] 2020-10-16 12:44:18,039 - root - INFO - Waiting for status of container c6fd018f1c30 for 600 s.

[2020-10-16T12:44:19.604Z] 2020-10-16 12:44:18,268 - root - INFO - Container exit status: {'Error': None, 'StatusCode': 1}

[2020-10-16T12:44:19.604Z] 2020-10-16 12:44:18,268 - root - ERROR - Container exited with an error 😞

[2020-10-16T12:44:19.604Z] 2020-10-16 12:44:18,268 - root - INFO - Executed command for reproduction:

[2020-10-16T12:44:19.604Z] 

[2020-10-16T12:44:19.604Z] ci/build.py --docker-registry mxnetci --platform centos7_cpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh unittest_centos7_cpu_scala

[2020-10-16T12:44:19.604Z] 

[2020-10-16T12:44:19.604Z] 2020-10-16 12:44:18,268 - root - INFO - Stopping container: c6fd018f1c30

[2020-10-16T12:44:19.604Z] 2020-10-16 12:44:18,270 - root - INFO - Removing container: c6fd018f1c30

[2020-10-16T12:44:19.604Z] 2020-10-16 12:44:18,369 - root - CRITICAL - Execution of ['/work/runtime_functions.sh', 'unittest_centos7_cpu_scala'] failed with status: 1

script returned exit code 1

What can I do?
@mxnet-bot run ci [unix-cpu, centos-cpu, unix-gpu]

@mxnet-bot

Undefined action detected.
Permissible actions are : run ci [all], run ci [job1, job2]
Example : @mxnet-bot run ci [all]
Example : @mxnet-bot run ci [centos-cpu, clang]

@Neutron3529
Contributor Author

@mxnet-bot run ci [unix-cpu, centos-cpu, unix-gpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-awaiting-testing label Oct 18, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu, unix-cpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 18, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, unix-gpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 18, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, unix-gpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 19, 2020
@Neutron3529
Contributor Author

@mxnet-bot run ci [unix-cpu, unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu, unix-cpu]

@lanking520 added the pr-awaiting-testing and pr-awaiting-review labels and removed the pr-work-in-progress and pr-awaiting-testing labels Oct 21, 2020
@szha merged commit 3f64203 into apache:v1.x Oct 21, 2020
@szha
Member

szha commented Oct 21, 2020

@Neutron3529 Thanks for backporting, and sorry that it took several tries to get it through the CI. Fortunately, thanks to @josephevans, the CI is now stabilized.

@Neutron3529 deleted the patch-2 branch October 24, 2020 05:49
Labels
pr-awaiting-review (PR is waiting for code review)

4 participants