This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Nightly test] v1.3.x failing with missing cmake #13800

Closed
jlcontreras opened this issue Jan 8, 2019 · 15 comments

@jlcontreras
Contributor

The step InstallationGuide:GPU is failing in the nightly tests due to missing cmake. The problems started on Jan 3rd.

Example case:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTests/detail/v1.3.x/146/pipeline

make[1]: Leaving directory '/work/mxnet/incubator-mxnet/3rdparty/tvm/nnvm'
Downloaded and unpacked Intel(R) MKL small libraries to /work/mxnet/incubator-mxnet/3rdparty/mkldnn/external
cmake /work/mxnet/incubator-mxnet/3rdparty/mkldnn -DCMAKE_INSTALL_PREFIX=/work/mxnet/incubator-mxnet/3rdparty/mkldnn/build/install -B/work/mxnet/incubator-mxnet/3rdparty/mkldnn/build -DARCH_OPT_FLAGS="-mtune=generic" -DWITH_TEST=OFF -DWITH_EXAMPLE=OFF
/bin/sh: 1: cmake: not found
mkldnn.mk:38: recipe for target '/work/mxnet/incubator-mxnet/3rdparty/mkldnn/build/install/lib/libmkldnn.so.0' failed
make: *** [/work/mxnet/incubator-mxnet/3rdparty/mkldnn/build/install/lib/libmkldnn.so.0] Error 127
make: *** Waiting for unfinished jobs....
ar cr libdmlc.a line_split.o indexed_recordio_split.o recordio_split.o input_split_base.o io.o filesys.o local_filesys.o data.o recordio.o config.o
make[1]: Leaving directory '/work/mxnet/incubator-mxnet/3rdparty/dmlc-core'
build.py: 2019-01-06 19:51:59,687 Running of command in container failed (2):
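
The `Error 127` in the log above is the exit status a shell reports when a command cannot be found, which matches the `cmake: not found` line. As a minimal sketch (the helper name `require_tools` is hypothetical and not part of the MXNet CI scripts), a pre-flight check could surface a missing tool before the parallel build starts:

```shell
# Hypothetical pre-flight helper: report every command that is not on PATH.
require_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing build tools:$missing" >&2
        return 1
    fi
}

# The tool list is an assumption based on the commands visible in this log.
require_tools cmake make git || echo "install the tools listed above before running make"
```

Failing early this way would put the missing dependency at the top of the log instead of in the middle of interleaved `make -j` output.
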
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Test

@jlcontreras
Contributor Author

@TaoLv Do you think this could be related to the latest MKL updates?

@TaoLv
Member

TaoLv commented Jan 8, 2019

Do you mean the v1.3.x branch? I notice the last commit on it was more than a month ago.

@TaoLv
Member

TaoLv commented Jan 8, 2019

On the v1.3.x branch, I don't think a command line like make -j 32 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 will trigger the MKL-DNN build.

@jlcontreras
Contributor Author

Thanks for the reply :)
I did mean the v1.3.x branch; I'm editing the title. Maybe #13681 had something to do with it? The dates coincide, but that PR shouldn't impact v1.3.x.

@jlcontreras jlcontreras changed the title [Nightly test] 1.3.x failing with missing cmake [Nightly test] v1.3.x failing with missing cmake Jan 8, 2019
@TaoLv
Member

TaoLv commented Jan 8, 2019

Yes, since that PR, MKL-DNN is enabled by default on the master branch. It definitely should not impact the v1.3.x branch, so I guess the source code you built was not the one checked out from v1.3.x.

@zachgk
Contributor

zachgk commented Jan 8, 2019

Thank you for submitting the issue! I'm labeling it so the MXNet community members can help resolve it.

@mxnet-label-bot add [CI]

@marcoabreu marcoabreu added the CI label Jan 8, 2019
@TaoLv
Member

TaoLv commented Jan 10, 2019

@jlcontreras Is the problem still there? Could you double-check the code base used for InstallationGuide: GPU? I noticed the submodule update message in the log:

Submodule path '3rdparty/tvm': checked out '0f053c82a747b4dcdf49570ec87c17e0067b7439'

apache/tvm@0f053c8 is used on the master branch, while v1.3.x uses apache/tvm@90db723.
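
The comparison above can be automated. The following is only a sketch: the function names are hypothetical, and it assumes the clone output contains lines in the same format as the one quoted from the log. It extracts the checked-out submodule commit and compares it with the short hash expected for the branch:

```shell
# Read `git clone --recursive` output on stdin and print the commit that was
# checked out for the submodule path given as $1.
submodule_commit_from_log() {
    sed -n "s|^Submodule path '$1': checked out '\([0-9a-f]*\)'.*|\1|p"
}

# Succeed only if the full commit hash $1 starts with the short hash $2.
commit_matches() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Log line copied from this issue.
log_line="Submodule path '3rdparty/tvm': checked out '0f053c82a747b4dcdf49570ec87c17e0067b7439'"
actual=$(printf '%s\n' "$log_line" | submodule_commit_from_log 3rdparty/tvm)

# 90db723 is the tvm commit quoted in this thread for v1.3.x.
if commit_matches "$actual" 90db723; then
    echo "tvm matches v1.3.x"
else
    echo "tvm mismatch: got $actual, expected 90db723..."
fi
```

With the log line from this issue the check reports a mismatch, which is the symptom that the job built a different branch than intended.
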

@jlcontreras
Contributor Author

Yes, the problem still persists. I'll update this issue with any progress made. Thanks for the info about the submodule update!

@jlcontreras
Contributor Author

Found why we are checking out the wrong version of tvm. The InstallationGuide step in CI executes the commands from docs/install/index.md. One of the steps there is the following:

$ git clone --recursive https://github.com/apache/incubator-mxnet
$ cd incubator-mxnet
$ make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas

So we are actually testing master, even from this v1.3.x job.
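
One possible way to avoid this, sketched here under the assumption that the job knows which branch it is testing, is to rewrite the documented clone command so that it passes `--branch` to `git clone`. The helper name `pin_clone_to_branch` is hypothetical:

```shell
# Hypothetical helper: make a documented clone command check out the branch
# given as $1 instead of the repository's default branch.
pin_clone_to_branch() {
    branch="$1"
    cmd="$2"
    # Insert "--branch <name>" right after "git clone"; the rest is unchanged.
    printf '%s\n' "$cmd" | sed "s|git clone |git clone --branch $branch |"
}

pin_clone_to_branch v1.3.x 'git clone --recursive https://github.com/apache/incubator-mxnet'
```

Because `--recursive` checks out the submodule commits recorded by the cloned branch, pinning the branch should also bring in the tvm revision that v1.3.x expects.
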

@TaoLv
Member

TaoLv commented Feb 12, 2019

@jlcontreras issue resolved?

@perdasilva
Contributor

perdasilva commented Feb 13, 2019

@TaoLv not yet. The issue seems to come from the following command:

eval sudo apt-get 'update;' sudo apt-get install -y build-essential 'git;' sudo apt-get install -y libopenblas-dev 'liblapack-dev;' sudo apt-get install -y 'libopencv-dev;' git clone --recursive 'https://github.com/apache/incubator-mxnet;' cd 'incubator-mxnet;' make -j '$(nproc)' USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda 'USE_CUDNN=1;' sudo apt-get install -y python-dev python-setuptools python-pip 'libgfortran3;' cd 'python;' pip install -e '.;'

It seems we are testing whether the instructions in the v1.3.x documentation can compile the master branch. I guess some dependencies changed on master and the documentation there was updated to match (which is why the master build passes). We need to figure out a way to test the documentation instructions against the code in the branch where those instructions reside.
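
One option, shown only as a sketch, is to drop the clone step from the extracted commands and run the rest inside the checkout the CI job already has. This assumes the job's workspace contains the branch under test and that the commands arrive one per line; the function name `use_workspace_checkout` is hypothetical:

```shell
# Read documented commands on stdin, one per line, and drop the steps that
# fetch a fresh copy of the repository, so the remaining steps can run in
# the existing workspace checkout.
use_workspace_checkout() {
    grep -v -e '^git clone ' -e '^cd incubator-mxnet$'
}

printf '%s\n' \
    'sudo apt-get update' \
    'git clone --recursive https://github.com/apache/incubator-mxnet' \
    'cd incubator-mxnet' \
    'make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas' |
    use_workspace_checkout
```

The trade-off is that the clone line in the documentation is no longer exercised, so a typo in that line would go unnoticed.
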

@piyushghai
Contributor

@jlcontreras I see a PR merged that seems to have fixed this issue.
Do you think this issue is good to close now?

@perdasilva
Contributor

@piyushghai should be fine to close ^^

@lanking520
Member

Closing this issue for now. Please feel free to reopen it if you run into more problems.


8 participants