This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

cpp_package instructions need clarification and examples need repair #12401

Closed
aaronmarkham opened this issue Aug 29, 2018 · 7 comments


@aaronmarkham
Contributor

I tried following the instructions for building the C++ examples.

The resulting binaries don't work out of the box. Even after some troubleshooting, some of them terminate abruptly, core dump, or do nothing.

./alexnet: error while loading shared libraries: libmxnet.so: cannot open shared object file: No such file or directory

I worked on the C++ image prediction example (#12397), which has its own Makefile that points to libmxnet.so explicitly, and I was able to run it without hunting for the MXNet library. The Makefile for the cpp-package/example files does not mention the library at all. Maybe that's a problem?

The steps in the README could use some clarification. Eventually, after trying a bunch of things, I found a reference that led me to try this:

export LD_LIBRARY_PATH=~/incubator-mxnet/lib

And some of the examples started working! At least they're finding the library. I can add this line to the instructions if this is what we should recommend that people do after building from source with the cpp package flag turned on. Is it?
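As a rough diagnostic for the "cannot open shared object file" error above, the following sketch checks whether any directory listed in LD_LIBRARY_PATH contains a readable libmxnet.so. The helper names (`split_search_path`, `find_library`, `find_in_ld_library_path`) are hypothetical and not part of MXNet. The real dynamic loader also consults rpath entries, the ld.so cache, and default system directories, so treat this as a simplified approximation of one part of that search.

```cpp
#include <cstdlib>
#include <string>
#include <vector>
#include <unistd.h>  // access(), R_OK (POSIX)

// Split a colon-separated search path such as LD_LIBRARY_PATH into directories.
// Empty entries are skipped here for simplicity; the real loader treats them
// as the current directory.
std::vector<std::string> split_search_path(const std::string& path) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find(':', start);
        if (end == std::string::npos) end = path.size();
        if (end > start) dirs.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return dirs;
}

// Return the full path of the first readable file named `lib_name` found in
// `search_path`, or an empty string if no directory contains it.
std::string find_library(const std::string& search_path,
                         const std::string& lib_name) {
    for (const std::string& dir : split_search_path(search_path)) {
        const std::string candidate = dir + "/" + lib_name;
        if (access(candidate.c_str(), R_OK) == 0) return candidate;
    }
    return std::string();
}

// Convenience wrapper that reads LD_LIBRARY_PATH from the environment.
std::string find_in_ld_library_path(const std::string& lib_name) {
    const char* env = std::getenv("LD_LIBRARY_PATH");
    return find_library(env ? env : "", lib_name);
}
```

Calling `find_in_ld_library_path("libmxnet.so")` before and after the `export` line should show whether the variable now points at a directory that actually holds the library. An empty result suggests the path is wrong or the build did not produce the library there.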

But then there's also the fact that several of the examples core dump:

./mlp_gpu
[21:13:44] src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
[21:13:44] src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
terminate called after throwing an instance of 'dmlc::Error'
  what():  [21:13:44] ../include/mxnet-cpp/ndarray.hpp:54: Check failed: MXNDArrayCreate(shape.data(), shape.size(), context.GetDeviceType(), context.GetDeviceId(), delay_alloc, &handle) == 0 (-1 vs. 0) 
./inception_bn 
terminate called after throwing an instance of 'dmlc::Error'
  what():  [21:15:16] ../include/mxnet-cpp/symbol.hpp:219: Check failed: MXSymbolInferShape(GetHandle(), keys.size(), keys.data(), arg_ind_ptr.data(), arg_shape_data.data(), &in_shape_size, &in_shape_ndim, &in_shape_data, &out_shape_size, &out_shape_ndim, &out_shape_data, &aux_shape_size, &aux_shape_ndim, &aux_shape_data, &complete) == 0 (-1 vs. 0) 
./test_score 
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)

Or do nothing:

./test_optimizer 
@anirudhacharya
Member

@mxnet-label-bot [C++, Build, Example, Breaking]

@kalyc
Contributor

kalyc commented Nov 8, 2018

@aaronmarkham The instructions here appear to be updated - https://mxnet.incubator.apache.org/install/c_plus_plus.html

@aaronmarkham
Contributor Author

@kalyc - Yes those instructions are updated, but the examples were failing.
@leleamol Should this be closed? I think you had a PR that fixed some or all of the issues with the examples.

@stu1130
Contributor

stu1130 commented Nov 14, 2018

@aaronmarkham After digging deeper into the examples, I've captured the current status.

| Model | cpu | gpu | comment |
|---|---|---|---|
| alexnet.cpp | ✔️ | ✔️ | acc not update |
| googlenet.cpp | ✔️ | ✔️ | N/A |
| mlp.cpp | ✔️ | ✔️ | acc not update on gpu |
| mlp_cpu.cpp | ✔️ | N/A | N/A |
| mlp_gpu.cpp | N/A | ✔️ | N/A |
| mlp_csv.cpp | ✔️ | ✔️ | acc not update |
| resnet.cpp | ? | ✔️ | N/A |
| lenet.cpp | ✔️ | ✔️ | N/A |
| lenet_with_mxdataiter.cpp | ✔️ | ✔️ | acc not update |
| inception_bn.cpp | | | @roywei raise a PR fixing this |

I will keep updating the table as I get results (one epoch takes more than 20 minutes for those examples).
There is actually one more example, charRNN, but I need to figure out the input corpus format for it.
@roywei is fixing the "acc not update" problem!

  • test_optimizer prints nothing (appears to do nothing) when it passes the test. We might need to add some feedback text.
  • test_score requires a min_score argument, for example: build/test_score 10
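The two points above can be sketched as a small argument guard plus a pass/fail message. The `basic_string::_M_construct null not valid` error reported earlier is what libstdc++ raises when a std::string is constructed from a null pointer, which is consistent with reading `argv[1]` when no argument was given. That this is the exact cause inside test_score is an assumption, and the helper names `parse_min_score` and `report_result` are hypothetical, not taken from the cpp-package sources.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Parse the minimum score from argv[1]. Returns false and prints a usage
// hint when the argument is missing or is not a number, instead of letting
// a std::string be built from a null pointer.
bool parse_min_score(int argc, const char* const* argv, float* min_score) {
    const char* program = (argc > 0 && argv[0]) ? argv[0] : "test_score";
    if (argc < 2 || argv[1] == nullptr) {
        std::fprintf(stderr, "usage: %s <min_score>\n", program);
        return false;
    }
    try {
        *min_score = std::stof(argv[1]);
    } catch (const std::exception&) {
        // std::stof throws std::invalid_argument or std::out_of_range.
        std::fprintf(stderr, "%s: '%s' is not a valid score\n", program, argv[1]);
        return false;
    }
    return true;
}

// Print explicit feedback so a passing test does not look like a no-op.
// Returns a conventional process exit code: 0 on success, 1 on failure.
int report_result(const char* test_name, bool passed) {
    std::printf("%s: %s\n", test_name, passed ? "PASSED" : "FAILED");
    return passed ? 0 : 1;
}
```

A `main` could call `parse_min_score(argc, argv, &min_score)` first and return a non-zero code if it fails, then finish with `return report_result("test_score", score >= min_score);` so both the missing-argument case and the silent-success case give visible output.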

@aaronmarkham
Contributor Author

Is v like ✔️ ?
And x like ❌ ?
Is there an issue with "acc not update"? Does this mean it runs the example, but never reports on accuracy? Is that really a problem for an example?

@roywei
Member

roywei commented Nov 14, 2018

@aaronmarkham Refer to #13243. Some models in the examples are not trained at all: params are not updated after each epoch.
@stu1130 Thanks for the detailed table. I suggest creating a PR to remove the unnecessary prints in alexnet and to update the README on the usage of LD_LIBRARY_PATH. I am also not sure whether copying the lib folder is necessary in the example Makefile.
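To make the "params are not updated" symptom concrete, here is a toy sketch in plain C++ that does not use the MXNet API. It fits a single weight to y = 2x with gradient descent and records the weight and loss per epoch. The `train` function, the data, and the learning rate are all illustrative assumptions. When the update step is skipped, the loss is identical every epoch, which is how an accuracy that never changes would look from the outside.

```cpp
#include <cmath>
#include <vector>

struct EpochLog {
    float weight;  // weight after this epoch
    float loss;    // mean squared error measured before the update
};

// Fit y = w * x on a tiny fixed dataset. If `apply_update` is false, the
// gradient is computed but never applied, mimicking a training loop that
// forgets to call the optimizer.
std::vector<EpochLog> train(int epochs, bool apply_update) {
    const float xs[] = {1.0f, 2.0f, 3.0f};
    const float ys[] = {2.0f, 4.0f, 6.0f};
    const float learning_rate = 0.05f;
    float w = 0.0f;
    std::vector<EpochLog> history;
    for (int e = 0; e < epochs; ++e) {
        float grad = 0.0f;
        float loss = 0.0f;
        for (int i = 0; i < 3; ++i) {
            const float err = w * xs[i] - ys[i];
            loss += err * err / 3.0f;
            grad += 2.0f * err * xs[i] / 3.0f;
        }
        if (apply_update) w -= learning_rate * grad;
        history.push_back({w, loss});
    }
    return history;
}
```

Comparing `train(5, false)` with `train(50, true)` shows the difference: in the first case the weight stays at its initial value and the loss never moves, while in the second the weight approaches 2 and the loss shrinks. Checking whether parameters change between epochs is a quick way to tell a broken training loop from a model that is simply learning slowly.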

@stu1130
Contributor

stu1130 commented Dec 4, 2018

@sandeep-krishnamurthy Could you close this issue? @roywei's PR has been merged into master.


7 participants