This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix bug in profiler tutorial when using cpu #13695

Merged: 1 commit merged on Jan 4, 2019


@Soonhwan-Kwon (Contributor) commented Dec 20, 2018

Description

When using CPU only, the tutorial fails: test_utils.list_gpus() always returns a sequence (an empty one when no GPU is present) rather than raising an error, so execution never reaches the except branch and the context is always set to GPU. I replaced the try/except with an if/else statement so that the context is chosen correctly, and it works fine.
For reference, below is the code of list_gpus() in test_utils.py:

python/mxnet/test_utils.py

def list_gpus():
    """Return a list of GPUs
    Returns
    -------
    list of int:
        If there are n GPUs, then return a list [0,1,...,n-1]. Otherwise returns
        [].
    """
    re = ''
    nvidia_smi = ['nvidia-smi', '/usr/bin/nvidia-smi', '/usr/local/nvidia/bin/nvidia-smi']
    for cmd in nvidia_smi:
        try:
            re = subprocess.check_output([cmd, "-L"], universal_newlines=True)
        except (subprocess.CalledProcessError, OSError):
            pass
    return range(len([i for i in re.split('\n') if 'GPU' in i]))
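To see why no exception is ever raised on a CPU-only machine, the parsing step of list_gpus() can be exercised on its own. The sketch below is a hypothetical standalone copy of that logic (the helper name `gpu_indices_from_smi` and the sample `nvidia-smi -L` text are illustrative assumptions, not MXNet code): when every nvidia-smi call fails, the output string stays empty and the function simply returns an empty sequence.

```python
def gpu_indices_from_smi(smi_output):
    """Hypothetical standalone copy of the parsing step in list_gpus().

    Counts the lines of `nvidia-smi -L` output that mention 'GPU' and
    returns their indices. Empty input yields an empty list, not an error.
    """
    gpu_lines = [line for line in smi_output.split('\n') if 'GPU' in line]
    return list(range(len(gpu_lines)))


# Illustrative `nvidia-smi -L` output on a two-GPU machine (assumed format).
sample_output = (
    "GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-aaaa)\n"
    "GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-bbbb)\n"
)

print(gpu_indices_from_smi(sample_output))  # [0, 1]
# On a CPU-only machine every nvidia-smi call fails, so the output stays ''.
print(gpu_indices_from_smi(''))  # []
```

The list() wrapper only makes the result comparable across Python 2 and 3; the original returns range(...) directly.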

And the error message:
run_training_iteration(*next(itr))
Traceback (most recent call last):
  File "", line 1, in
  File "", line 5, in run_training_iteration
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 2135, in as_in_context
    return self.copyto(context)
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 2084, in copyto
    return _internal._copyto(self, out=hret)
  File "", line 25, in _copyto
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [03:59:09] src/ndarray/ndarray.cc:1270: GPU is not enabled

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x21f5a4) [0x7fd2f7def5a4]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x21f981) [0x7fd2f7def981]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::CopyFromTo(mxnet::NDArray const&, mxnet::NDArray const&, int, bool)+0x723) [0x7fd2fa8bc323]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x110) [0x7fd2fa763ba0]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)+0x3ca) [0x7fd2fa76ee5a]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::Imperative::InvokeOp(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode, mxnet::OpStatePtr)+0x839) [0x7fd2fa7748a9]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x38c) [0x7fd2fa77512c]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2ab24f9) [0x7fd2fa6824f9]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x6f) [0x7fd2fa682aef]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fd3097dcec0]
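The difference between the two context-selection approaches can be sketched without MXNet. In this hypothetical example, `fake_list_gpus` stands in for `test_utils.list_gpus()` on a CPU-only machine, and the strings 'gpu' and 'cpu' stand in for `mx.gpu()` and `mx.cpu()`. The tutorial's exact code is not shown in this PR, so the try/except shape below is an assumption. Note that selecting the GPU context does not fail by itself either; as the traceback shows, the error only surfaces later, when data is copied with as_in_context.

```python
def fake_list_gpus():
    """Hypothetical stand-in for test_utils.list_gpus() on a CPU-only host."""
    return range(0)  # empty, but no exception is raised


def pick_context_try_except():
    # Buggy pattern: only an exception would select the CPU, but none occurs.
    try:
        fake_list_gpus()
        ctx = 'gpu'
    except Exception:
        ctx = 'cpu'
    return ctx


def pick_context_if_else():
    # Fixed pattern: branch on whether any GPU was actually found.
    if len(fake_list_gpus()) > 0:
        ctx = 'gpu'
    else:
        ctx = 'cpu'
    return ctx


print(pick_context_try_except())  # gpu (wrong on a CPU-only machine)
print(pick_context_if_else())     # cpu
```

Branching on the length of the returned sequence works because list_gpus() signals "no GPU" with an empty result rather than an exception.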

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)

try/except approach always sets ctx=mx.gpu() because test_utils.list_gpus() returns at least an empty array and does not raise an error
@pengzhao-intel (Contributor) left a comment


LGTM

@marcoabreu (Contributor)

I think we are deprecating this method in favor of using the GPU information from the context. I'm on my phone right now, so I can't give details, sorry.

@Soonhwan-Kwon (Contributor, Author) commented Dec 21, 2018

@marcoabreu Is there any reference (a link or the name of the alternative method) for this?

@Soonhwan-Kwon (Contributor, Author) commented Dec 21, 2018

@marcoabreu I searched for uses of the list_gpus function and found the latest related PR, merged on Nov 16 (#12918). It also uses the list_gpus util, and its authors said they had already fixed the util. Is that fix related to your concern?

@Roshrini (Member)

I am not aware that we are deprecating this method; otherwise LGTM. @marcoabreu, can you provide details whenever you have time? Thanks!
@mxnet-label-bot Add [pr-awaiting-review]

@marcoabreu added the pr-awaiting-review label (PR is waiting for code review) on Dec 21, 2018

@sandeep-krishnamurthy (Contributor) left a comment


LGTM.
@marcoabreu - ping :-)

@Soonhwan-Kwon (Contributor, Author)

@marcoabreu any update?

@szha (Member) left a comment


@marcoabreu I think we can update the tutorial again when deprecation happens.

@szha merged commit 24068c2 into apache:master on Jan 4, 2019
rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jan 9, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels: pr-awaiting-review (PR is waiting for code review)
6 participants