
float64 data backward error using gluon #9156

Closed
zhaoningning opened this issue Dec 20, 2017 · 10 comments

@zhaoningning

zhaoningning commented Dec 20, 2017

I wrote a custom loss in Gluon. With the float32 data type everything is OK, but when I change to float64 I get this error:
"include/mxnet/././tensor_blob.h:217: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type. Expected: 0 v.s. given 1"
This happens after the loss is calculated, when loss.backward() is executed.
MXNet version 1.0.0, Ubuntu 14.04, Python 2.7.

@sxjscience
Member

@zhaoningning You need to cast the type to float32 explicitly. Use arr.astype(np.float32) to cast the data type.
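A minimal sketch of the suggested cast (the array `arr` here is illustrative):

```python
import numpy as np
import mxnet as mx

# hypothetical float64 array; cast it to float32 before feeding it to the network
arr = mx.nd.ones((2, 3), dtype=np.float64)
arr32 = arr.astype(np.float32)   # returns a new NDArray with dtype float32
print(arr32.dtype)               # <class 'numpy.float32'>
```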

@zhaoningning
Author

@sxjscience But I use float64 data and float64 parameters; do I still need to cast the loss to float32? I have to use the float64 data type because the forward pass may generate very small values.

@sxjscience
Member

@zhaoningning You can try explicitly setting the dtype of all the NDArray weights/biases to float64. Also, is float64 a must? Most deep learning algorithms can run in float32.
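A sketch of that suggestion, assuming a small illustrative Dense block; `net.cast('float64')` converts every weight and bias parameter:

```python
import numpy as np
import mxnet as mx
from mxnet import gluon

# hypothetical small network, used only to illustrate casting parameters to float64
net = gluon.nn.Dense(1, in_units=8)
net.initialize()
net.cast('float64')                        # casts every weight/bias parameter to float64

x = mx.nd.ones((4, 8), dtype=np.float64)   # the input must match the parameter dtype
y = net(x)
print(y.dtype)
```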

@zhaoningning
Author

@sxjscience I have already cast all data to float64, so the forward pass is OK, but backward gives the error.
I have to use float64 because the loss calculation can produce very small values (e.g., 1e-60).
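For context, a hypothetical sketch of the kind of setup described here, with an illustrative custom loss (this is not the reporter's original code, which was not shared in the thread):

```python
import numpy as np
import mxnet as mx
from mxnet import gluon, autograd

# illustrative float64 network and custom loss; forward runs, backward is where the
# reporter saw the dtype-mismatch error
net = gluon.nn.Dense(1, in_units=4)
net.initialize()
net.cast('float64')

class TinyValueLoss(gluon.loss.Loss):
    """Hypothetical loss that can produce very small float64 values."""
    def __init__(self, **kwargs):
        super(TinyValueLoss, self).__init__(weight=None, batch_axis=0, **kwargs)

    def hybrid_forward(self, F, pred, label):
        # scale down to mimic values far below float32 precision
        return F.mean(F.square(pred - label) * 1e-30, axis=self._batch_axis, exclude=True)

loss_fn = TinyValueLoss()
x = mx.nd.ones((8, 4), dtype=np.float64)
y = mx.nd.ones((8, 1), dtype=np.float64)

with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()    # the error reported in this issue occurred at this step
```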

@Soonhwan-Kwon
Contributor

Soonhwan-Kwon commented Jan 29, 2018

Same error occurs when I use float16 and I'm not using gluon.
"mxnet.base.MXNetError: [05:42:23] include/mxnet/././tensor_blob.h:217: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type.Expected: 0 v.s. given 2"
It also happens during backward, while forward is fine.

@Soonhwan-Kwon
Contributor

Soonhwan-Kwon commented Jan 29, 2018

...
  File "/data/ecg_2018/train.py", line 93, in do_training
    module.forward_backward(data_batch)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/base_module.py", line 192, in forward_backward
    self.backward()
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/bucketing_module.py", line 444, in backward
    self._curr_module.backward(out_grads=out_grads)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/module.py", line 627, in backward
    self._exec_group.backward(out_grads=out_grads)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/module/executor_group.py", line 580, in backward
    exec_.backward(out_grads=out_grads_slice)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/executor.py", line 234, in backward
    ctypes.c_int(is_train)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [05:42:23] include/mxnet/././tensor_blob.h:217: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type.Expected: 0 v.s. given 2

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f03ecd9bcda]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f03ecd9c878]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mshadow::half::half_t* mxnet::TBlob::dptr<mshadow::half::half_t>() const+0xd7) [0x7f03ecdb74a7]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mshadow::Tensor<mshadow::gpu, 3, mshadow::half::half_t> mxnet::TBlob::get_with_shape<mshadow::gpu, 3, mshadow::half::half_t>(mshadow::Shape<3> const&, mshadow::Stream<mshadow::gpu>*) const+0x56c) [0x7f03ef94f84c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::op::SliceChannelOp<mshadow::gpu, mshadow::half::half_t>::Backward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x9a7) [0x7f03f0ab9be7]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::op::OperatorState::Backward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x767) [0x7f03ef2f18a7]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::exec::StatefulComputeExecutor::Run(mxnet::RunContext, bool)+0x69) [0x7f03ef876429]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(+0x339a050) [0x7f03ef849050]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::engine::NaiveEngine::Push(mxnet::engine::Opr*, mxnet::Context, int, bool)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x61) [0x7f03ef79ef61]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-1.0.1-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)+0x4da) [0x7f03ef7aac8a]

@indhub
Contributor

indhub commented Apr 10, 2018

@Soonhwan-Kwon Could you please add a small example that reproduces the problem?

@sandeep-krishnamurthy
Contributor

@Soonhwan-Kwon / @zhaoningning - Can you please provide a small code sample to reproduce the issue?

@zhaoningning
Author

@sandeep-krishnamurthy Sorry, I have moved to other solutions for float64 training, and I can no longer reproduce this issue because the code was lost after such a long time.

@sandeep-krishnamurthy
Contributor

PR #12412 should fix using params other than FP32 in Gluon. Resolving. Please reopen if closed in error.
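For reference, a sketch of the all-float64 Gluon training pattern that fix is meant to enable (the network, loss, and optimizer here are illustrative):

```python
import numpy as np
import mxnet as mx
from mxnet import gluon, autograd

# illustrative end-to-end float64 training step with Gluon
net = gluon.nn.Dense(1, in_units=4)
net.initialize()
net.cast('float64')                      # float64 weights and biases

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
loss_fn = gluon.loss.L2Loss()

x = mx.nd.ones((8, 4), dtype=np.float64)
y = mx.nd.ones((8, 1), dtype=np.float64)

with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()
trainer.step(batch_size=8)
print(loss.mean().asscalar())
```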
