Prediction with MXNET Cuda version not working #331

InzamamAnwar · 2019-09-20T14:16:35Z

Description

Prediction with a DeepAR model trained using mxnet-cu100 1.5.0 fails with the error "vector::_M_range_insert".

To Reproduce

The data cannot be shared. The code is given below.

import matplotlib.pyplot as plt
from gluonts.dataset.util import to_pandas
from gluonts.distribution.gaussian import GaussianOutput
from gluonts.model.deepar import DeepAREstimator
from gluonts.trainer import Trainer
# train_data and test_data are GluonTS datasets (not shown; the data cannot be shared).
trainer_config = Trainer(
    epochs=400, batch_size=32, num_batches_per_epoch=75,
    clip_gradient=20.0, weight_decay=1e-6, init="xavier",
)
estimator = DeepAREstimator(
    freq="W", prediction_length=52, trainer=trainer_config, context_length=52,
    num_layers=3, num_cells=52, cell_type="lstm", dropout_rate=0.15,
    use_feat_dynamic_real=True, use_feat_static_cat=True,
    cardinality=[5, 22, 90, 1343, 45, 3], embedding_dimension=100,
    distr_output=GaussianOutput(), scaling=False,
)

predictor = estimator.train(training_data=train_data)

for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry).plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
    break
plt.grid(which='both')
plt.legend()

Error Message

---------------------------------------------------------------------------
MXNetError                                Traceback (most recent call last)
<ipython-input-13-41efef67ca9e> in <module>
----> 1 for test_entry, forecast in zip(test_ds, predictor.predict(test_ds)):
      2     to_pandas(test_entry).plot(linewidth=2)
      3     forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
      4     break
      5 plt.grid(which='both')

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/gluonts/model/predictor.py in predict(self, dataset, num_eval_samples)
    303         for batch in inference_data_loader:
    304             inputs = [batch[k] for k in self.input_names]
--> 305             outputs = self.prediction_net(*inputs).asnumpy()
    306             if self.output_transform is not None:
    307                 outputs = self.output_transform(batch, outputs)

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    546             hook(self, args)
    547 
--> 548         out = self.forward(*args)
    549 
    550         for hook in self._forward_hooks.values():

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
    923                     params = {i: j.data(ctx) for i, j in self._reg_params.items()}
    924 
--> 925                 return self.hybrid_forward(ndarray, x, *args, **params)
    926 
    927         assert isinstance(x, Symbol), \

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/gluonts/model/deepar/_network.py in hybrid_forward(self, F, feat_static_cat, past_time_feat, past_target, past_observed_values, future_time_feat)
    583             static_feat=static_feat,
    584             scale=scale,
--> 585             begin_states=state,
    586         )

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/gluonts/model/deepar/_network.py in sampling_decoder(self, F, static_feat, past_target, time_feat, scale, begin_states)
    520 
    521             # (batch_size * num_samples, 1, *target_shape)
--> 522             new_samples = distr.sample()
    523 
    524             # (batch_size * num_samples, seq_len, *target_shape)

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/gluonts/distribution/transformed_distribution.py in sample(self, num_samples)
     80     def sample(self, num_samples: Optional[int] = None) -> Tensor:
     81         with autograd.pause():
---> 82             s = self.base_distribution.sample(num_samples=num_samples)
     83             for t in self.transforms:
     84                 s = t.f(s)

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/gluonts/distribution/gaussian.py in sample(self, num_samples)
     87             mu=self.mu,
     88             sigma=self.sigma,
---> 89             num_samples=num_samples,
     90         )
     91 

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/gluonts/distribution/distribution.py in _sample_multiple(sample_func, num_samples, *args, **kwargs)
    245         k: _expand_param(v, num_samples) for k, v in kwargs.items()
    246     }
--> 247     samples = sample_func(*args_expanded, **kwargs_expanded)
    248     return samples

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/mxnet/ndarray/register.py in sample_normal(mu, sigma, shape, dtype, out, name, **kwargs)

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out)
     90         c_str_array(keys),
     91         c_str_array([str(s) for s in vals]),
---> 92         ctypes.byref(out_stypes)))
     93 
     94     if original_output is not None:

~/anaconda3/envs/gluonts/lib/python3.7/site-packages/mxnet/base.py in check_call(ret)
    251     """
    252     if ret != 0:
--> 253         raise MXNetError(py_str(_LIB.MXGetLastError()))
    254 
    255 

MXNetError: vector::_M_range_insert

Environment

Operating system: Ubuntu 16.04
Python version: 3.7.4
GluonTS version: 0.3.3
MXNET: mxnet-cu100 1.5.0

Workaround

Save the trained model and load it again in an environment that has mxnet without CUDA (1.4.0).


vafl · 2019-09-20T14:38:12Z

GluonTS currently only supports MxNet 1.4.1.

There is a bug in mxnet 1.5 (see this PR and the linked issue #245), which prevents us from upgrading. I think your issue is related to this. Could you try with mxnet 1.4.1? Thanks
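A quick way to catch this mismatch before training is to check the installed MXNet version up front. The snippet below is only a minimal sketch: the helper names `parse_version` and `is_supported_mxnet` are hypothetical (they are not part of GluonTS or MXNet), and treating the whole 1.4.x series as supported is an assumption based on this thread, where both 1.4.0 and 1.4.1 were reported to work.

```python
import re

# Assumption drawn from this thread: GluonTS 0.3.3 works with MXNet 1.4.x
# and fails at prediction time with MXNet 1.5.0.
SUPPORTED_MAJOR_MINOR = (1, 4)


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string.

    For example, "1.5.0" becomes (1, 5, 0) and "1.4.1.post0" becomes (1, 4, 1).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def is_supported_mxnet(version: str) -> bool:
    """Hypothetical helper: True if the version is in the assumed-supported series."""
    return parse_version(version)[:2] == SUPPORTED_MAJOR_MINOR


if __name__ == "__main__":
    try:
        import mxnet as mx
    except ImportError:
        print("mxnet is not installed")
    else:
        status = "supported" if is_supported_mxnet(mx.__version__) else "NOT supported"
        print(f"mxnet {mx.__version__} is {status} (assumed requirement: 1.4.x)")
```

Running this in the environment used for training shows whether a CUDA build such as mxnet-cu100 1.5.0 was installed in place of a 1.4.x build.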

InzamamAnwar · 2019-09-23T13:00:09Z

@vafl MxNet 1.4.1 with CUDA 10.0 is working. Thanks!

InzamamAnwar added the bug label Sep 20, 2019

InzamamAnwar closed this as completed Sep 23, 2019
