This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Can't pickle MXNet Modules #8955

Open
fedorzh opened this issue Dec 5, 2017 · 9 comments
Comments


fedorzh commented Dec 5, 2017

Description

Can't pickle mxnet Modules

Environment info (Required)

print pickle.__version__  # $Revision: 72223 $
print mx.__version__      # 0.12.0

Error Message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-024323561f98> in <module>()
      1 import pickle
----> 2 pickle.dumps(mlp_model)

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in dumps(obj, protocol)
   1378 def dumps(obj, protocol=None):
   1379     file = StringIO()
-> 1380     Pickler(file, protocol).dump(obj)
   1381     return file.getvalue()
   1382 

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in dump(self, obj)
    222         if self.proto >= 2:
    223             self.write(PROTO + chr(self.proto))
--> 224         self.save(obj)
    225         self.write(STOP)
    226 

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
    329 
    330         # Save the reduce() output and finally memoize the object
--> 331         self.save_reduce(obj=obj, *rv)
    332 
    333     def persistent_id(self, obj):

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
    423 
    424         if state is not None:
--> 425             save(state)
    426             write(BUILD)
    427 

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288 

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save_dict(self, obj)
    653 
    654         self.memoize(obj)
--> 655         self._batch_setitems(obj.iteritems())
    656 
    657     dispatch[DictionaryType] = save_dict

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
    667             for k, v in items:
    668                 save(k)
--> 669                 save(v)
    670                 write(SETITEM)
    671             return

/home/ubuntu/anaconda2/lib/python2.7/pickle.pyc in save(self, obj)
    304             reduce = getattr(obj, "__reduce_ex__", None)
    305             if reduce:
--> 306                 rv = reduce(self.proto)
    307             else:
    308                 reduce = getattr(obj, "__reduce__", None)

/home/ubuntu/anaconda2/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
     68     else:
     69         if base is self.__class__:
---> 70             raise TypeError, "can't pickle %s objects" % base.__name__
     71         state = base(self)
     72     args = (self.__class__, base, state)

TypeError: can't pickle module objects

Minimum reproducible example

import mxnet as mx
import pickle

net = mx.sym.Variable('data')
net = mx.sym.flatten(net)
net = mx.sym.FullyConnected(net, num_hidden=128)
net = mx.sym.Activation(net, act_type="relu")
net = mx.sym.FullyConnected(net, num_hidden=64)
net = mx.sym.Activation(net, act_type="relu")
net = mx.sym.FullyConnected(net, num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mlp_model = mx.mod.Module(symbol=net, context=mx.gpu())
pickle.dumps(mlp_model)  # raises TypeError: can't pickle module objects
@edmBernard

Why do you want to pickle the whole Module class?
You can save your symbol with: https://mxnet.incubator.apache.org/api/python/symbol.html#mxnet.symbol.Symbol.save


fedorzh commented Dec 5, 2017

For example, I use parallel processing to distribute my training jobs; joblib uses pickle for multiprocessing.

@edmBernard

Honestly, I don't think a Module can be pickled; MXNet has a lot of C++ state inside.
For multi-core CPU processing (if you don't use a GPU), MXNet supports configuration through environment variables: https://mxnet.incubator.apache.org/how_to/env_var.html
You can also use NNPACK to parallelize training operations on the CPU.


fedorzh commented Dec 6, 2017

With the old mxnet.model interface, models can be pickled.
Training the model is actually not the longest part of my pipeline (sometimes it adds an insignificant fraction); a lot of other numpy-based machinery happens outside of it, and parallelization helps immensely with that. I have to run multiple processes with different seeds.

@leleamol
Contributor

Proposed labels: "Feature Request", "Module", "Python"


sliawatimena commented Oct 10, 2018

I am using Windows 7, Anaconda Navigator 1.9.2, Python 3.6.6, and Jupyter Notebook 5.7.0, trying to learn the code from the Gluon crash course, chapter 5.

I already added:
import pickle

I got stuck at

for data, label in train_data:
    print(data.shape, label.shape)
    break
---
AttributeError                            Traceback (most recent call last)
<ipython-input-8-91a66f98d1d2> in <module>()
----> 1 for data, label in train_data:
      2     print(data.shape, label.shape)
      3     break

E:\Anaconda\envs\mxnet\lib\site-packages\mxnet\gluon\data\dataloader.py in __iter__(self)
    282         # multi-worker
    283         return _MultiWorkerIter(self._num_workers, self._dataset,
--> 284                                 self._batchify_fn, self._batch_sampler)
    285 
    286     def __len__(self):

E:\Anaconda\envs\mxnet\lib\site-packages\mxnet\gluon\data\dataloader.py in __init__(self, num_workers, dataset, batchify_fn, batch_sampler)
    142                 args=(self._dataset, self._key_queue, self._data_queue, self._batchify_fn))
    143             worker.daemon = True
--> 144             worker.start()
    145             workers.append(worker)
    146 

E:\Anaconda\envs\mxnet\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         # Avoid a refcycle if the target function holds an indirect

E:\Anaconda\envs\mxnet\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

E:\Anaconda\envs\mxnet\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

E:\Anaconda\envs\mxnet\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

E:\Anaconda\envs\mxnet\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

AttributeError: Can't pickle local object 'Dataset.transform_first.<locals>.base_fn'

Please help. Thanks.

@piyushghai
Contributor

@sliawatimena Can you file a separate issue on this repository, and can you also provide a minimum reproducible example to help debug it?

From the stack trace you've posted, it's unclear where you are using pickle.
Also, are you using pickle.dump or pickle.load?


sliawatimena commented Oct 11, 2018

Dear @piyushghai,

I just copied from 5. Train the neural network; steps 1 through 5 are okay. In step 6, the error message is as in my previous post.

From googling, this looks like a Windows-specific problem with Python multiprocessing in Jupyter Notebook. Please help.

Thanks.

Suryadi


egeaydin commented Oct 13, 2018

This #10562 issue helped me.

train_data = gluon.data.DataLoader(
    mnist_train, batch_size=batch_size, shuffle=True, num_workers=0)

Change num_workers=4 to num_workers=0.

Also do the same for the validation data loader.

Hope this helps.
