Serialization of unpicklable objects in multiprocessing (e.g. cloudpickle) #53

Open
wookayin opened this issue Jul 16, 2020 · 4 comments
Labels: bug

wookayin commented Jul 16, 2020

One inherent hassle of Python multiprocessing is pickling. Currently, pypeln does not handle this case, hence an error:

In [2]: list(([lambda i: i] * 2) | pl.process.map(lambda x:x, workers=2))   
Traceback (most recent call last):
  File "..../lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "..../lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7f255a027f80>: attribute lookup <lambda> on __main__ failed
... (omitted) ...

Out [2]: []

(1) Why does it return a valid list? It should raise an error rather than returning a "wrong" (empty) output.

(2) Can you add cloudpickle support to handle non-picklable objects?

For example, joblib supports cloudpickle: https://joblib.readthedocs.io/en/latest/auto_examples/serialization_and_wrappers.html
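
For reference, a minimal sketch of the difference (assuming cloudpickle is installed); the standard pickler fails on the exact same object that cloudpickle handles:

import pickle
import cloudpickle

fn = lambda x: x + 1

# The standard pickler rejects lambdas defined in __main__ ...
try:
    pickle.dumps(fn)
except (pickle.PicklingError, AttributeError):
    print("pickle failed")

# ... while cloudpickle serializes the function by value.
data = cloudpickle.dumps(fn)
restored = pickle.loads(data)  # cloudpickle output is a regular pickle stream
assert restored(1) == 2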

wookayin (Author) commented Jul 16, 2020

Also, pickling built-in exception objects causes a strange error (when an assert fails inside a subprocess):

Traceback (most recent call last):
  File ".../lib/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File ".../lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'AssertionError'>: it's not the same object as builtins.AssertionError

cgarciae (Owner) commented Jul 16, 2020

Hey @wookayin,

For your example: for multiprocessing, just avoid using lambdas by defining proper Python functions instead.
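
A rough sketch of the workaround (the function name is just illustrative):

import pypeln as pl

def negate(x):
    # Module-level functions pickle by reference, unlike lambdas defined in __main__.
    return -x

if __name__ == "__main__":
    stage = pl.process.map(negate, range(10), workers=2)
    print(list(stage))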

  1. Edit: If it's not crashing, it's a bug. I'll check this.
  2. I think pypeln might be able to fall back to cloudpickle if multiprocessing.Queue.put fails with one of these pickle errors on a given object. We would need to send a cloudpickled version of the object and unpickle it on the other side. I don't know if you can make multiprocessing use cloudpickle automatically; a rough sketch of the fallback idea is below.
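
Something like this (untested; dumps_with_fallback and loads_any are hypothetical helper names, assuming cloudpickle is installed):

import pickle
import cloudpickle

def dumps_with_fallback(obj):
    # Try the standard pickler first; fall back to cloudpickle for objects
    # it cannot handle (lambdas, closures, locally defined classes, ...).
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return cloudpickle.dumps(obj)

def loads_any(data):
    # cloudpickle emits a regular pickle byte stream, so a single loads covers both cases.
    return pickle.loads(data)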

What do you think?

cgarciae added the bug label on Jul 16, 2020
cgarciae (Owner) commented Jul 16, 2020

This issue will be hard to fix. I already have error-checking code, but multiprocessing.Queue.put does not raise errors at the call site, as seen in this example:

import time
from multiprocessing import Queue

data = [lambda i: i] * 10  # lambdas cannot be pickled by the standard pickler
queue = Queue()

for x in data:
    queue.put(x)       # does not raise here: pickling happens in a feeder thread
    time.sleep(0.1)

Therefore the code assumes the elements were successfully added when they were not, and execution ends normally with an empty list.
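
One way to surface the failure at the call site could be to pickle eagerly on the producer side (just a sketch, safe_put is a hypothetical helper; the consumer would then have to pickle.loads the bytes it receives):

import pickle
from multiprocessing import Queue

def safe_put(queue, obj):
    # Queue.put hands the object to a background feeder thread, which is where
    # pickling actually happens, so errors never reach the caller. Pickling
    # eagerly here makes the failure visible at the call site.
    data = pickle.dumps(obj)  # raises PicklingError for e.g. a lambda
    queue.put(data)           # bytes always pickle cleanly

queue = Queue()
safe_put(queue, [1, 2, 3])        # fine
# safe_put(queue, lambda i: i)    # raises at the call site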

lalo (Contributor) commented Jun 13, 2022

Hi, are the tests out of date? They seem to use lambdas:

nums_pl2 = pl.process.map(lambda x: -x, nums_pl)

They fail locally, but I can see them passing on CI previously. Could it be a platform-specific pickling issue? I might be missing something.
