diff --git a/docs/source/clientcode.rst b/docs/source/clientcode.rst index 0084f299..207e7745 100644 --- a/docs/source/clientcode.rst +++ b/docs/source/clientcode.rst @@ -114,6 +114,7 @@ For normal usage, there's not a single line of Pyro specific code once you have .. index:: single: object serialization double: serialization; pickle + double: serialization; cloudpickle double: serialization; dill double: serialization; serpent double: serialization; marshal @@ -167,6 +168,9 @@ on what objects you can use. but it's safe to add to the accepted serializers config item if you have it installed. * **pickle**: the legacy serializer. Fast and supports almost all types. Part of the standard library. Has security problems, so it's better to avoid using it. +* **cloudpickle**: See https://pypi.python.org/pypi/cloudpickle It is similar to pickle serializer, but more capable. Extends python's 'pickle' module + for serializing and de-serializing python objects to the majority of the built-in python types. + Has security problems though, just as pickle. * **dill**: See https://pypi.python.org/pypi/dill It is similar to pickle serializer, but more capable. Extends python's 'pickle' module for serializing and de-serializing python objects to the majority of the built-in python types. Has security problems though, just as pickle. @@ -177,6 +181,7 @@ You select the serializer to be used by setting the ``SERIALIZER`` config item. The valid choices are the names of the serializer from the list mentioned above. If you're using pickle or dill, and need to control the protocol version that is used, you can do so with the ``PICKLE_PROTOCOL_VERSION`` or ``DILL_PROTOCOL_VERSION`` config items. +If you're using cloudpickle, you can control the protocol version with ``PICKLE_PROTOCOL_VERSION`` as well. By default Pyro will use the highest one available. It is possible to override the serializer on a particular proxy. 
This allows you to connect to one server @@ -192,15 +197,15 @@ serializer, for instance. Set the desired serializer name in ``proxy._pyroSerial .. note:: The serializer(s) that a Pyro server/daemon accepts, is controlled by a different config item (``SERIALIZERS_ACCEPTED``). This can be a set of one or more serializers. - By default it accepts the set of 'safe' serializers, so "``pickle``" and "``dill``" are excluded. - If the server doesn't accept the serializer that you configured + By default it accepts the set of 'safe' serializers, so "``pickle``", "``cloudpickle``" + and "``dill``" are excluded. If the server doesn't accept the serializer that you configured for your client, it will refuse the requests and respond with an exception that tells you about the unsupported serializer choice. If it *does* accept your requests, the server response will use the same serializer that was used for the request. .. note:: Because the name server is just a regular Pyro server as well, you will have to tell - it to allow the pickle or dill serializers if your client code uses them. + it to allow the pickle, cloudpickle or dill serializers if your client code uses them. See :ref:`nameserver-pickle`. @@ -212,7 +217,7 @@ Changing the way your custom classes are (de)serialized ------------------------------------------------------- .. note:: - The information in this paragraph is not relevant when using the pickle or dill serialization protocols, + The information in this paragraph is not relevant when using the pickle, cloudpickle or dill serialization protocols, they have their own ways of serializing custom classes. By default, custom classes are serialized into a dict. @@ -356,7 +361,7 @@ The signature of the batch proxy call is as follows: Invoke the batch and when done, returns a generator that produces the results of every call, in order. If ``oneway==True``, perform the whole batch as one-way calls, and return ``None`` immediately. 
If ``async==True``, perform the batch asynchronously, and return an asynchronous call result object immediately. - + **Simple example**:: batch = Pyro4.batch(proxy) diff --git a/docs/source/config.rst b/docs/source/config.rst index 5b43450f..e9822811 100644 --- a/docs/source/config.rst +++ b/docs/source/config.rst @@ -101,7 +101,7 @@ PREFER_IP_VERSION int 4 The IP address type th THREADPOOL_SIZE int 40 For the thread pool server: maximum number of threads running THREADPOOL_SIZE_MIN int 4 For the thread pool server: minimum number of threads running FLAME_ENABLED bool False Should Pyro Flame be enabled on the server -SERIALIZER str serpent The wire protocol serializer to use for clients/proxies (one of: serpent, json, marshal, msgpack, pickle, dill) +SERIALIZER str serpent The wire protocol serializer to use for clients/proxies (one of: serpent, json, marshal, msgpack, pickle, cloudpickle, dill) SERIALIZERS_ACCEPTED set json,marshal,serpent The wire protocol serializers accepted in the server/daemon. In your code it should be a set of strings, use a comma separated string instead when setting the shell environment variable. PICKLE_PROTOCOL_VERSION int highest possible The pickle protocol version to use, if pickle is selected as serializer. Defaults to pickle.HIGHEST_PROTOCOL diff --git a/docs/source/install.rst b/docs/source/install.rst index d321d6f3..9849f8ce 100644 --- a/docs/source/install.rst +++ b/docs/source/install.rst @@ -18,8 +18,8 @@ It will probably not work with Jython 2.7 at this time of writing. If you need t .. note:: - When Pyro is configured to use pickle, dill or marshal as its serialization format, it is required to have the same *major* Python versions - on your clients and your servers. Otherwise the different parties cannot decipher each others serialized data. 
+ When Pyro is configured to use pickle, cloudpickle, dill or marshal as its serialization format, it is required to have the same + *major* Python versions on your clients and your servers. Otherwise the different parties cannot decipher each other's serialized data. This means you cannot let Python 2.x talk to Python 3.x with Pyro, when using those serializers. However it should be fine to have Python 3.3 talk to Python 3.4 for instance. The other protocols (serpent, json) don't have this limitation! @@ -76,4 +76,4 @@ It contains: and a couple of other files: a setup script and other miscellaneous files such as the license (see :doc:`license`). -If you don't want to download anything, you can view all of this `online on Github `_. \ No newline at end of file +If you don't want to download anything, you can view all of this `online on Github `_. diff --git a/docs/source/intro.rst b/docs/source/intro.rst index 0640eabe..c69795ae 100644 --- a/docs/source/intro.rst +++ b/docs/source/intro.rst @@ -27,8 +27,8 @@ Here's a quick overview of Pyro's features: - works between different system architectures and operating systems. - able to communicate between different Python versions transparently. - defaults to a safe serializer (`serpent `_) that supports many Python data types. -- supports different serializers (serpent, json, marshal, msgpack, pickle, dill). -- support for all Python data types that are serializable when using the 'pickle' or 'dill' serializers [1]_. +- supports different serializers (serpent, json, marshal, msgpack, pickle, cloudpickle, dill). +- support for all Python data types that are serializable when using the 'pickle', 'cloudpickle' or 'dill' serializers [1]_. - can use IPv4, IPv6 and Unix domain sockets. - optional secure connections via SSL/TLS (encryption, authentication and integrity), including certificate validation on both ends (2-way ssl). 
- lightweight client library available for .NET and Java native code ('Pyrolite', provided separately). @@ -93,7 +93,7 @@ Remote controlling resources or other programs is a nice application as well. For instance, you could write a simple remote controller for your media server that is running on a machine somewhere in a closet. A simple remote control client program could be used to instruct the media server -to play music, switch playlists, etc. +to play music, switch playlists, etc. Another example is the use of Pyro to implement a form of `privilege separation `_. There is a small component running with higher privileges, but just able to execute the few tasks (and nothing else) @@ -257,9 +257,9 @@ Experiment with the ``benchmark``, ``batchedcalls`` and ``hugetransfer`` example .. rubric:: Footnotes -.. [1] When configured to use the :py:mod:`pickle` or :py:mod:`dill` serializer, +.. [1] When configured to use the :py:mod:`pickle`, :py:mod:`cloudpickle` or :py:mod:`dill` serializer, your system may be vulnerable - because of the security risks of the pickle and dill protocols (possibility of arbitrary + because of the security risks of these serialization protocols (possibility of arbitrary code execution). Pyro does have some security measures in place to mitigate this risk somewhat. They are described in the :doc:`security` chapter. It is strongly advised to read it. diff --git a/docs/source/nameserver.rst b/docs/source/nameserver.rst index 2b03072c..58fefe75 100644 --- a/docs/source/nameserver.rst +++ b/docs/source/nameserver.rst @@ -520,16 +520,17 @@ You can control its behavior by setting certain Pyro config items before startin .. index:: double: name server; pickle + double: name server; cloudpickle double: name server; dill .. 
_nameserver-pickle: -Using the name server with pickle or dill serializers -===================================================== -If you find yourself in the unfortunate situation where you absolutely have to use the pickle +Using the name server with pickle, cloudpickle or dill serializers +================================================================== +If you find yourself in the unfortunate situation where you absolutely have to use the pickle, cloudpickle or dill serializers, you have to pay attention when also using the name server. -Because pickle and dill are disabled by default, the name server will not reply to messages from clients -that are using those serializers, unless you enable them in the name server as well. +Because these serializers are disabled by default, the name server will not reply to messages from clients +that are using them, unless you enable them in the name server as well. The symptoms are usually that your client code seems unable to contact the name server:: @@ -546,15 +547,15 @@ And if you enable logging for the name server you will likely see in its logfile ... Pyro4.errors.ProtocolError: message used serializer that is not accepted: [4,5] -The way to solve this is to stop using the pickle and dill serializers, or if you must use them, +The way to solve this is to stop using these serializers, or if you must use them, tell the name server that it is okay to accept them. You do that by -setting the ``SERIALIZERS_ACCEPTED`` config item to a set of serializers that includes pickle or dill, +setting the ``SERIALIZERS_ACCEPTED`` config item to a set of serializers that includes them, and then restart the name server. 
For instance:: - $ export PYRO_SERIALIZERS_ACCEPTED=serpent,json,marshal,pickle,dill + $ export PYRO_SERIALIZERS_ACCEPTED=serpent,json,marshal,pickle,cloudpickle,dill $ pyro4-ns -If you enable logging you will then see that the name server says that pickle and dill are among +If you enable logging you will then see that the name server says that pickle, cloudpickle and dill are among the accepted serializers. diff --git a/docs/source/security.rst b/docs/source/security.rst index 805f9961..65006ba1 100644 --- a/docs/source/security.rst +++ b/docs/source/security.rst @@ -16,14 +16,15 @@ Security .. index:: double: security; pickle + double: security; cloudpickle double: security; dill -Pickle and dill as serialization formats (optional) -=================================================== -When configured to do so, Pyro is able to use the :py:mod:`pickle` module or the -:py:mod:`dill` module to serialize objects and then sends them over the network. -It is well known that using pickle or dill for this purpose is a security risk. -The main problem is that allowing a program to unpickle or undill arbitrary data +Pickle, cloudpickle and dill as serialization formats (optional) +================================================================ +When configured to do so, Pyro is able to use the :py:mod:`pickle`, :py:mod:`cloudpickle` +or :py:mod:`dill` modules to serialize objects and then sends them over the network. +It is well known that using these serializers for this purpose is a security risk. +The main problem is that allowing a program to deserialize this type of serialized data can cause arbitrary code execution and this may wreck or compromise your system. Because of this the default serializer is serpent, which doesn't have this security problem. Some other means to enhance security are discussed below. 
diff --git a/docs/source/tipstricks.rst b/docs/source/tipstricks.rst index 6aed9f04..4aad83fa 100644 --- a/docs/source/tipstricks.rst +++ b/docs/source/tipstricks.rst @@ -142,16 +142,16 @@ The success ratio of all this depends heavily on your network setup. .. index:: same Python version -Same major Python version required when using pickle, dill or marshal -===================================================================== +Same major Python version required when using pickle, cloudpickle, dill or marshal +================================================================================== -When Pyro is configured to use pickle, dill or marshal as its serialization format, it is required to have the same *major* Python versions +When Pyro is configured to use pickle, cloudpickle, dill or marshal as its serialization format, it is required to have the same *major* Python versions on your clients and your servers. Otherwise the different parties cannot decipher each others serialized data. -This means you cannot let Python 2.x talk to Python 3.x with Pyro when using pickle, dill or marshal as serialization protocols. However +This means you cannot let Python 2.x talk to Python 3.x with Pyro when using these serializers. However it should be fine to have Python 3.3 talk to Python 3.4 for instance. It may still be required to specify the pickle or dill protocol version though, because that needs to be the same on both ends as well. For instance, Python 3.4 introduced version 4 of the pickle protocol and as such won't be able to talk to Python 3.3 which is stuck -on version 3 pickle protocol. You'll have to tell the Python 3.4 side to step down to protocol 3. There is a config item for that. The same will apply for dill protocol versions. +on version 3 pickle protocol. You'll have to tell the Python 3.4 side to step down to protocol 3. There is a config item for that. The same will apply for dill protocol versions. 
If you are using cloudpickle, you can just set the pickle protocol version (as pickle is used under the hood). The implementation independent serialization protocols serpent and json don't have these limitations. @@ -488,8 +488,9 @@ So if you want to use them with Pyro, and pass them over the wire, you'll have t ``list(na)`` doesn't work: it seems to return a regular python list but the elements are still numpy datatypes. You have to use the full conversions as mentioned earlier. #. Don't return arrays at all. Redesign your API so that you might perhaps only return a single element from it. -#. Tell Pyro to use :py:mod:`pickle` or :py:mod:`dill` as serializer. Pickle and Dill can deal with numpy datatypes. However they have security implications. - See :doc:`security`. If you choose to use pickle or dill anyway, also be aware that you must tell your name server +#. Tell Pyro to use :py:mod:`pickle`, :py:mod:`cloudpickle` or :py:mod:`dill` as serializer. These serializers + can deal with numpy datatypes. However they have security implications. + See :doc:`security`. If you choose to use them anyway, also be aware that you must tell your name server about it as well, see :ref:`nameserver-pickle`. diff --git a/src/Pyro4/util.py b/src/Pyro4/util.py index 1a09832c..1055eaa3 100644 --- a/src/Pyro4/util.py +++ b/src/Pyro4/util.py @@ -356,6 +356,8 @@ def dict_to_class(cls, data): return JsonSerializer() elif classname == "Pyro4.util.MsgpackSerializer": return MsgpackSerializer() + elif classname == "Pyro4.util.CloudpickleSerializer": + return CloudpickleSerializer() elif classname == "Pyro4.util.DillSerializer": return DillSerializer() elif classname.startswith("Pyro4.errors."): @@ -459,6 +461,36 @@ def copyreg_function(obj): pass +class CloudpickleSerializer(SerializerBase): + """ + A (de)serializer that wraps the Cloudpickle serialization protocol. + It can optionally compress the serialized data, and is thread safe. 
+ """ + serializer_id = 7 # never change this + + def dumpsCall(self, obj, method, vargs, kwargs): + return cloudpickle.dumps((obj, method, vargs, kwargs), config.PICKLE_PROTOCOL_VERSION) + + def dumps(self, data): + return cloudpickle.dumps(data, config.PICKLE_PROTOCOL_VERSION) + + def loadsCall(self, data): + return cloudpickle.loads(data) + + def loads(self, data): + return cloudpickle.loads(data) + + @classmethod + def register_type_replacement(cls, object_type, replacement_function): + def copyreg_function(obj): + return replacement_function(obj).__reduce__() + + try: + copyreg.pickle(object_type, copyreg_function) + except TypeError: + pass + + class DillSerializer(SerializerBase): """ A (de)serializer that wraps the Dill serialization protocol. @@ -720,6 +752,13 @@ def get_serializer_by_id(sid): _ser = MarshalSerializer() _serializers["marshal"] = _ser _serializers_by_id[_ser.serializer_id] = _ser +try: + import cloudpickle + _ser = CloudpickleSerializer() + _serializers["cloudpickle"] = _ser + _serializers_by_id[_ser.serializer_id] = _ser +except ImportError: + pass try: import dill _ser = DillSerializer() diff --git a/test_requirements.txt b/test_requirements.txt index 8be50abd..77eeac40 100644 --- a/test_requirements.txt +++ b/test_requirements.txt @@ -1,3 +1,4 @@ -r requirements.txt +cloudpickle>=0.4.0 dill>=0.2.6 msgpack-python>=0.4.6 diff --git a/tests/PyroTests/test_daemon.py b/tests/PyroTests/test_daemon.py index 3a919012..670a1ca2 100644 --- a/tests/PyroTests/test_daemon.py +++ b/tests/PyroTests/test_daemon.py @@ -62,6 +62,7 @@ def testSerializerConfig(self): def testSerializerAccepted(self): self.assertIn("marshal", config.SERIALIZERS_ACCEPTED) self.assertNotIn("pickle", config.SERIALIZERS_ACCEPTED) + self.assertNotIn("cloudpickle", config.SERIALIZERS_ACCEPTED) self.assertNotIn("dill", config.SERIALIZERS_ACCEPTED) with Pyro4.core.Daemon(port=0) as d: msg = Pyro4.message.Message(Pyro4.message.MSG_INVOKE, b"", 
Pyro4.util.MarshalSerializer.serializer_id, 0, 0, hmac_key=d._pyroHmacKey) @@ -75,6 +76,13 @@ def testSerializerAccepted(self): except Pyro4.errors.ProtocolError as x: self.assertIn("serializer that is not accepted", str(x)) pass + msg = Pyro4.message.Message(Pyro4.message.MSG_INVOKE, b"", Pyro4.util.CloudpickleSerializer.serializer_id, 0, 0, hmac_key=d._pyroHmacKey) + cm = ConnectionMock(msg) + try: + d.handleRequest(cm) + self.fail("should crash") + except Pyro4.errors.ProtocolError as x: + self.assertTrue("no serializer available for id" in str(x) or "serializer that is not accepted" in str(x)) msg = Pyro4.message.Message(Pyro4.message.MSG_INVOKE, b"", Pyro4.util.DillSerializer.serializer_id, 0, 0, hmac_key=d._pyroHmacKey) cm = ConnectionMock(msg) try: diff --git a/tests/PyroTests/test_serialize.py b/tests/PyroTests/test_serialize.py index 7c7b0a77..2af0c949 100644 --- a/tests/PyroTests/test_serialize.py +++ b/tests/PyroTests/test_serialize.py @@ -157,8 +157,8 @@ def testSerDaemonHack(self): obj.name = "hello" daemon.register(obj) o, _ = self.ser.serializeData(obj) - if self.SERIALIZER in ("pickle", "dill"): - # only pickle and dill can deserialize the PrettyPrinter class without the need of explicit deserialization function + if self.SERIALIZER in ("pickle", "cloudpickle", "dill"): + # only pickle, cloudpickle and dill can deserialize the PrettyPrinter class without the need of explicit deserialization function o2 = self.ser.deserializeData(o) self.assertEqual("hello", o2.name) self.assertEqual(42, o2._width) @@ -316,8 +316,8 @@ def testAutoProxyFullExposed(self): self.assertEqual({'oneway'}, p1._pyroOneway) def testCustomClassFail(self): - if self.SERIALIZER in ("pickle", "dill"): - self.skipTest("pickle and dill simply serialize custom classes") + if self.SERIALIZER in ("pickle", "cloudpickle", "dill"): + self.skipTest("pickle, cloudpickle and dill simply serialize custom classes") o = pprint.PrettyPrinter(stream="dummy", width=42) s, c = 
self.ser.serializeData(o) try: @@ -327,8 +327,8 @@ def testCustomClassFail(self): pass def testCustomClassOk(self): - if self.SERIALIZER in ("pickle", "dill"): - self.skipTest("pickle and dill simply serialize custom classes just fine") + if self.SERIALIZER in ("pickle", "cloudpickle", "dill"): + self.skipTest("pickle, cloudpickle and dill simply serialize custom classes just fine") o = MyThingPartlyExposed("test") Pyro4.util.SerializerBase.register_class_to_dict(MyThingPartlyExposed, mything_dict) Pyro4.util.SerializerBase.register_dict_to_class("CUSTOM-Mythingymabob", mything_creator) @@ -567,6 +567,27 @@ def testSourceByteTypes_loads_memoryview(self): self.assertEqual([4, 5, 6], d) +class SerializeTests_cloudpickle(SerializeTests_pickle): + SERIALIZER = "cloudpickle" + + @unittest.skip('not implemented') + def testUriSerializationWithoutSlots(self): + pass + + def testSerializeLambda(self): + l = lambda x: x * x + ser, compressed = self.ser.serializeData(l) + l2 = self.ser.deserializeData(ser, compressed=compressed) + self.assertEqual(l2(3.), 9.) + + def testSerializeLocalFunction(self): + def f(x): + return x * x + ser, compressed = self.ser.serializeData(f) + f2 = self.ser.deserializeData(ser, compressed=compressed) + self.assertEqual(f2(3.), 9.) 
+ + is_ironpython_without_dill = False try: import dill @@ -720,6 +741,11 @@ def testSerializersAvailable(self): Pyro4.util.get_serializer("serpent") except ImportError: pass + try: + import cloudpickle + Pyro4.util.get_serializer("cloudpickle") + except ImportError: + pass try: import dill Pyro4.util.get_serializer("dill") @@ -733,15 +759,16 @@ def testAssignedSerializerIds(self): self.assertEqual(4, Pyro4.util.PickleSerializer.serializer_id) self.assertEqual(5, Pyro4.util.DillSerializer.serializer_id) self.assertEqual(6, Pyro4.util.MsgpackSerializer.serializer_id) + self.assertEqual(7, Pyro4.util.CloudpickleSerializer.serializer_id) def testSerializersAvailableById(self): Pyro4.util.get_serializer_by_id(1) # serpent Pyro4.util.get_serializer_by_id(2) # json Pyro4.util.get_serializer_by_id(3) # marshal Pyro4.util.get_serializer_by_id(4) # pickle - # ids 5 and 6 (dill, msgpack) are not always available, so we skip those. + # ids 5, 6 and 7 (dill, msgpack, cloudpickle) are not always available, so we skip those. self.assertRaises(Pyro4.errors.SerializeError, lambda: Pyro4.util.get_serializer_by_id(0)) - self.assertRaises(Pyro4.errors.SerializeError, lambda: Pyro4.util.get_serializer_by_id(7)) + self.assertRaises(Pyro4.errors.SerializeError, lambda: Pyro4.util.get_serializer_by_id(8)) def testDictClassFail(self): o = pprint.PrettyPrinter(stream="dummy", width=42)