Description
Feature description
Typically, pickling in Python creates a large bytes object with types, functions, and data all packed in to allow easy reconstruction later. Originally, pickling was focused on reading from and writing to disk. These days, however, it is increasingly used as a serialization protocol for sending objects over the wire. In that case, the copies of data required to put everything into a single bytes object hurt performance and don't offer much benefit, since the data could be shipped along in separate buffers without copying.
For these reasons, Python added support for out-of-band buffers in pickle, which lets the user flag buffers of data for pickle to extract and send alongside the usual bytes object (thus avoiding unneeded copies of the data). This was submitted and accepted as PEP 574 and is part of Python 3.8 (along with a backport package for Python 3.5, 3.6, and 3.7). On the implementation side, this just comes down to implementing __reduce_ex__ instead of __reduce__ (basically the same, but with a protocol version argument) and placing any bytes-like data (like NumPy arrays and memoryviews) into PickleBuffer objects when the protocol is 5 or higher. For older pickle protocols this step can simply be skipped. Here's an example. The rest is up to libraries that use protocol 5 (like Dask) to implement and use.
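To make the idea concrete, here is a minimal sketch of what this could look like, assuming Python 3.8+. The `Payload` class is hypothetical and only stands in for an object that holds one large bytes-like buffer; it is not a spaCy class and does not reflect how spaCy objects are actually pickled.

```python
import pickle
from pickle import PickleBuffer


class Payload:
    """Hypothetical container holding one large bytes-like buffer."""

    def __init__(self, data):
        # memoryview wraps bytes, bytearray, or a PickleBuffer without copying.
        self.data = memoryview(data)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Flag the buffer so pickle can hand it off out-of-band.
            return (type(self), (PickleBuffer(self.data),))
        # Older protocols: fall back to an in-band copy of the bytes.
        return (type(self), (self.data.tobytes(),))


obj = Payload(bytearray(b"x" * 1_000_000))

# Protocol 5 with a buffer_callback: buffers are collected separately
# and the pickle stream itself stays small.
buffers = []
header = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(header, buffers=buffers)

# Protocol 4: same class, but the data is copied into the stream.
in_band = pickle.dumps(obj, protocol=4)
restored_old = pickle.loads(in_band)
```

A library like Dask would then send `header` and each entry of `buffers` as separate frames rather than concatenating them into one bytes object.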
Could the feature be a custom component or spaCy plugin?
If so, we will tag it as project idea so other users can take it on.
I don't think so, as this relies on changing the pickle implementations of spaCy objects. Though I could be wrong :)