Supporting out-of-band buffers with pickle protocol 5 #5472

@jakirkham

Feature description

Typically, pickling in Python creates one large bytes object with types, functions, and data all packed in to allow easy reconstruction later. Originally pickling was focused on reading from and writing to disk. These days, however, it is increasingly used as a serialization protocol for objects sent over the wire. In that case, the copies of data required to put everything into a single bytes object hurt performance and offer little benefit, since the data could be shipped along in separate buffers without copying.

For these reasons, Python added support for out-of-band buffers in pickle, which lets the user flag buffers of data for pickle to extract and send alongside the usual bytes object, thus avoiding unneeded copies. This was submitted and accepted as PEP 574 and is part of Python 3.8 (along with a backport package for Python 3.5, 3.6, and 3.7). On the implementation side, this comes down to implementing __reduce_ex__ instead of __reduce__ (basically the same method, but with a protocol version argument) and placing any bytes-like data (such as NumPy arrays and memoryviews) into PickleBuffer objects. For older pickle protocols this step can simply be skipped. Here's an example. The rest is up to libraries that use protocol 5 (like Dask) to implement and use.
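As a rough illustration of the pattern described above, here is a minimal sketch assuming Python 3.8+ and only the standard library. The `Blob` class is hypothetical and stands in for any object holding a large buffer; it is not a spaCy class and does not reflect how spaCy objects are pickled today.

```python
import pickle
from pickle import PickleBuffer


class Blob:
    """Hypothetical container holding a large bytes-like payload."""

    def __init__(self, data):
        # Any object supporting the buffer protocol (bytearray, memoryview, ...).
        self.data = data

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Wrap the payload so pickle may hand it off out-of-band.
            return type(self), (PickleBuffer(self.data),)
        # Older protocols: fall back to an in-band copy of the data.
        return type(self), (bytes(self.data),)


blob = Blob(bytearray(b"x" * 1024))

# Out-of-band: pickle passes each PickleBuffer to buffer_callback
# instead of copying its contents into the pickle stream.
buffers = []
payload = pickle.dumps(blob, protocol=5, buffer_callback=buffers.append)

# The receiver supplies the same buffers, in order, when loading.
restored = pickle.loads(payload, buffers=buffers)

# In-band fallback with an older protocol still round-trips.
legacy = pickle.loads(pickle.dumps(blob, protocol=4))
```

With `buffer_callback` set, `payload` carries only the metadata needed to rebuild the object, and the raw data travels separately in `buffers`, which a transport layer could send without copying. If `buffer_callback` is omitted, protocol 5 serializes the `PickleBuffer` contents in-band as usual.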

Could the feature be a custom component or spaCy plugin?

If so, we will tag it as project idea so other users can take it on.


I don't think so as this relies on changing the pickle implementations of spaCy objects. Though I could be wrong :)

Labels: enhancement (Feature requests and improvements), feat / serialize (Feature: Serialization, saving and loading), help wanted (Contributions welcome!)