Support out-of-band buffers in Python pickling #5132
kkraus14 merged 4 commits into rapidsai:branch-0.14
Implementing `__reduce_ex__` gives us access to the `protocol` argument, which is useful if we want to take advantage of newer pickling protocols.
Pickle protocol 5 supports out-of-band buffers, which avoids unnecessary copies when serializing data. In other words, this is similar to Dask's custom serialization, but for pickling, and it can be used by any library that supports this Python standard. The only requirement is that we wrap any bytes-like objects in `PickleBuffer`s in our `__reduce_ex__` method, which is what we do here. If an older pickle protocol is in use, we simply skip this path and pickle the NumPy array as we would have otherwise.
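For readers unfamiliar with the pattern, here is a minimal sketch of a `__reduce_ex__` that wraps its data in a `PickleBuffer` under protocol 5 and falls back to an in-band copy otherwise. The `HostBuffer` class is hypothetical and stands in for whatever object owns the bytes; it is not the class changed in this PR. It needs Python 3.8+.

```python
import pickle
from pickle import PickleBuffer


class HostBuffer:
    """Hypothetical owner of a bytes-like object (illustration only)."""

    def __init__(self, data):
        # Any object exposing the buffer protocol (bytes, bytearray, ...).
        self.data = data

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Protocol 5: hand the buffer to pickle without copying it.
            return (HostBuffer, (PickleBuffer(self.data),))
        # Older protocols: fall back to an ordinary in-band copy.
        return (HostBuffer, (bytes(self.data),))


big = HostBuffer(bytearray(b"a" * 100_000))

# Out-of-band: buffers are collected separately from the pickle stream.
buffers = []
payload = pickle.dumps(big, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers)

# Older protocol: same object, data travels inside the pickle stream.
old_payload = pickle.dumps(big, protocol=4)
old_restored = pickle.loads(old_payload)
```

With `buffer_callback` set, `payload` holds only the small metadata stream and the 100,000 bytes travel in `buffers`. If no `buffer_callback` is given, pickle serializes the `PickleBuffer` in-band, so the same `__reduce_ex__` works either way.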
|
Are we considering host buffers with pack/unpack? |
I think we'd want to have |
|
Sure that makes sense. I have another idea on how we might do that, but it's probably a different PR. Can add a draft PR for us to look at if it's of interest. |
Codecov Report
@@              Coverage Diff               @@
##           branch-0.14    #5132     +/-   ##
===============================================
- Coverage        88.47%   88.44%   -0.04%
===============================================
  Files               54       55       +1
  Lines            10276    10405     +129
===============================================
+ Hits              9092     9203     +111
- Misses            1184     1202      +18
|
Went ahead and placed this in PR ( #5139 ) for discussion. |
|
rerun tests |
When pickle protocol 5 or greater is used, this change supports more efficient serialization through out-of-band buffers. This is analogous to Dask's custom serialization, but for pickling, so it helps in any Python serialization case where pickling is used. If an older pickling protocol is used, we simply proceed as before.
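To make the efficiency claim concrete, the following sketch compares in-band and out-of-band pickling of the same buffer from the caller's side, using only the standard library. The 1 MB `bytearray` is a stand-in for real column data, not anything from this PR.

```python
import pickle

# Stand-in for a large host buffer (assumption for illustration).
data = bytearray(b"x" * 1_000_000)

# In-band: no buffer_callback, so the bytes are copied into the stream.
in_band = pickle.dumps(pickle.PickleBuffer(data), protocol=5)

# Out-of-band: the stream only records that a buffer follows;
# the buffer itself is passed to the callback without being copied.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

# The consumer supplies the buffers back, in the same order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band), len(out_of_band), len(buffers))
```

A transport that can send buffers separately (as Dask's custom serialization does with frames) can then move each entry of `buffers` without an intermediate copy.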