Skip to content

Support out-of-band buffers in Python pickling#5132

Merged
kkraus14 merged 4 commits intorapidsai:branch-0.14from
jakirkham:support_pickle_protocol_5
May 9, 2020
Merged

Support out-of-band buffers in Python pickling#5132
kkraus14 merged 4 commits intorapidsai:branch-0.14from
jakirkham:support_pickle_protocol_5

Conversation

@jakirkham
Copy link
Member

When Python pickle's protocol 5 or greater is used, this change will support more efficient serialization of out-of-band buffers. This is analogous to Dask's custom serialization except for pickling. As such this is helpful in any Python serialization case where pickling is used. If an older pickling protocol is used, we simply proceed as before.

jakirkham added 4 commits May 7, 2020 13:31
This lets us get access to the `protocol` argument, which can be useful
if we want to take advantage of newer pickling protocols.
In Pickle's protocol 5, out-of-band buffers are supported, which avoids
unnecessary copies when serializing data. In other words, this similar
to Dask's custom serialization except for pickling and can be supported
by any library that can use this Python standard. The only requirement
is we wrap any bytes-like objects in `PickleBuffer`s in our
`__reduce_ex__` method, which is what we do here. If an older Pickle
protocol is in use, we simply skip this path and go about pickling the
NumPy array as we would have otherwise.
@jakirkham jakirkham requested a review from a team as a code owner May 7, 2020 20:35
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @shwina as this may have implications for pack / unpack

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs cuDF (Python) Reviewer labels May 7, 2020
@jakirkham
Copy link
Member Author

Are we considering host buffers with pack/unpack?

@kkraus14
Copy link
Collaborator

kkraus14 commented May 7, 2020

Are we considering host buffers with pack/unpack?

I think we'd want to have pack / unpack in the path for pickling a dataframe efficiently for example.

@jakirkham
Copy link
Member Author

Sure that makes sense. I have another idea on how we might do that, but it's probably a different PR. Can add a draft PR for us to look at if it's of interest.

@codecov
Copy link

codecov bot commented May 7, 2020

Codecov Report

Merging #5132 into branch-0.14 will decrease coverage by 0.03%.
The diff coverage is 86.86%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.14    #5132      +/-   ##
===============================================
- Coverage        88.47%   88.44%   -0.04%     
===============================================
  Files               54       55       +1     
  Lines            10276    10405     +129     
===============================================
+ Hits              9092     9203     +111     
- Misses            1184     1202      +18     
Impacted Files Coverage Δ
python/cudf/cudf/core/buffer.py 82.35% <80.00%> (-0.58%) ⬇️
python/dask_cudf/dask_cudf/io/opt_parquet.py 85.34% <85.34%> (ø)
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/column/string.py 84.97% <0.00%> (-0.14%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17ad431...1e2d39a. Read the comment docs.

@jakirkham
Copy link
Member Author

Sure that makes sense. I have another idea on how we might do that, but it's probably a different PR. Can add a draft PR for us to look at if it's of interest.

Went ahead and placed this in PR ( #5139 ) for discussion.

@kkraus14
Copy link
Collaborator

kkraus14 commented May 8, 2020

rerun tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants