Write pickle to file-like without intermediate in-memory buffer #37056
jreback merged 9 commits into pandas-dev:master
Before this change, calling pickle.dumps() built the entire pickle as an in-memory bytes object, negating the advantage of zero-copy pickle protocol 5. After this change, pickle.dump() writes directly to the open file(-like) object, cutting peak memory roughly in half in most cases.
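The difference between the two write patterns can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pandas code: `write_via_dumps`, `write_via_dump` and `peak_bytes` are hypothetical helpers, a large plain `bytes` blob stands in for a DataFrame, and `tracemalloc` only sees Python-level allocations (it is not the profiler used for the PR).

```python
import os
import pickle
import tempfile
import tracemalloc


def peak_bytes(write_fn, obj, path):
    # Peak of Python-level allocations made while writing. obj was
    # allocated before tracing started, so it is not counted here.
    tracemalloc.start()
    try:
        write_fn(obj, path)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


def write_via_dumps(obj, path):
    # Old pattern (hypothetical helper): build the complete pickle as one
    # bytes object in memory, then copy it into the file.
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as fh:
        fh.write(data)


def write_via_dump(obj, path):
    # New pattern (hypothetical helper): hand the open file to the pickler
    # so it can stream its output instead of materializing it first.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


payload = {"blob": bytes(20 * 1024 * 1024)}  # ~20 MB stand-in object
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obj.pkl")
    peak_old = peak_bytes(write_via_dumps, payload, path)
    peak_new = peak_bytes(write_via_dump, payload, path)
    # Both patterns must still produce a loadable pickle.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(f"dumps + write peak: {peak_old:,} bytes")
print(f"dump to file peak:  {peak_new:,} bytes")
```

With `dumps`, the serialized copy exists alongside the original object, so peak usage is at least the size of the payload again; passing the file handle lets the pickler avoid holding that full copy.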
jreback left a comment:
nice, can you add a memory asv and show the results of this, plus a whatsnew note in the 1.2 perf section?
also, to actually close this issue, I think it would be good to add some tests that explicitly set the pickle protocol to 5
Isn't that covered by py38 tests where pickle.HIGHEST_PROTOCOL (==5) is used by default?
probably, but let's make it explicit as well (e.g. parametrize on 4, 5, HIGHEST_PROTOCOL when PY38 or greater)
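A stdlib-only sketch of that parametrization, assuming the goal is to prove each protocol is really used when writing to a file handle. The helpers `protocols_to_test` and `roundtrip_with_protocol` are hypothetical; an actual pandas test would round-trip a DataFrame through `DataFrame.to_pickle(path, protocol=...)` and `pd.read_pickle`, which this sketch does not reproduce.

```python
import os
import pickle
import sys
import tempfile


def protocols_to_test():
    # 4, 5 and HIGHEST_PROTOCOL, as suggested in review. Protocol 5 only
    # exists on Python 3.8+; the set drops the duplicate when
    # HIGHEST_PROTOCOL is itself 5.
    protocols = {4, pickle.HIGHEST_PROTOCOL}
    if sys.version_info >= (3, 8):
        protocols.add(5)
    return sorted(protocols)


def roundtrip_with_protocol(obj, protocol):
    # Write straight into a file handle with an explicit protocol, then
    # return the loaded object and the first two bytes of the file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "obj.pkl")
        with open(path, "wb") as fh:
            pickle.dump(obj, fh, protocol=protocol)
        with open(path, "rb") as fh:
            header = fh.read(2)
            fh.seek(0)
            loaded = pickle.load(fh)
    return loaded, header


# In a pytest suite this list would drive
# @pytest.mark.parametrize("protocol", protocols_to_test())
for protocol in protocols_to_test():
    data = {"values": list(range(10)), "name": "frame"}
    loaded, header = roundtrip_with_protocol(data, protocol)
    assert loaded == data
    # Protocol >= 2 pickles start with the PROTO opcode (0x80) followed
    # by the protocol number, so this confirms which protocol was written.
    assert header == bytes([0x80, protocol])
```

Checking the header byte, rather than only the round-trip, is what makes the protocol explicit: a test that silently fell back to the default protocol would still round-trip correctly.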
Added an ASV peakmem_ benchmark. The benchmark currently uses a ~11 MB DataFrame; I don't have enough experience with ASV to tell how reliable memory benchmarking is for small-ish objects, so any advice is appreciated.
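For readers unfamiliar with ASV, a benchmark of this kind is just a class whose method names carry a prefix: ASV treats `peakmem_*` methods as peak-memory benchmarks and calls `setup` (and `teardown`) around them. The sketch below is hypothetical and not the benchmark added in this PR: the class name `PickleWriteMemory` is made up, and an ~11 MB `bytes` blob stands in for the DataFrame used there.

```python
import os
import pickle
import tempfile


class PickleWriteMemory:
    # Hypothetical ASV benchmark class: ASV discovers methods by prefix,
    # and "peakmem_" methods are reported as peak process memory.

    def setup(self):
        # ~11 MB stand-in payload, similar in size to the PR's DataFrame.
        self.payload = {"blob": bytes(11 * 1024 * 1024)}
        self.tmpdir = tempfile.mkdtemp()
        self.path = os.path.join(self.tmpdir, "bench.pkl")

    def teardown(self):
        if os.path.exists(self.path):
            os.remove(self.path)
        os.rmdir(self.tmpdir)

    def peakmem_dump_to_file(self):
        # New pattern: the pickler writes into the open file handle.
        with open(self.path, "wb") as fh:
            pickle.dump(self.payload, fh, protocol=pickle.HIGHEST_PROTOCOL)

    def peakmem_dumps_then_write(self):
        # Old pattern: full serialized copy in memory before writing.
        data = pickle.dumps(self.payload, protocol=pickle.HIGHEST_PROTOCOL)
        with open(self.path, "wb") as fh:
            fh.write(data)
```

Because `setup` allocates the payload in the same process, both numbers likely include it; the gap between the two benchmarks is what reflects the extra serialized copy. That gap grows with the object, so a larger payload should make the comparison less sensitive to baseline noise.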
it should show a diff here in any event. What kind of results are you seeing?
Yep, added results above.
jreback left a comment:
lgtm, ping on green. Can you post the ASV results at the top of the PR (just edit the description)?
lgtm @ig248, just resolved the conflict and can merge on green.
thanks @ig248, very nice!
Profiling was done with pandas@master and Python 3.8.5.
Related issues:
Update: ASV results
$ asv continuous -f 1.1 origin/master HEAD -b pickle
yields:

passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff