Skip to content

Conversation

@jakirkham
Copy link
Member

@jakirkham jakirkham commented Jun 9, 2020

This is leftover from when Blosc compression was still being handled with the NumPy serialization code path itself ( #777 ) ( #830 ) ( #832 ) ( #998 ). These days frame splitting already happens as part of serialization more generally before compressing data. Also this cutoff is not relevant for when splitting occurs as that is determined by a limit in the size of buffers Blosc can handle, which frame_split_size already knows. So go ahead and drop this special casing in NumPy serialization and rely on these commonly shared code paths that all serialization passes through.

@jakirkham jakirkham force-pushed the drop_split_frames_numpy branch from 9ac4bc3 to de43c05 Compare June 9, 2020 00:46
@mrocklin
Copy link
Member

mrocklin commented Jun 9, 2020

Looks good to me. Thanks for sifting through this part of the codebase @jakirkham

Linting error:

distributed/protocol/numpy.py:4:1: F401 '.utils.frame_split_size' imported but unused

If memory serves you prefer that others don't push to your branches, so I'll hold off on pushing a flake8 commit.

jakirkham added 2 commits June 8, 2020 19:40
This is leftover from when Blosc compression was still being handled
with the NumPy serialization code path itself. These days frame
splitting already happens as part of serialization more generally before
compressing data. Also this cutoff is not relevant for when splitting
occurs as that is determined by a limit in the size of buffers Blosc can
handle, which `frame_split_size` already knows. So go ahead and drop
this special casing in NumPy serialization and rely on these commonly
shared code paths that all serialization passes through.
Since the frames are not being split in serialization and other common
code paths to all deserialization already make sure frames match their
expected length, there should be no need to merge frames in NumPy
deserialization. So drop this as well.
@jakirkham jakirkham force-pushed the drop_split_frames_numpy branch from de43c05 to 8914042 Compare June 9, 2020 02:42
@jakirkham
Copy link
Member Author

Good catch. Thanks! Fixed in the latest commit.

Also we shouldn't need the merge_frames call in deserialization of NumPy arrays any more for the same reason. So have dropped that as well.

To simplify the code a bit and assert the number of frames is exactly 1,
unpack `frames` into a single `frame` variable.
@quasiben
Copy link
Member

quasiben commented Jun 9, 2020

Thanks @jakirkham for the work and @mrocklin for the review. Merging in

@quasiben quasiben merged commit ecba43c into dask:master Jun 9, 2020
@jakirkham jakirkham deleted the drop_split_frames_numpy branch June 9, 2020 16:47
@jakirkham
Copy link
Member Author

Thanks Ben and Matt! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants