add pluggable compression serde #407
Conversation
FLAG_BYTES = 0
FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1
FLAG_LONG = 1 << 2
FLAG_COMPRESSED = 1 << 3  # unused, to maintain compatibility with python-memcached
Any insight into how this flag still maps back to python-memcached? Also, would this flag cause any issues, since two people could use the same flag but different compression algorithms?
I also wonder what compatibility we had with python-memcached before this change, and what we have now.
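For reference, a python-memcached style client interprets these flags roughly like this on read (a simplified sketch, not the actual python-memcached source):

import pickle
import zlib

FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1
FLAG_LONG = 1 << 2
FLAG_COMPRESSED = 1 << 3

def recv_value(buf, flags):
    # python-memcached always uses zlib when FLAG_COMPRESSED is set, so a
    # different compression algorithm behind the same flag would fail (or
    # silently mis-decode) for that client.
    if flags & FLAG_COMPRESSED:
        buf = zlib.decompress(buf)
    if flags & (FLAG_INTEGER | FLAG_LONG):
        return int(buf)
    if flags & FLAG_PICKLE:
        return pickle.loads(buf)
    return buf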
Relatedly, one way to handle generally compressed values with different compressors is to introduce a prefix:
serde = CompressedSerde(..., prefix="z")
...and then prepend that prefix on serialization (e.g. value = prefix + "|" + value). On deserialization, validate and strip the prefix.
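A rough sketch of what that could look like (the class and argument names here are illustrative, not part of this PR):

import zlib

class PrefixedCompressor:
    # Illustrative only: tags compressed payloads so a value compressed by a
    # different algorithm can be detected instead of silently mis-decoded.

    def __init__(self, prefix=b"z", compress=zlib.compress, decompress=zlib.decompress):
        self.prefix = prefix + b"|"
        self._compress = compress
        self._decompress = decompress

    def compress(self, value):
        # Prepend the prefix on serialization.
        return self.prefix + self._compress(value)

    def decompress(self, value):
        # Validate and strip the prefix on deserialization.
        if not value.startswith(self.prefix):
            raise ValueError("Found value compressed with a different serializer")
        return self._decompress(value[len(self.prefix):])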
python-memcached used zlib by default, but I don't think at this point there is any chance of compatibility, since we support multiple versions of pickle and we wouldn't be able to guarantee versions across the serialization.
I like the idea of adding a prefix so we can identify which compressor was used. In the end, the client is only going to have one (via serde), so it's not going to be able to handle decompressing/compressing multiple versions, but we could at least raise an error that says "Found keys compressed with a different serializer".
I don't know if we do this today, though. For example, we don't track pickle versions so we can select the correct one.
I think to do it properly we'd have to do what we discussed before, which I'm not really a fan of:
I agree it's not a high priority now. I mainly wanted to explicitly make sure we weren't trying to keep any python-memcached compatibility.
Here is how I chose the
Thanks!
@pytest.fixture(scope="session")
def names():
    names = []
    for _ in range(15):
        names.append(fake.name())

    return names


@pytest.fixture(scope="session")
def paragraphs():
    paragraphs = []
    for _ in range(15):
        paragraphs.append(fake.text())

    return paragraphs


@pytest.fixture(scope="session")
def objects():
    objects = []
    for _ in range(15):
        objects.append(CustomObject())

    return objects
nit: these could be list comprehensions.
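For example, the names fixture could be written as something like:

@pytest.fixture(scope="session")
def names():
    return [fake.name() for _ in range(15)]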
This adds a new class CompressedSerde, which serves as an entrypoint to compression. It defaults to using zlib and has immediate benefits on the size of messages stored in memcached, but it also exposes compress and decompress as pluggable points if you want to use a stronger compression algorithm like lz4.
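A usage sketch of what that might look like with lz4 (the constructor keyword names and the import path for CompressedSerde are assumptions based on the description above, not copied from the PR):

import lz4.frame

from pymemcache.client.base import Client
from pymemcache.serde import CompressedSerde

# Assumed constructor: the PR describes compress and decompress as the
# pluggable points; the keyword names here are an assumption.
serde = CompressedSerde(
    compress=lz4.frame.compress,
    decompress=lz4.frame.decompress,
)

client = Client("localhost", serde=serde)
client.set("key", "a" * 10000)  # stored compressed on the wire
assert client.get("key") == "a" * 10000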