Embed hash of raw dictionary in compressed resource (optionally) #4023
Comments
Currently, Zstandard supports two modes for dictionaries:
The current format of Zstandard has been frozen in RFC 8878, so if we want to remain within the boundaries of what has been specified, these are pretty much the only options. Introducing format-breaking novelties is not impossible, but it comes at a cost: existing (already deployed) Zstandard decoders will be incompatible with these changes. So this is an option we want to be careful about, and trigger only for a very good reason.
Regarding the described request to transmit a hash of the dictionary to compare against, there is an existing workaround that might help here: the skippable frame. The advantage is that the application is fully in charge, so it can make the choices it wants, and change them, without having to coordinate with
A skippable frame is fairly lightweight: it introduces a cost of only 8 bytes, for the magic header and the content size. On the other hand, if we were willing to push that logic inside
So, with these trade-offs in mind, a method based on skippable frames to transport the information feels like a reasonable option to consider.
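To make the skippable-frame idea concrete, here is a minimal sketch of how an application might build such a frame itself and prepend it to its compressed output. The layout (a 4-byte little-endian magic in the range 0x184D2A50 to 0x184D2A5F, a 4-byte little-endian size, then user data) follows RFC 8878. The choice of SHA-256, the magic variant, and the function name `make_hash_frame` are assumptions made for illustration, not anything defined by Zstandard.

```python
import hashlib
import struct

# RFC 8878: skippable frame magics span 0x184D2A50..0x184D2A5F.
SKIPPABLE_MAGIC_BASE = 0x184D2A50
# Hypothetical choice: which of the 16 variants the application uses.
MAGIC_VARIANT = 0xD


def make_hash_frame(dictionary: bytes, variant: int = MAGIC_VARIANT) -> bytes:
    """Build a skippable frame whose user data is the SHA-256 of `dictionary`.

    SHA-256 is an assumption of this sketch; the application is free to
    pick any hash it likes.
    """
    if not 0 <= variant <= 0xF:
        raise ValueError("variant must be in the range 0..15")
    digest = hashlib.sha256(dictionary).digest()
    # 4-byte magic + 4-byte user-data size, both little-endian (8 bytes total).
    header = struct.pack("<II", SKIPPABLE_MAGIC_BASE | variant, len(digest))
    return header + digest


def prepend_hash_frame(dictionary: bytes, compressed: bytes) -> bytes:
    """Return `compressed` (treated as opaque Zstandard frames) with the hash frame in front."""
    return make_hash_frame(dictionary) + compressed
```

With a 32-byte digest the frame adds 40 bytes in total: the 8 bytes of framing overhead mentioned above plus the hash itself. Conforming decoders skip the frame, so the payload stays readable by existing tools.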
Thanks. Without a tagging mechanism for the skippable frames (and some kind of registry for IDs), I don't think we want to be adding them to all of the dictionary-compressed streams served on the web. A web-specific container (header) in front of the zstd file format might work for transport, but the raw resources wouldn't be usable by the CLI tools. Sounds like an out-of-band negotiation is the best we can hope for for now; I'd just ask that you keep it in mind for any future revisions to the file format (if there end up being any).
Yeah. Although note that the skippable frame magic has a range of 16 values. If we were going to pursue this, we could probably reserve one of those values for this purpose.
If we went that route, we would probably need a combination of reserving one of the magics as well as a signature header on the hash itself, in case the same magic was also used by someone else for watermarking, etc.
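As a rough illustration of that combination, the sketch below accepts a leading skippable frame only if it uses one specific magic and its user data starts with a signature tag. Both `RESERVED_MAGIC` and the `DHSH` tag are hypothetical values picked for this example; nothing has actually been reserved.

```python
import struct
from typing import Optional

# Hypothetical: one of the 16 skippable magics (0x184D2A50..0x184D2A5F),
# imagined as reserved for dictionary hashes.
RESERVED_MAGIC = 0x184D2A5D
# Hypothetical signature tag that distinguishes a dictionary hash from other
# uses of the same magic (watermarking, etc.).
SIGNATURE = b"DHSH"


def extract_dictionary_hash(data: bytes) -> Optional[bytes]:
    """Return the embedded hash if `data` starts with a tagged frame, else None."""
    if len(data) < 8:
        return None
    # Skippable frame header: 4-byte magic, 4-byte user-data size (little-endian).
    magic, size = struct.unpack_from("<II", data, 0)
    if magic != RESERVED_MAGIC:
        return None
    payload = data[8:8 + size]
    if len(payload) != size:
        return None  # truncated frame
    if not payload.startswith(SIGNATURE):
        return None  # same magic, but some other application's data
    return payload[len(SIGNATURE):]
```

A frame that shares the magic but carries unrelated data is simply ignored, which is the collision case the signature is meant to cover.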
It would be useful if, when using a raw dictionary for compression, the compressor could embed a hash of the dictionary that was used during compression. Then, at decompression time, if the hash is present and doesn't match the provided dictionary the decompression would fail. It could also be used to identify which dictionary in a set of dictionaries was used at decompression time.
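A minimal sketch of the two behaviours described above, written at the application level: failing when the embedded hash does not match the provided dictionary, and picking the right dictionary out of a set. The names `check_dictionary`, `select_dictionary`, and `DictionaryMismatchError` are hypothetical, and SHA-256 is assumed as the hash; none of this is an existing Zstandard API.

```python
import hashlib
from typing import Iterable, Optional


class DictionaryMismatchError(Exception):
    """Raised when no provided dictionary matches the embedded hash."""


def check_dictionary(embedded_hash: Optional[bytes], dictionary: bytes) -> None:
    """Fail if a hash is present and does not match `dictionary`.

    A missing hash (None) is accepted, since embedding is optional.
    """
    if embedded_hash is None:
        return
    if hashlib.sha256(dictionary).digest() != embedded_hash:
        raise DictionaryMismatchError("dictionary does not match embedded hash")


def select_dictionary(embedded_hash: bytes, candidates: Iterable[bytes]) -> bytes:
    """Return the candidate dictionary whose SHA-256 equals `embedded_hash`."""
    by_hash = {hashlib.sha256(d).digest(): d for d in candidates}
    try:
        return by_hash[embedded_hash]
    except KeyError:
        raise DictionaryMismatchError("no candidate matches embedded hash") from None
```

The check would run before decompression starts, so a wrong dictionary is reported as an explicit error rather than as corrupted output.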
For the HTTP content-encoding, we are currently passing the dictionary hash as an additional response header, but that depends on the header and the resource always being together. It would be cleaner if the payload itself could contain the dictionary information.
See httpwg/http-extensions#2770