Separate dictionary references to enable dictionary usage for any combination of window size and content size #4017
Comments
Yes, this is correct. A word of context: dictionary compression was originally designed to serve Small Data compression. For this use case, the "window" size is not a constraint. But when the dictionary functionality is (ab)used as a delta engine, now targeting very large contents and references, the Window Size can become a concern, because the window typically must be as large as the content to decompress, which could be a lot.
In theory, this is possible. The real price to pay for this new capability is incompatibility with existing decoders. And that's a pretty steep one. Now that we have an idea of the cost (and confusion is a pretty high cost, paid at ecosystem scale), what would be the benefits? Let's get back to the example stated in httpwg/http-extensions#2754: Because:
the memory cost starts at ~50 MB, just for the reference.
Now, let's imagine we implement a "trailing dictionary" as mentioned above. Now we don't need the entire decompressed content in memory, just the window size (typically between 2 and 8 MB, so let's take 4 MB for the sake of argument). Thanks to these savings, we now "only" need ~54 MB.
It looks to me that, as long as the mechanism requires by construction the entire reference in memory, we are already paying a pretty big memory budget. And then the rest is essentially within a factor of 2, which is certainly not nothing, but not transformative either.
The situation would be wildly different if the memory saving was more significant, say for example > 10x smaller. I could definitely imagine an algorithm that would offer this property. Such a design would offer a dramatically lower memory budget for a delta engine, and the difference is probably large enough to deserve some attention.
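To make the arithmetic above concrete, here is a rough sketch of the two memory budgets. It is only a back-of-the-envelope model, not how zstd accounts for memory: the function name `decoder_memory` is hypothetical, and the ~50 MB content size is an assumption (the comment only gives the ~50 MB reference and the 4 MB window; a content of similar size is what makes the "factor of 2" remark work out).

```python
MB = 1024 * 1024

def decoder_memory(reference_size, content_size, window_size, trailing_dictionary):
    """Rough decoder-side memory estimate in bytes (hypothetical model).

    The reference (dictionary) is always held in full. Without a
    "trailing dictionary", the window has to cover the whole content;
    with one, only the last `window_size` bytes of output are kept.
    """
    if trailing_dictionary:
        history = min(window_size, content_size)
    else:
        history = content_size
    return reference_size + history

reference = 50 * MB  # from the example: ~50 MB just for the reference
content = 50 * MB    # assumption: content roughly as large as the reference
window = 4 * MB      # "let's take 4 MB for the sake of argument"

today = decoder_memory(reference, content, window, trailing_dictionary=False)
proposed = decoder_memory(reference, content, window, trailing_dictionary=True)

print(today // MB, "MB without a trailing dictionary")
print(proposed // MB, "MB with a trailing dictionary")
print("ratio:", round(today / proposed, 2))
```

Under these assumptions the saving stays below 2x, because the reference itself dominates the budget in both cases.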
(Couldn't find an existing issue covering this, but happy to redirect conversation elsewhere.)
My understanding based on @Cyan4973's comment in httpwg/http-extensions#2754 is that Zstd stops being able to use a dictionary once we're more than a single window size "away" from the beginning of the content. In other words, the dictionary is almost "prepended" to the content for the purposes of being reference-able.
It seems like this entangles the ability to support backreferences based on some window size with the ability to use a dictionary at all when the window size is less than the size of the content itself.
In a browser context (and on mobile devices), we cannot use significantly large window sizes due to memory usage constraints.
Today, my understanding is that this means we will not benefit from dictionary usage beyond the first window-size number of bytes in the content.
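As a sanity check on that understanding, here is a small sketch of the behavior as described in this issue. It is a simplified model, not zstd code: both function names are hypothetical, the exact boundary condition in the real format may differ in detail, and the 50 MB / 4 MB figures are illustrative assumptions.

```python
MB = 1024 * 1024

def dictionary_reachable(position, window_size):
    """Simplified model: the dictionary behaves as if it were prepended
    to the content, so it can only be referenced while no more than one
    window of content has been produced so far."""
    return position <= window_size

def fraction_with_dictionary(content_size, window_size):
    """Share of the content that can still reference the dictionary."""
    return min(content_size, window_size) / content_size

window = 4 * MB    # assumption: a window small enough for a browser or mobile device
content = 50 * MB  # assumption: a large resource, much bigger than the window

print(dictionary_reachable(1 * MB, window))       # early in the content
print(dictionary_reachable(10 * MB, window))      # past the first window
print(fraction_with_dictionary(content, window))  # share that benefits
```

In this model only the first 4 MB of a 50 MB resource (8%) can draw on the dictionary, which is the loss the question is about.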
Would it be possible to have a separate mechanism for referring to dictionary content that wasn't dependent on the window size? We're essentially already committing to having the dictionary in memory, and need a way to limit the maximum size of those dictionaries, so it seems quite unfortunate to not be able to benefit from having spent those resources to improve [de]compression.