-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the dictionary hash to the HTTP response #2680
Conversation
Adds a new "Content-Dictionary" with the hash of the dictionary used when encoding the HTTP response. For #httpwg#2636
@martinthomson this adds the dictionary hash to the HTTP response when used. Right now it is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this field carries binary data, why aren't you using structured fields?
Mostly for developer ergonomics to make it easier to deploy. In the case of static resources using previous versions as dictionaries (v1 to v2 of main.js), one method for enabling that is to compress the new version using the old version on the cli as part of the push to production and store the dictionary-compressed artifact as a file on the server next to the full file. Then at serving time, compare the value of the Structured fields encode the binary data as base-64 encoded strings which aren't filesystem-friendly so there would need to be some steps at serving time to decode the header, convert it to ascii and then check. It's not insurmountable but it's a bit more processing and a straight string match could be handled directly with templating in server config. Lowercase hex was picked since that's what all of the cli tools for generating hashes emit. |
Is |
I don't think that authoring convenience outweighs the advantages of having a unified grammar here. It's not like base64 is some esoteric function. If you are talking about sha256sum, here's a toy: https://github.com/martinthomson/shabase64 |
I agree with @martinthomson - don't. To make the sf value filesystem friendly, you just need to strip a few characters and remap "/". |
Keep in mind that hashes are not hex strings. A SHA-256 hash is 32 bytes of binary data. The hex thing is just what some tools happen to output. But anything consuming this programmatically to, e.g., compare the hash against something would be best served by the underlying bytes. Using base64 is also slightly more compact on the wire. And if binary structured fields gets off the ground, it can be even more compact. |
My main concern is trading off a unified grammer of using sf-binary in the headers vs consistency with tooling and how the dictionaries are represented once it gets out of the protocol. In the dynamic case, this could be something like the web server checking to see if the requested dictionary is available (on disk) in response to a request. I don't know how each web server will choose to build that but having files named with the hash is an easy common starting point. Otherwise each chooses what the ascii representation of the 32-byte hash should be for their case. On the static case, I'm not so much worried about not having a tool that can create base64-encoded strings of sha256 hashes (it's trivial as you pointed out). I'm more concerned about the transformations that need to happen at serving time to decode the hash, convert it into an ascii representation and then check if that variant of the encoded file is available. It's certainly all doable, but it requires base64decode and binhex (or some equivalents) to be available in the layer where the checks are being done - or to rely on application code. Specifically, I was thinking that it would be easy enough in VCL or an existing Nginx or Apache config to do something like:
Doing similar processing with base64-encoded hashes has a step to convert it into an unspecified format that is platform-specific and may not be doable without either running application code or updating the server with base64decode + binhex support. It likely doesn't matter for the case of applications and servers that are dictionary-aware and can operate on the raw hashes (and enjoy the ~20 byte request header savings) but it is a tradeoff that pushes complexity further up the stack so I just want to make sure that is the tradeoff that we want to make. |
Having a finally consistent grammar for HTTP is the precursor to get consistent tooling. It may be more convenient to do something else in the short term, but in the long term we'd just be adding to the mess of inconsistency in HTTP. In the long term, I would hope that these config language DSLs would be able to process structured fields natively. (Although, it may be on this WG to demonstrate this being possible, since the data model for structured fields is slightly interesting...) I also don't think we should be encouraging, much less designing for, folks trying to process HTTP headers with regexps. There's a well-defined grammar and inventing other parsers is a great way to not quite get things right. We already have a pretty hard time with HTTP extensibility due to folks parsing it wrong. The path-unfriendliness is a little unfortunate. Possibly we should have picked the URL-safe version of base64, but too late for that. But since byte strings and text strings aren't the same type, I imagine any sf-binary-aware config language DSL would also have hex and base64 functions available. Indeed taking attacker-supplied input and mapping it to a path component, without running it through some very restrictive encoding, seems a pretty bad idea! At the end of the day, SHA-256 hashes are byte strings, and sf-binary is the way HTTP decided to say byte string. We could say that, although SHA-256 hashes are byte strings, dictionary names are ASCII strings computed by hex(sha256). Then these headers should be sf-string (with quotes). But that would be less efficient on the wire, and our mandate on the web is to put user needs (bandwidth, in this case) first, so I think this is the wrong tradeoff. That said, I see |
I switched |
The latest spec discussed in IETF HTTPWG is using `Available-Dictionary` request header name instead of `Sec-Available-Dictionary`. Also in the value is using a Structured Field sf-binary sha256 hash of the dictionary after [1]. So after this CL, when V2 backend is enabled, Chrome will set `Available-Dictionary` header with a Structured Field sf-binary sha256 hash of the dictionary. [1]: httpwg/http-extensions#2680 Bug: 1413922 Change-Id: Ibb716ec27bb8f5d58a5266ad6ac3ec75ad643130 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5224886 Reviewed-by: Kenichi Ishibashi <[email protected]> Commit-Queue: Tsuyoshi Horo <[email protected]> Cr-Commit-Position: refs/heads/main@{#1251259}
A new "Content-Dictionary" response header was introduced by this spec change [1]. When V2 backend is enabled, if the "Content-Encoding" response header indicates that a dictionary is used, Chrome will check the "Content-Dictionary" response header. And if there is no such header or it doesn't match with the value of "Available-Dictionary" request header, Chrome will treat the response as an error. [1]: httpwg/http-extensions#2680 Bug: 1413922 Change-Id: I37d09fad1201f4fb4b059d65c4aaee58cec788ac Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5226641 Reviewed-by: Kenichi Ishibashi <[email protected]> Commit-Queue: Tsuyoshi Horo <[email protected]> Cr-Commit-Position: refs/heads/main@{#1251341}
I think this change adds significant complexity, and I'm not sure regarding its benefits. I very recently played around with prototyping a compression dictionaries deployment and was very happy with the simplicity of it:
That can be done today in almost any deployed routing layer. Once this change lands in implementations, a similar deployment would require me to add custom logic to transform that header value to something that is file system friendly. To take just one example - if I were to implement this as a Cloudflare transform rule, I'm not even sure it's feasible to do that. (although it might be, with a creative Can you elaborate on the advantages of moving to base64, beyond theoretical purity? |
Adds a new "Content-Dictionary" with the hash of the dictionary used when encoding the HTTP response.
For #2636