Make query result HTTP response compression configurable #15393
shixuan-fan merged 2 commits into prestodb:master
Can we instead use compressionEnabled? Disabled would incur one more flip in mind :)
The phrasing is a little weird, but the mechanism here doesn't actually let you force compression, only disable it. The same goes for the server-side logic. Default negotiation happens inside Jetty, based on the client's Accept-Encoding header and the server's chosen MIME type and Content-Encoding headers.
I see. But we could probably just flip the boolean when it is used? I'm just worried this might be error-prone.
In general I'm worried about the impression that "enabled" will give. To me, that sounds like the client has ultimate control of the decision when in fact it does not. The client only has the option to opt out of compression.
presto-main/src/main/java/com/facebook/presto/server/protocol/QueuedStatementResource.java
By default, Presto GZIP-compresses the query result JSON payloads it sends to the client. However, especially when the client is connected to the coordinator over localhost, the added overhead of compressing the response and then decompressing it on the client is a losing proposition. For queries that are bound only by result processing throughput (e.g. SELECT * FROM <large table>), execution time can be reduced by 20-50% when they are submitted over a localhost connection with compression disabled.
Allows configuring HTTP response compression for the query results endpoints at the server level, regardless of client configuration.
shixuan-fan
left a comment
Sorry took the last week off so coming back late. I'm wondering if it actually makes sense if we use an enum for "compression algorithm" (so we could use identity to disable compression), and use gzip as default value? This way we could avoid the enabled/disabled dilemma, and could potentially enable more compression algorithms if available.
I think the problem with that is that, as currently implemented, we have no control over the compression algorithm (or indeed the compression level) used by the Jetty middleware without some much more significant work. It's probably possible, but it's much more involved and presumes that different algorithms would actually provide a meaningful middle ground between no compression and gzip (which I doubt, but can't say for sure without setting up the experiment). If you wanted to add that option in the future, it would have to accommodate encoding type negotiation between the client and server, which also adds complexity.
mbasmanova
left a comment
@pettyjamesm James, this is a nice change, but I don't see this functionality documented in https://prestodb.io/docs/current/develop/client-protocol.html. Any chance you could help update the documentation?
I'm happy to update the docs, but I'm not 100% sure where a high-level description of the change should go. Fundamentally, the changes here don't depend on any Presto-specific headers. They just leverage the standard HTTP semantics for negotiating response encoding that are built into the client and server to control whether responses are gzipped when sent. So it's not so much a "client protocol" concern as a client configuration property. Would you suggest just adding a documentation entry to the JDBC properties doc?
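For readers following along, here is a minimal sketch of what "standard HTTP semantics" can look like from the client side: a request that declines compressed responses by sending Accept-Encoding: identity. This is not the code from this PR. The class name, the localhost coordinator address, the user name, and the use of the JDK's java.net.http API are all assumptions made for illustration, and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: builds (but does not send) a statement request that
// asks the server not to apply any content encoding to the response.
final class IdentityEncodingRequest {
    static HttpRequest buildRequest(String coordinator, String sql) {
        return HttpRequest.newBuilder(URI.create(coordinator + "/v1/statement"))
                // Hypothetical user name; a real client supplies its own.
                .header("X-Presto-User", "example")
                // "identity" means "no encoding", i.e. the client opts out of gzip.
                .header("Accept-Encoding", "identity")
                .POST(HttpRequest.BodyPublishers.ofString(sql))
                .build();
    }

    public static void main(String[] args) {
        // Assumed coordinator address for a local test setup.
        HttpRequest request = buildRequest("http://localhost:8080", "SELECT 1");
        System.out.println(request.headers().firstValue("Accept-Encoding").orElse("<none>"));
    }
}
```

Because the opt-out rides on an ordinary request header, nothing Presto-specific is needed on the wire, which is the point made above about this being a client configuration concern rather than a protocol change.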
Before this change, query result JSON responses were generally compressed (assuming the response met the minimum size threshold and passed the user agent checks), so that behavior is still the default. However, disabling GZIP compression can significantly improve throughput of sending query results, especially over localhost links where the overhead of compressing the response and then uncompressing it again on the client side is never worth the bandwidth savings.
Clients are allowed to opt out of compression, but cannot request compression from a server that has decided to disable compressed query result responses. Both sides ultimately negotiate the result based on their Accept-Encoding or Content-Encoding headers and the way that the gzip compression middleware interprets them.
For queries that are bound only by result processing throughput (e.g. SELECT * FROM <large table>), execution time can be reduced by 20-50% when they are submitted over a localhost connection with compression disabled.
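The opt-out-only rule described above can be summarized as a small decision function. The sketch below is a simplified model, not the Jetty middleware's actual logic: the class and method names are hypothetical, the Accept-Encoding parsing ignores wildcards, the user agent checks are omitted, and the minimum size threshold is passed in rather than taken from any real default.

```java
import java.util.Locale;

// Simplified model of the negotiation: a response is gzipped only if the
// server allows it AND the client accepts gzip AND the payload is big enough.
final class CompressionDecisionSketch {
    // True if the header lists gzip with a non-zero q-value.
    // Simplification: "*" wildcards are not handled.
    static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false; // no explicit gzip offer from the client
        }
        for (String part : acceptEncoding.split(",")) {
            String[] tokens = part.trim().toLowerCase(Locale.ROOT).split(";");
            if (!tokens[0].trim().equals("gzip")) {
                continue;
            }
            for (int i = 1; i < tokens.length; i++) {
                String param = tokens[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        // q=0 means the client explicitly refuses gzip.
                        return Double.parseDouble(param.substring(2)) > 0;
                    }
                    catch (NumberFormatException e) {
                        return false;
                    }
                }
            }
            return true; // gzip listed without a q-value
        }
        return false;
    }

    static boolean shouldCompress(
            boolean serverCompressionEnabled,
            String acceptEncoding,
            int payloadBytes,
            int minPayloadBytes)
    {
        // Either side can veto compression; neither side can force it alone.
        return serverCompressionEnabled
                && clientAcceptsGzip(acceptEncoding)
                && payloadBytes >= minPayloadBytes;
    }
}
```

In this model a client sending Accept-Encoding: gzip to a server with compression disabled still gets an uncompressed response, which matches the behavior described in the commit message.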