Skip to content

Make query result HTTP response compression configurable#15393

Merged
shixuan-fan merged 2 commits intoprestodb:masterfrom
pettyjamesm:query-results-gzip-config
Nov 30, 2020
Merged

Make query result HTTP response compression configurable#15393
shixuan-fan merged 2 commits intoprestodb:masterfrom
pettyjamesm:query-results-gzip-config

Conversation

@pettyjamesm
Copy link
Contributor

@pettyjamesm pettyjamesm commented Nov 3, 2020

Before this change, query result JSON responses were generally compressed (assuming the response met the minimum size threshold and passed the user agent checks), so that behavior is still the default. However, disabling GZIP compression can significantly improve throughput of sending query results, especially over localhost links where the overhead of compressing the response and then uncompressing it again on the client side is never worth the bandwidth savings.

Clients are allowed to opt-out of compression, but not request compression from a server which has decided to disable compressed query result responses. Both sides ultimately negotiate the result based on their Accept-Encoding or Content-Encoding headers and the way that the gzip compression middleware interprets them.

For queries that are bound only by result processing throughput (eg: SELECT * FROM <large table>) execution time can reduced by 20-50% when submitted over a localhost connection with compression disabled.

== RELEASE NOTES ==

General Changes
* Add support for disabling query result compression via client and server-side configuration properties. Clients
can disable compressed responses using the ``--disable-compression`` flag or ``disableCompression`` driver property. Compression can be disabled server-wide by using the configuration property: ``query-results.compression-enabled=false``

@pettyjamesm pettyjamesm marked this pull request as draft November 3, 2020 22:14
@pettyjamesm pettyjamesm force-pushed the query-results-gzip-config branch from 919e547 to eee98d4 Compare November 4, 2020 16:04
@pettyjamesm pettyjamesm force-pushed the query-results-gzip-config branch from eee98d4 to 8731ada Compare November 16, 2020 15:01
@pettyjamesm pettyjamesm marked this pull request as ready for review November 16, 2020 15:02
@pettyjamesm pettyjamesm requested a review from kokosing November 16, 2020 15:02
@pettyjamesm pettyjamesm force-pushed the query-results-gzip-config branch from 8731ada to 46d990c Compare November 16, 2020 16:06
@pettyjamesm pettyjamesm force-pushed the query-results-gzip-config branch from 46d990c to a6f5f5b Compare November 19, 2020 19:23
Copy link
Contributor

@shixuan-fan shixuan-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead use compressionEnabled? Disabled would incur one more flip in mind :)

Copy link
Contributor Author

@pettyjamesm pettyjamesm Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrasing is a little weird, but the mechanism here doesn't actually let you force compression- only disable compression. Same thing with the server side logic. Default negotiation happens inside of jetty around the client Accept-Encoding header and server's chosen mime-type and Content-Encoding headers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But we could probably just flip the boolean when it is used? I'm just worried this might be error-prone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I’m worried about the impression that “enabled” will give. To me, that sounds like the client has ultimate control of the decision when in fact, they do not. The client only as the option to opt-out of compression.

By default, presto will GZIP query result JSON payloads sent to the
client. However, especially when the client is connected to the
coordinator over localhost, the added overhead of compressing the
response and then uncompressing it on the client is a losing
proposition.

For queries that are bound only by result processing throughput (eg:
SELECT * FROM <large table>) execution time can reduced by 20-50%
when submitted over a localhost connection with compression disabled.
Allows configuring HTTP response compression for the query results
endpoints at the server level, regardless of client configuration.
@pettyjamesm pettyjamesm force-pushed the query-results-gzip-config branch from a6f5f5b to 33afb0c Compare November 23, 2020 14:14
Copy link
Contributor

@shixuan-fan shixuan-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry took the last week off so coming back late. I'm wondering if it actually makes sense if we use an enum for "compression algorithm" (so we could use identity to disable compression), and use gzip as default value? This way we could avoid the enabled/disabled dilemma, and could potentially enable more compression algorithms if available.

@pettyjamesm
Copy link
Contributor Author

Sorry took the last week off so coming back late. I'm wondering if it actually makes sense if we use an enum for "compression algorithm" (so we could use identity to disable compression), and use gzip as default value? This way we could avoid the enabled/disabled dilemma, and could potentially enable more compression algorithms if available.

I think the problem with that is, as currently implemented, we have no control over the compression algorithm (or indeed, compression level) used by the jetty middleware without some much more significant work. It’s probably possible but it’s much more involved and presumes that different algorithms would actually provide a meaningful middle ground between no compression and gzip (which I doubt, but can’t say for sure without setting up the experiment). If you wanted to add that option in the future it would have to accommodate encoding type negotiation between the client and server which also adds complexity.

@shixuan-fan shixuan-fan merged commit dc2d50b into prestodb:master Nov 30, 2020
@caithagoras caithagoras mentioned this pull request Dec 4, 2020
1 task
@pettyjamesm pettyjamesm deleted the query-results-gzip-config branch December 4, 2020 20:52
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettyjamesm James, this is a nice change, but I don't see this functionality documented in https://prestodb.io/docs/current/develop/client-protocol.html Any chance you could help update the documentation?

@pettyjamesm
Copy link
Contributor Author

@pettyjamesm James, this is a nice change, but I don't see this functionality documented in https://prestodb.io/docs/current/develop/client-protocol.html Any chance you could help update the documentation?

I'm happy to update the docs, but I'm not 100% sure where a high level description of the change should go. Fundamentally, the changes here don't depend on any presto-specific headers, it's just leveraging standard HTTP semantics for negotiating the encoding of the response built into the client and server to control whether responses will be gzipped when sent or not so it's not so much a "client protocol" concern as it is maybe a client configuration flag property instead? Would you suggest just adding a documentation entry to the JDBC properties doc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants