compression: add brotli compressor and decompressor #12998
lizan merged 55 commits into envoyproxy:main
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
|
/retest |
|
Retrying Azure Pipelines, to retry CircleCI checks, use |
                                     Envoy::Compression::Compressor::State state) {
  BrotliContext ctx(chunk_size_);

  for (const Buffer::RawSlice& input_slice : buffer.getRawSlices()) {
I'm a little confused about the buffer management here. It looks like we are appending the compressed bytes to the buffer in process() while we are looping over buffer.getRawSlices().
Could we use a separate output buffer (for clarity at least) and then swap them at the end if we want compress() to be a mutation on buffer?
I think what you have is basically correct, but it feels a little queasy to be modifying the buffer while iterating over it.
I stole this approach from the gzip encoder since it seems to consume a bit less memory. I guess gzip needs to be updated too.
I wonder if it would be better to drain the buffer after every slice, so that the output buffer could potentially re-use the memory from the drained slices from the input buffer.
Thanks! Updated to drain the buffer as soon as possible.
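To illustrate the approach agreed on above (a separate output buffer, with each input slice drained as soon as it has been processed), here is a minimal sketch. It does not use Envoy's Buffer::Instance; `Slice`, `SliceBuffer`, `processSlice` and `compressDraining` are hypothetical stand-ins, and the "compression" step simply copies bytes through so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-ins for a slice-based buffer; not Envoy's Buffer::Instance.
using Slice = std::vector<uint8_t>;
using SliceBuffer = std::deque<Slice>;

// Toy processing step: copies the slice through unchanged. A real
// implementation would feed the slice to the brotli encoder instead.
inline void processSlice(const Slice& in, SliceBuffer& out) { out.push_back(in); }

// Consumes `input` slice by slice, releasing each slice right after it has
// been processed, and collects the result in a separate output buffer.
// The input is never appended to while it is being walked.
inline SliceBuffer compressDraining(SliceBuffer& input) {
  SliceBuffer output;
  while (!input.empty()) {
    processSlice(input.front(), output);
    input.pop_front(); // drain as soon as possible
  }
  assert(input.empty());
  return output;
}

// Helper for checking results: total number of bytes held by a buffer.
inline std::size_t totalBytes(const SliceBuffer& buffer) {
  std::size_t n = 0;
  for (const Slice& s : buffer) {
    n += s.size();
  }
  return n;
}
```

If compress() is meant to mutate its argument, the caller can move the returned buffer back into the input once the loop is done.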
    if (ctx.avail_out == 0) {
      // update output and reset context
      output_buffer.add(static_cast<void*>(ctx.chunk_ptr.get()), chunk_size_);
      ctx.chunk_ptr = std::make_unique<uint8_t[]>(chunk_size_);
Why do we need to reallocate chunk_ptr on every call to process()? It looks like we are copying the bytes out of it above, so can't we just reuse the same buffer and define
void reset() { next_out = chunk_ptr.get(); }
Oops, that was a debug leftover copy-pasted in two places. Thank you!
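A minimal sketch of the reuse suggested above: the chunk is allocated once and only the cursor is rewound after its contents have been copied out. `ChunkContext`, `put` and `flushTo` are hypothetical names used for illustration, and resetting `avail_out` alongside `next_out` is an assumption; this is not the struct from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Output context that owns one fixed-size chunk for its whole lifetime.
struct ChunkContext {
  explicit ChunkContext(std::size_t size)
      : chunk_size(size), chunk_ptr(std::make_unique<uint8_t[]>(size)),
        next_out(chunk_ptr.get()), avail_out(size) {
    assert(size > 0);
  }

  // Rewinds the write cursor; the underlying allocation is kept.
  void reset() {
    next_out = chunk_ptr.get();
    avail_out = chunk_size;
  }

  // Stand-in for the encoder writing one byte and advancing next_out.
  bool put(uint8_t b) {
    if (avail_out == 0) {
      return false;
    }
    *next_out++ = b;
    --avail_out;
    return true;
  }

  // Copies the filled part of the chunk into `sink`, then rewinds.
  void flushTo(std::vector<uint8_t>& sink) {
    sink.insert(sink.end(), chunk_ptr.get(), next_out);
    reset();
  }

  const std::size_t chunk_size;
  std::unique_ptr<uint8_t[]> chunk_ptr;
  uint8_t* next_out;
  std::size_t avail_out;
};
```

Because the bytes are copied into the sink before the rewind, no data is lost and no second allocation is needed.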
  void decompress(const Buffer::Instance& input_buffer, Buffer::Instance& output_buffer) override;

private:
  struct BrotliContext {
This is almost identical to the compressor's BrotliContext; the only difference is whether chunk_size is taken as a size_t or uint32_t. Can we share the struct definition?
In any case, let's probably make it a class and factor out the common management of the member variables (e.g. reset(), per the comment above).
For now I've moved the struct to a common module. But it probably makes sense to turn it into a base class for the implementations, similar to how it's done for gzip. WDYT?
|
Thanks for the initial pass, will take a look today |
junr03 left a comment
Initial pass. Super excited for this!
  // Ring buffer is allocated according to window size, despite the real size of the content.
  bool disable_ring_buffer_reallocation = 1;

  // Value for encoder's next output buffer. If not set, defaults to 4096.
- // Value for encoder's next output buffer. If not set, defaults to 4096.
+ // Value for decoder's next output buffer. If not set, defaults to 4096.
Although, should we just use compressor/decompressor in these comments?
Agree, replaced with compressor/decompressor.
    process(ctx, buffer,
            state == Envoy::Compression::Compressor::State::Finish ? BROTLI_OPERATION_FINISH
                                                                   : BROTLI_OPERATION_FLUSH);
  } while (BrotliEncoderHasMoreOutput(state_.get()) && !BrotliEncoderIsFinished(state_.get()));
What does it mean if the encoder gets to a point where it still has more output but is finished? Or conversely if there is no more output but it hasn't finished? Is there something to be done at this layer?
Seems like the expected final state is for there to be no more output and for the encoder to have finished? Do we only expect reaching that at the same time that this function gets state == Envoy::Compression::Compressor::State::Finish?
I looked at Brotli's source code and it turned out BrotliEncoderIsFinished() internally just calls BrotliEncoderHasMoreOutput() and checks if the current operation is FINISH. So it's not really needed here and we should care only about the encoder's empty output.
An inline comment here would also be good.
// Default compression window size.
const uint32_t DefaultWindowBits = 22;

// Default quality. The actual value passed to the compressor is decremented by
Added a clarification. But perhaps there is a better way to deal with zero values in omitted fields. I can only come up with making the field mandatory.
Ah, yeah you can use google.protobuf.UInt32Value so there is a difference between omitted and default.
Oh, I didn't know that. Thanks! I dropped the decrement.
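The difference between an omitted field and an explicit zero can be sketched without protobuf. In the example below `std::optional<uint32_t>` plays the role of a wrapper message such as google.protobuf.UInt32Value; `CompressorConfig`, `effectiveQuality` and the default value of 3 are illustrative assumptions, not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical mirror of a config message. An unset optional models an
// omitted wrapper field; a set optional holding 0 models an explicit zero.
struct CompressorConfig {
  std::optional<uint32_t> quality;
};

// Illustrative default only; the real default is whatever the extension defines.
constexpr uint32_t kDefaultQuality = 3;

// Returns the configured quality, falling back to the default only when the
// field was omitted. No "decrement by one" trick is needed to represent zero.
inline uint32_t effectiveQuality(const CompressorConfig& config) {
  const uint32_t quality = config.quality.value_or(kDefaultQuality);
  assert(quality <= 11); // brotli quality levels range from 0 to 11
  return quality;
}
```

With a plain `uint32` field an omitted value and an explicit 0 are indistinguishable, which is what originally motivated the decrement.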
                                   const BrotliEncoderOperation op) {
  BROTLI_BOOL result = BrotliEncoderCompressStream(state_.get(), op, &ctx.avail_in_, &ctx.next_in_,
                                                   &ctx.avail_out_, &ctx.next_out_, nullptr);
  RELEASE_ASSERT(result == BROTLI_TRUE, "unable to compress");
How come the compressor asserts while the decompressor increases an error stat?
For the compressor it must be an internal error or OOM. But in case of decompression it usually means the input data is garbage.
Ah I see, so for the compressor you want to crash because there is nothing the user can do. But for the decompressor it could be user error (garbage input) so you want to alert via telemetry.
Yes, exactly. The gzip extension has the same behavior.
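The asymmetry discussed above can be sketched as follows. This is only an illustration of the policy, not the extension's code: `DecompressorStats`, its counter name, `onDecompressResult` and `onCompressResult` are hypothetical, and `std::abort()` stands in for RELEASE_ASSERT.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical stats holder; the counter name is illustrative only.
struct DecompressorStats {
  uint64_t decompression_error = 0;
};

// Decompressor side: a failed step usually means the input is garbage, so the
// failure is counted and reported to the caller instead of crashing.
inline bool onDecompressResult(bool decoder_ok, DecompressorStats& stats) {
  if (!decoder_ok) {
    ++stats.decompression_error;
    return false;
  }
  return true;
}

// Compressor side: a failed step means an internal error or OOM, which the
// user cannot act on, so the process is terminated.
inline void onCompressResult(bool encoder_ok) {
  if (!encoder_ok) {
    std::fputs("unable to compress\n", stderr);
    std::abort();
  }
}
```

The counter gives operators a signal about bad input from peers, while the abort path is reserved for conditions that indicate a bug or resource exhaustion.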
    ctx.next_in_ = static_cast<uint8_t*>(input_slice.mem_);

    while (ctx.avail_in_ > 0) {
      if (!process(ctx, output_buffer)) {
Maybe I should just read the brotli documentation. But I wonder if we should write some documentation inline about this.
In compression I could, at least on the surface, see that we had different operations, and there was first a process pass, and then a flush/finish pass. Decompression doesn't seem to have an explicit operation? Why do we still need two passes?
The first loop runs until the input is empty. But there may be situations where a small input, fully consumed by the decompressor, expands into output that does not fit in the output chunk. Hence the second loop runs until there is no more output in the decompressor.
Would such a comment help?
Yeah, inline comment would help. That way future readers can understand the interaction without needing to go to brotli source from the get go.
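The two-loop structure explained above can be demonstrated with a toy decoder that has nothing to do with brotli. In this sketch each input byte `n` expands to `n` copies of itself, so a single byte can produce more output than fits into one chunk. `ToyDecoder`, `step`, `hasMoreOutput` and `decompress` are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy streaming decoder: keeps internal state between calls, like a real one.
class ToyDecoder {
public:
  // Emits pending output while there is room, then consumes more input.
  void step(const uint8_t*& next_in, std::size_t& avail_in, uint8_t*& next_out,
            std::size_t& avail_out) {
    for (;;) {
      while (pending_ > 0 && avail_out > 0) {
        *next_out++ = byte_;
        --avail_out;
        --pending_;
      }
      if (pending_ > 0 || avail_in == 0) {
        return; // output chunk is full, or nothing left to read
      }
      byte_ = *next_in++;
      --avail_in;
      pending_ = byte_;
    }
  }
  bool hasMoreOutput() const { return pending_ > 0; }

private:
  uint8_t byte_ = 0;
  std::size_t pending_ = 0;
};

inline std::vector<uint8_t> decompress(const std::vector<uint8_t>& input, std::size_t chunk_size) {
  assert(chunk_size > 0);
  ToyDecoder decoder;
  std::vector<uint8_t> chunk(chunk_size), output;
  const uint8_t* next_in = input.data();
  std::size_t avail_in = input.size();
  uint8_t* next_out = chunk.data();
  std::size_t avail_out = chunk.size();
  auto flushIfFull = [&] {
    if (avail_out == 0) {
      output.insert(output.end(), chunk.begin(), chunk.end());
      next_out = chunk.data();
      avail_out = chunk.size();
    }
  };
  // First loop: runs until all input has been consumed.
  while (avail_in > 0) {
    decoder.step(next_in, avail_in, next_out, avail_out);
    flushIfFull();
  }
  // Second loop: the input is gone, but the decoder may still hold output
  // that did not fit into the chunk.
  while (decoder.hasMoreOutput()) {
    decoder.step(next_in, avail_in, next_out, avail_out);
    flushIfFull();
  }
  output.insert(output.end(), chunk.data(), next_out); // partial last chunk
  return output;
}
```

With a chunk size of 2, the one-byte input `{5}` is consumed by the first loop after a single step, and the remaining three output bytes are only produced by the second loop.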
|
/wait |
|
Merged the latest master to resolve a merge conflict in |
|
/retest |
|
Retrying Azure Pipelines: |
|
@lizan could you take a look? |
|
/retest |
|
Retrying Azure Pipelines: |
htuch left a comment
/lgtm deps
Thanks so much for the detailed scorecard!
|
/retest |
|
Retrying Azure Pipelines: |
|
I'm getting build errors as there isn't a |
diff --git a/source/extensions/compression/brotli/compressor/BUILD b/source/extensions/compression/brotli/compressor/BUILD
index efd1739e0..93f13d8fe 100644
--- a/source/extensions/compression/brotli/compressor/BUILD
+++ b/source/extensions/compression/brotli/compressor/BUILD
@@ -25,6 +25,7 @@ envoy_cc_extension(
name = "config",
srcs = ["config.cc"],
hdrs = ["config.h"],
+ category = "envoy.compression.decompressor",
security_posture = "robust_to_untrusted_downstream",
deps = [
":compressor_lib",
diff --git a/source/extensions/compression/brotli/decompressor/BUILD b/source/extensions/compression/brotli/decompressor/BUILD
index 3cc016e2f..3667300a8 100644
--- a/source/extensions/compression/brotli/decompressor/BUILD
+++ b/source/extensions/compression/brotli/decompressor/BUILD
@@ -27,6 +27,7 @@ envoy_cc_extension(
name = "config",
srcs = ["config.cc"],
hdrs = ["config.h"],
+ category = "envoy.compression.decompressor",
security_posture = "robust_to_untrusted_downstream",
deps = [
":decompressor_lib",
Commit Message: compression: add brotli compressor and decompressor
Additional Description: Add new brotli compression extensions in addition to gzip.
Risk Level: Low, no existing functionality is touched
Testing: unit tests, manual tests with curl.
Docs Changes: updated docs for compression and decompression HTTP filters to refer to the newly available encoder/decoder.
Release Notes: updated current.rst
Fixes #4429
The PR adds a new dependency on https://github.com/google/brotli. Here are the current criteria answers:
-c optis 752K