Skip to content

compression: add brotli compressor and decompressor#12998

Merged
lizan merged 55 commits intoenvoyproxy:mainfrom
rojkov:brotli-v2
Feb 8, 2021
Merged

compression: add brotli compressor and decompressor#12998
lizan merged 55 commits intoenvoyproxy:mainfrom
rojkov:brotli-v2

Conversation

@rojkov
Copy link
Member

@rojkov rojkov commented Sep 7, 2020

Commit Message: compression: add brotli compressor and decompressor
Additional Description: Add new brotli compression extensions in addition to gzip.
Risk Level: Low, no existing functionality is touched
Testing: uni tests, manual tests with curl.
Docs Changes: updated docs for compression and decompression HTTP filters to refer the new available encoder/decoder.
Release Notes: updated current.rst
Fixes #4429

The PR adds a new dependency on https://github.com/google/brotli. Here's the current criteria answers:

Criteria Answer
Cloud Native Computing Foundation (CNCF) approved license MIT
Dependencies must not substantially increase the binary size unless they are optional brotli's binary size built with -c opt is 752K
No duplication of existing dependencies no other dep provides Brotli
Hosted on a git repository and the archive fetch must directly reference this repository. https://github.com/google/brotli
CVE history appears reasonable, no pathological CVE arcs so far 4 CVEs related to brotli have been registered
Code review (ideally PRs) before merge PRs are reviewed before merge
Security vulnerability process exists, with contact details and reporting/disclosure process no policy exists, submitted google/brotli#878
> 1 contributor responsible for a non-trivial number of commits 75 contributors
Tests run in CI CI set up with AppVeyor and Github actions
High test coverage (also static/dynamic analysis, fuzzing) Fuzzers are run in CI
Envoy can obtain advanced notification of vulnerabilities or of security releases brotli is registered in CPE
Do other significant projects have shared fate by using this dependency? Google Chrome is using the library
Releases (with release notes) https://github.com/google/brotli/releases
Commits/releases in last 90 days last commit 9 days ago

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.
CC @envoyproxy/dependency-watchers: FYI only for changes made to (bazel/repository_locations\.bzl)|(api/bazel/repository_locations\.bzl)|(.*/requirements\.txt).

🐱

Caused by: #12998 was opened by rojkov.

see: more, trace.

Dmitry Rozhkov added 4 commits September 7, 2020 16:09
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
@rojkov rojkov marked this pull request as draft September 8, 2020 13:00
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
@htuch htuch requested a review from jmarantz September 8, 2020 13:15
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@rojkov rojkov marked this pull request as ready for review September 8, 2020 15:13
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@htuch htuch requested a review from junr03 September 8, 2020 18:30
@htuch htuch assigned junr03 and unassigned jmarantz Sep 8, 2020
@rojkov
Copy link
Member Author

rojkov commented Sep 9, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12998 (comment) was created by @rojkov.

see: more, trace.

@jmarantz jmarantz self-assigned this Sep 9, 2020
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing comments

Envoy::Compression::Compressor::State state) {
BrotliContext ctx(chunk_size_);

for (const Buffer::RawSlice& input_slice : buffer.getRawSlices()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a little confused about the buffer management here. It looks like we are appending the compress bytes to the buffer in process() while we are looping over buffer.getRawSlices().

Could we use a separate output buffer -- for clarity at least -- and then swap them at the end if want compress() to be a mutation on buffer?

I think what you have is basically correct but it feels a little queezy to be modifying the buffer while iterating over it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stole this approach from the gzip encoder since it seems to consume a bit less memory. I guess gzip needs to be updated too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to drain the buffer after every slice, so that the output buffer could potentially re-use the memory from the drained slices from the input buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated to drain the buffer as soon as possible.

if (ctx.avail_out == 0) {
// update output and reset context
output_buffer.add(static_cast<void*>(ctx.chunk_ptr.get()), chunk_size_);
ctx.chunk_ptr = std::make_unique<uint8_t[]>(chunk_size_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to reallocate chunk_ptr on every call to process()? It looks like we are copying the bytes out of it above, can't we just re-use the same buffer and just define

void reset() { next_out = chunk_ptr.get(); }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that was a debug leftover copy-pasted in two places. Thank you!

void decompress(const Buffer::Instance& input_buffer, Buffer::Instance& output_buffer) override;

private:
struct BrotliContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is almost identical to the compressor's BrotliContext; the only difference is whether chunk_size is taken as a size_t or uint32_t. Can we share the struct definition?

and anyway let's probably make it a class and factor out the common management of the member variables (e.g. reset() per above comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I've moved the struct to a common module. But probably it makes sense to turn it into a base class for the implementations in the similar fashion it's done for gzip. WDYT?

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@junr03
Copy link
Member

junr03 commented Sep 11, 2020

Thanks for the initial pass, will take a look today

Copy link
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial pass. Super excited for this!

// Ring buffer is allocated according to window size, despite the real size of the content.
bool disable_ring_buffer_reallocation = 1;

// Value for encoder's next output buffer. If not set, defaults to 4096.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Value for encoder's next output buffer. If not set, defaults to 4096.
// Value for decoder's next output buffer. If not set, defaults to 4096.

Although should we just use compressor/decompressor in these comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, replaced with compressor/decompressor.

process(ctx, buffer,
state == Envoy::Compression::Compressor::State::Finish ? BROTLI_OPERATION_FINISH
: BROTLI_OPERATION_FLUSH);
} while (BrotliEncoderHasMoreOutput(state_.get()) && !BrotliEncoderIsFinished(state_.get()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean if the encoder gets to a point where it still has more output but is finished? Or conversely if there is no more output but it hasn't finished? Is there something to be done at this layer?

Seems like the expected final state is for there to be no more output and for the encoder to have finished? Do we only expect reaching that at the same time that this function gets state == Envoy::Compression::Compressor::State::Finish?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at Brotli's source code and it turned out BrotliEncoderIsFinished() internally just calls BrotliEncoderHasMoreOutput() and checks if the current operation is FINISH. So it's not really needed here and we should care only about the encoder's empty output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline comment here would also be good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Default compression window size.
const uint32_t DefaultWindowBits = 22;

// Default quality. The actual value passed to the compressor is decremented by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a clarification. But perhaps there is a better way to deal with zero values in omitted fields. I can only come up with making the field mandatory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah you can use google.protobuf.UInt32Value so there is a difference between omitted and default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't know that. Thanks! I dropped the decrement.

const BrotliEncoderOperation op) {
BROTLI_BOOL result = BrotliEncoderCompressStream(state_.get(), op, &ctx.avail_in_, &ctx.next_in_,
&ctx.avail_out_, &ctx.next_out_, nullptr);
RELEASE_ASSERT(result == BROTLI_TRUE, "unable to compress");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come the compressor asserts while the decompressor increases an error stat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the compressor it must be an internal error or OOM. But in case of decompression it usually means the input data is garbage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, so for the compressor you want to crash because there is nothing the user can do. But for the decompressor it could be user error (garbage input) so you want to alert via telemetry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. The gzip extensions has the same behavior.

ctx.next_in_ = static_cast<uint8_t*>(input_slice.mem_);

while (ctx.avail_in_ > 0) {
if (!process(ctx, output_buffer)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should just read the brotli documentation. But I wonder if we should write some documentation inline about this.

In compression I could, at least at the surface see that we had different operations, and there was first a process pass, and then a flush/finish pass. Decompression doesn't seem to have an explicit operation? Why do we still need two passes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first loop runs until the input is not empty. But there might be situations when small input fully consumed by the decompressor unfolds into output not fitting the output chunk. Hence the second loop runs until there's no more output in the decompressor.

Would such a comment help?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, inline comment would help. That way future readers can understand the interaction without needing to go to brotli source from the get go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment

@junr03
Copy link
Member

junr03 commented Sep 13, 2020

/wait

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Dec 30, 2020
@rojkov
Copy link
Member Author

rojkov commented Dec 30, 2020

Merged the latest master to resolve a merge conflict in current.rst.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Base automatically changed from master to main January 15, 2021 23:01
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@rojkov
Copy link
Member Author

rojkov commented Jan 21, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12998 (comment) was created by @rojkov.

see: more, trace.

@rojkov
Copy link
Member Author

rojkov commented Jan 21, 2021

@lizan could you take a look?

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@rojkov
Copy link
Member Author

rojkov commented Feb 2, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12998 (comment) was created by @rojkov.

see: more, trace.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm deps

Thanks so much for the detailed scorecard!

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Feb 5, 2021
@rojkov
Copy link
Member Author

rojkov commented Feb 5, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12998 (comment) was created by @rojkov.

see: more, trace.

@lizan lizan merged commit 127aa55 into envoyproxy:main Feb 8, 2021
@moderation
Copy link
Contributor

moderation commented Feb 8, 2021

I'm getting build errors as there isn't a category specified in https://github.com/envoyproxy/envoy/blob/main/source/extensions/compression/brotli/compressor/BUILD#L24-L35

ERROR: Traceback (most recent call last):
        File "~/Library/envoyproxy/envoy/source/extensions/compression/brotli/compressor/BUILD", line 24, column 19, in <toplevel>
                envoy_cc_extension(
        File "~/Library/envoyproxy/envoy/bazel/envoy_library.bzl", line 124, column 13, in envoy_cc_extension
                fail("Category not set for %s" % name)
Error in fail: Category not set for config
ERROR: Traceback (most recent call last):
        File "~/Library/envoyproxy/envoy/source/extensions/compression/brotli/decompressor/BUILD", line 26, column 19, in <toplevel>
                envoy_cc_extension(
        File "~/Library/envoyproxy/envoy/bazel/envoy_library.bzl", line 124, column 13, in envoy_cc_extension
                fail("Category not set for %s" % name)
Error in fail: Category not set for config
ERROR: ~/Library/envoyproxy/envoy/source/exe/BUILD:28:17: no such target '//source/extensions/compression/brotli/compressor:config_envoy_extension': target 'config_envoy_extension' not declared in package 'source/extensions/compression/brotli/compressor' defined by ~/Library/envoyproxy/envoy/source/extensions/compression/brotli/compressor/BUILD and referenced by '//source/exe:envoy_common_lib'
ERROR: ~/Library/envoyproxy/envoy/source/exe/BUILD:28:17: no such target '//source/extensions/compression/brotli/decompressor:config_envoy_extension': target 'config_envoy_extension' not declared in package 'source/extensions/compression/brotli/decompressor' defined by ~/Library/envoyproxy/envoy/source/extensions/compression/brotli/decompressor/BUILD and referenced by '//source/exe:envoy_common_lib'
ERROR: Analysis of target '//source/exe:envoy-static.stripped' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.842s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)

@rojkov @lizan

@moderation
Copy link
Contributor

diff --git a/source/extensions/compression/brotli/compressor/BUILD b/source/extensions/compression/brotli/compressor/BUILD
index efd1739e0..93f13d8fe 100644
--- a/source/extensions/compression/brotli/compressor/BUILD
+++ b/source/extensions/compression/brotli/compressor/BUILD
@@ -25,6 +25,7 @@ envoy_cc_extension(
     name = "config",
     srcs = ["config.cc"],
     hdrs = ["config.h"],
+    category = "envoy.compression.decompressor",
     security_posture = "robust_to_untrusted_downstream",
     deps = [
         ":compressor_lib",
diff --git a/source/extensions/compression/brotli/decompressor/BUILD b/source/extensions/compression/brotli/decompressor/BUILD
index 3cc016e2f..3667300a8 100644
--- a/source/extensions/compression/brotli/decompressor/BUILD
+++ b/source/extensions/compression/brotli/decompressor/BUILD
@@ -27,6 +27,7 @@ envoy_cc_extension(
     name = "config",
     srcs = ["config.cc"],
     hdrs = ["config.h"],
+    category = "envoy.compression.decompressor",
     security_posture = "robust_to_untrusted_downstream",
     deps = [
         ":decompressor_lib",

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Brotli Compression support in envoy

7 participants