http: add modifyBuffer filter callback#5899

Merged
snowp merged 18 commits into envoyproxy:master from snowp:modify-buffer2
Feb 20, 2019

snowp commented Feb 10, 2019

Adds a filter callback that allows modifying the encoding/decoding buffer. This is useful in allowing
a filter to modify the buffer after seeing the entire buffer while still using the watermarked buffer
maintained by the HCM.

We only allow modifying the buffer from the latest filter that has seen the request/response data. This
ensures that multiple filters cannot make changes to the buffer at the same time.

Signed-off-by: Snow Pettersen <snowp@squareup.com>

Risk Level: Medium due to changes to HCM
Testing: Integration tests
Docs Changes: n/a
Release Notes: Added release note
Fixes #5394

Snow Pettersen added 7 commits February 10, 2019 12:21
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
@snowp snowp requested a review from mattklein123 February 10, 2019 18:21
* Allows modifying the decoding buffer. May only be called before any data has been continued
* past the calling filter.
*/
virtual void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) PURE;
Contributor Author

Used a callback style here to make it clear to the caller that they're not supposed to retain a pointer to the buffer
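A minimal sketch of the callback-style access described in this comment, outside of Envoy. `StreamOwner`, `Buffer` (a plain `std::string` here) and `upperCaseBody` are hypothetical stand-ins invented for illustration; only the shape of `modifyDecodingBuffer(std::function<void(Buffer::Instance&)>)` comes from the diff above.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for Buffer::Instance; not Envoy's real buffer type.
using Buffer = std::string;

// Hypothetical stand-in for the component that owns the buffered body.
class StreamOwner {
public:
  // Read-only access: callers may inspect the buffer but not mutate it.
  const Buffer* buffer() const { return &buffered_body_; }

  // Mutable access is scoped to the callback, so the interface itself never
  // hands out a long-lived mutable pointer.
  void modifyBuffer(const std::function<void(Buffer&)>& callback) {
    callback(buffered_body_);
  }

  // Simulates body data being buffered as it arrives.
  void append(const std::string& data) { buffered_body_ += data; }

private:
  Buffer buffered_body_;
};

// Example transform a filter might apply once the whole body is buffered.
inline void upperCaseBody(StreamOwner& owner) {
  owner.modifyBuffer([](Buffer& b) {
    for (char& c : b) {
      if (c >= 'a' && c <= 'z') {
        c = static_cast<char>(c - 'a' + 'A');
      }
    }
  });
}
```

A caller can still capture the reference inside the lambda, so this is a hint to the caller rather than an enforced guarantee.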

Snow Pettersen added 3 commits February 10, 2019 10:45
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
@mattklein123 mattklein123 self-assigned this Feb 11, 2019
* mysql: added a MySQL proxy filter that is capable of parsing SQL queries over MySQL wire protocol. Refer to ::ref:`MySQL proxy<config_network_filters_mysql_proxy>` for more details.
* http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged.
* http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data.
* redis: added :ref:`success and error stats <config_network_filters_redis_proxy_per_command_stats>` for commands.
Contributor

nit: this is a duplicate

Member

mattklein123 left a comment

Thanks, this turned out surprisingly simple and nice! Some small nits/comments. @alyssawilk @soya3129 any thoughts here given our recent discussions?

/wait

}
const Buffer::Instance* decodingBuffer() override { return buffered_body_.get(); }
void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) override {
callback(*buffered_body_.get());
Member

maybe just NOT_IMPLEMENTED this and below? I don't think the router filter should ever call this currently?

// onData callback. To do so, we compare the current latest with the *previous* filter. If they
// match, then we must be processing a new filter for the first time. We omit this check if we're
// the first filter, since the above check handles that case.
if (current_filter != filters.begin() && *latest_filter == std::prev(current_filter)->get()) {
Member

Why not just universally set latest to current? Seems simple to read/reason about?

Contributor Author

I wanted to cover the case where there are multiple onData callbacks: if we just set latest to current, then the first onData filter iteration would correctly iterate over the filters and set latest, but on subsequent onData iterations we'd start from the beginning again, potentially allowing filter N to modify the buffer even though filter M > N was the filter that inserted data into the buffer.

Hopefully this makes sense - open for suggestions if this seems unnecessary or if there's a better way.
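The bookkeeping described here can be sketched in isolation. This is a simplified model, not the PR's code: `Filter`, `FilterList`, `iterate` and `makeFilters` are hypothetical, and the initial `nullptr` branch is an assumption about what "the above check" in the quoted comment does. Only the comparison against the previous filter is taken from the diff.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <memory>

// Hypothetical minimal filter type; only its identity matters here.
struct Filter {
  int id;
};
using FilterList = std::list<std::unique_ptr<Filter>>;

// Advances latest_filter only when current_filter directly follows it, so a
// later iteration that restarts from the front cannot move the pointer back.
inline void recordLatestDataFilter(FilterList::const_iterator current_filter,
                                   Filter*& latest_filter, const FilterList& filters) {
  // Assumed first-call case: nothing recorded yet, so record the current filter.
  if (latest_filter == nullptr) {
    latest_filter = current_filter->get();
    return;
  }
  // Condition from the diff: the previous filter is the recorded latest one,
  // so this filter is seeing data for the first time.
  if (current_filter != filters.begin() &&
      latest_filter == std::prev(current_filter)->get()) {
    latest_filter = current_filter->get();
  }
}

// Simulates one onData iteration that visits filters at indices [0, stop_after].
inline void iterate(const FilterList& filters, Filter*& latest, int stop_after) {
  int index = 0;
  for (auto it = filters.begin(); it != filters.end() && index <= stop_after; ++it, ++index) {
    recordLatestDataFilter(it, latest, filters);
  }
}

// Builds a list of n filters with ids 0..n-1.
inline FilterList makeFilters(int n) {
  FilterList filters;
  for (int i = 0; i < n; ++i) {
    filters.push_back(std::unique_ptr<Filter>(new Filter{i}));
  }
  return filters;
}
```

With three filters, an iteration that reaches the last filter followed by one that stops at the first leaves the latest pointer on the last filter, which is the case the comment is concerned with.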

Member

I see, OK, makes sense. Can you clarify that a bit in the code comment above (just add your additional explanation to it)?


// Shared helper for recording the latest filter used.
template <class T>
void recordLatestDataFilter(const typename FilterList<T>::iterator current_filter,
Member

cc @soya3129 this is similar to what you had been doing in one of your metadata PRs in case you end up needing this again.

Contributor

Very nice! Thanks! Can be very useful when we allow metadata to go through downstream filters only.

// Shared helper for recording the latest filter used.
template <class T>
void recordLatestDataFilter(const typename FilterList<T>::iterator current_filter,
T** latest_filter, const FilterList<T>& filters) {
Member

nit: small preference for passing a reference to a pointer, since it can't be nullptr, and less dereferencing below, but up to you.

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Member

mattklein123 left a comment

LGTM with small typo

/wait

//
// We compare against the previous filter to avoid multiple filter iterations from reseting the
// pointer: If we just set latest to current, then the first onData filter iteration would
// correctly iterate over the the filters and set latest, but on subsequent onData iterations
Member

typo "the the"

Signed-off-by: Snow Pettersen <snowp@squareup.com>
mattklein123 previously approved these changes Feb 11, 2019
Contributor

alyssawilk left a comment

Again, sorry for chiming in late. One request for test fixes and one question on the API plan.

* Allows modifying the decoding buffer. May only be called before any data has been continued
* past the calling filter.
*/
virtual void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) PURE;
Contributor

Sorry, chiming in late.

If we're going to need a custom filter in order to buffer the whole body and then call modifyDecodingBuffer, would it be possible to refactor the buffering filter to have a stream complete callback and folks can subclass and implement the onBufferingFilterStreamComplete callback? Or better yet we could implement #5834, subclass the buffering filter and override the base class onEncode/DecodeComplete.

My concern is both that this adds extra complexity to the HCM for something I think we can push into an existing filter, and that by design it's subject to the data-with-end-stream problem called out below.

Member

This is a good point. @snowp if this makes sense to you I would be in favor of closing this and doing what @alyssawilk says?

public:
Http::FilterDataStatus decodeData(Buffer::Instance&, bool end_stream) {
if (end_stream) {
decoder_callbacks_->modifyDecodingBuffer([](auto& buffer) {
Contributor

As you call out in the integration test, this doesn't modify the entire request body when the data arrives with end stream. I suspect this generally won't be what the user wants - we've seen plenty of these bugs (e.g. #5674) where we need to
callbacks_->addDecodedData(data, true);
to capture data sent inline with end_stream

This means that the reference implementation of this function doesn't do what it says (and folks may copy it without realizing) and also that the design is subject to the data-with-final-end-stream pattern which I'd really love to avoid. Can we try to fix this?


void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) override {
ASSERT(parent_.state_.latest_data_decoding_filter_ == this);
callback(*parent_.buffered_request_data_.get());
Contributor

Sorry for chiming in late, but this seems like a lot of complexity for an ASSERT. To me, this has similar functionality to just allowing raw buffer access - the filter can do arbitrary transforms on any data, and it'd be far simpler conceptually to just allow connections to access the buffer directly than add on the std::function complexity to get an ASSERT check.
I suspect Matt may disagree as he hasn't called it out so I'm fine letting it stand :-)

Member

Assuming we stick with this approach, I do personally think this assert is worth it, as well as the differentiation between const and non-const access, mainly because I think it would be very easy to get hard to understand behavior between filters. With that said, per your other comment, maybe we don't need this change at all?
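To illustrate the trade-off discussed here, the following sketch combines const read access with a guarded mutable path. `GuardedBuffer` and its integer filter ids are hypothetical. The PR uses an ASSERT for the ownership check; an exception is used below only so that the rejected case is observable in a small example.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical model: a buffer that any filter may read, but that only the
// filter recorded as "latest" may modify.
class GuardedBuffer {
public:
  // Records which filter most recently received data.
  void recordLatest(int filter_id) { latest_filter_ = filter_id; }

  // Const view, available to every filter.
  const std::string& view() const { return data_; }

  // Mutable access, restricted to the latest data filter.
  void modify(int caller_id, const std::function<void(std::string&)>& callback) {
    if (caller_id != latest_filter_) {
      throw std::logic_error("only the latest data filter may modify the buffer");
    }
    callback(data_);
  }

private:
  std::string data_;
  int latest_filter_ = -1;
};
```

Handing out a raw mutable reference instead would remove the single place where such a check can live, which appears to be the point of keeping the two access paths separate.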

Contributor

I think we probably need something, as the current buffering filter pushes the buffering into the HCM. We could refactor the buffering filter to do the buffering itself (which seems reasonable) and then subclass, but the extra work was why I am fine going either way.

Member

True, sorry, I wasn't thinking very clearly at the end of the day yesterday. As you point out the buffer filter as-is won't do it. I'm not in favor of changing how the buffer filter works mainly because we avoid double buffering in many cases. I.e., the buffer filter buffers, then some other filter buffers but it's a NOP because the HCM has already buffered the data.

I guess in thinking about it more, I'm back to being fine with this solution. Alyssa's concern about not handling end_stream is a good one, though I can't think of any elegant quick fix for that if we keep this general API flow. Any ideas?

Contributor

Yeah, sorry, I'd edited my comment in some window or other and it got eaten by GitHub.

I don't know if we can fix the end_stream thing here, but we can at least update our sample code to do the addDecodedData dance, and then do away with it if #5834 gets fixed. WDYT?

Member

> I don't know if we can fix the end_stream thing here, but we can at least update our sample code to do the addDecodedData dance, and then do away with it if #5834 gets fixed. WDYT?

+1

Contributor Author

This all sounds good to me. I'll update the tests

Snow Pettersen added 2 commits February 13, 2019 15:42
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Member

mattklein123 left a comment

Looking good.

/wait

 public:
-  Http::FilterDataStatus decodeData(Buffer::Instance&, bool end_stream) {
+  Http::FilterDataStatus decodeData(Buffer::Instance& data, bool end_stream) {
+    decoder_callbacks_->addDecodedData(data, true);
Member

With this addition, I think you want to return no buffer in the response code. Before @alyssawilk yells at me about this, I agree this is too complicated, and I will continue to think about how to make this better generically. :)

Same for the encode case.
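The end_stream issue and the "addDecodedData dance" discussed in this thread can be shown with a toy model. Everything below is hypothetical and heavily simplified: `ToyManager` is not the HCM, the three-value `DataStatus` enum only mirrors the idea of continue / stop-and-buffer / stop-without-buffering, and reading "return no buffer" as the stop-without-buffering status is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

enum class DataStatus { Continue, StopIterationAndBuffer, StopIterationNoBuffer };

// Toy manager: owns the buffered body and forwards data once a filter continues.
class ToyManager {
public:
  using Filter = std::function<DataStatus(ToyManager&, std::string&, bool)>;
  explicit ToyManager(Filter filter) : filter_(std::move(filter)) {}

  // Moves the chunk into the manager's buffer, leaving the chunk empty.
  void addDecodedData(std::string& data) {
    buffered_ += data;
    data.clear();
  }

  void modifyDecodingBuffer(const std::function<void(std::string&)>& callback) {
    callback(buffered_);
  }

  // Delivers one chunk to the filter and acts on the returned status.
  void onData(std::string data, bool end_stream) {
    const DataStatus status = filter_(*this, data, end_stream);
    if (status == DataStatus::StopIterationAndBuffer) {
      buffered_ += data; // The manager buffers on the filter's behalf.
    } else if (status == DataStatus::Continue) {
      forwarded_ += buffered_ + data; // Buffered data first, then this chunk.
      buffered_.clear();
    }
    // StopIterationNoBuffer: the filter already took care of the chunk.
  }

  const std::string& forwarded() const { return forwarded_; }

private:
  Filter filter_;
  std::string buffered_;
  std::string forwarded_;
};

inline void wrapBody(std::string& body) { body = "[" + body + "]"; }

// Naive filter: the chunk that arrives with end_stream is not in the buffer yet.
inline DataStatus naiveFilter(ToyManager& manager, std::string&, bool end_stream) {
  if (end_stream) {
    manager.modifyDecodingBuffer(wrapBody);
    return DataStatus::Continue;
  }
  return DataStatus::StopIterationAndBuffer;
}

// Fixed filter: add every chunk itself, then stop without double buffering.
inline DataStatus fixedFilter(ToyManager& manager, std::string& data, bool end_stream) {
  manager.addDecodedData(data);
  if (end_stream) {
    manager.modifyDecodingBuffer(wrapBody);
    return DataStatus::Continue;
  }
  return DataStatus::StopIterationNoBuffer;
}

// Sends "ab" and then "cd" with end_stream set, returning what was forwarded.
inline std::string run(ToyManager::Filter filter) {
  ToyManager manager(std::move(filter));
  manager.onData("ab", false);
  manager.onData("cd", true);
  return manager.forwarded();
}
```

In this model the naive filter forwards `[ab]cd`, so the final chunk escapes the transform, while the fixed filter forwards `[abcd]`.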

Contributor Author

yeah that makes sense, updated

Signed-off-by: Snow Pettersen <snowp@squareup.com>
mattklein123 previously approved these changes Feb 13, 2019
Member

mattklein123 left a comment

LGTM will defer to @alyssawilk for final approval.

Contributor

alyssawilk left a comment

Tests LGTM and I think the rest are all personal style preferences so Matt, you're good to merge.

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Contributor Author

snowp commented Feb 20, 2019

Had to merge master so PTAL @alyssawilk @mattklein123

@snowp snowp merged commit 094eb85 into envoyproxy:master Feb 20, 2019
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
Signed-off-by: Fred Douglas <fredlas@google.com>