http: add modifyBuffer filter callback#5899
Conversation
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
| * Allows modifying the decoding buffer. May only be called before any data has been continued | ||
| * past the calling filter. | ||
| */ | ||
| virtual void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) PURE; |
There was a problem hiding this comment.
Used a callback style here to make it clear to the caller that they're not supposed to retain a pointer to the buffer
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
docs/root/intro/version_history.rst
Outdated
| * mysql: added a MySQL proxy filter that is capable of parsing SQL queries over MySQL wire protocol. Refer to ::ref:`MySQL proxy<config_network_filters_mysql_proxy>` for more details. | ||
| * http: added :ref:`max request headers size <envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.max_request_headers_kb>`. The default behaviour is unchanged. | ||
| * http: added modifyDecodingBuffer/modifyEncodingBuffer to allow modifying the buffered request/response data. | ||
| * redis: added :ref:`success and error stats <config_network_filters_redis_proxy_per_command_stats>` for commands. |
There was a problem hiding this comment.
nit: this is duplicate
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this turned out surprisingly simple and nice! Some small nits/comments. @alyssawilk @soya3129 any thoughts here given our recent discussions?
/wait
| } | ||
| const Buffer::Instance* decodingBuffer() override { return buffered_body_.get(); } | ||
| void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) override { | ||
| callback(*buffered_body_.get()); |
There was a problem hiding this comment.
maybe just NOT_IMPLEMENTED this and below? I don't think the router filter should ever call this currently?
| // onData callback. To do so, we compare the current latest with the *previous* filter. If they | ||
| // match, then we must be processing a new filter for the first time. We omit this check if we're | ||
| // the first filter, since the above check handles that case. | ||
| if (current_filter != filters.begin() && *latest_filter == std::prev(current_filter)->get()) { |
There was a problem hiding this comment.
Why not just universally set latest to current? Seems simple to read/reason about?
There was a problem hiding this comment.
I wanted to cover the case where there are multiple onData callbacks: If we just set latest to current, then the first onData filter iteration would correctly iterate over the the filters and set latest, but on subsequent onData iterations we'd start from the beginning again, potentially allowing filter N to modify the buffer even though filter M > N was the filter that inserted data into the buffer.
Hopefully this makes sense - open for suggestions if this seems unnecessary or if there's a better way.
There was a problem hiding this comment.
I see OK, makes sense. Can you clarify that a bit in the above comment (just add your additional explanation in the above comment)?
|
|
||
| // Shared helper for recording the latest filter used. | ||
| template <class T> | ||
| void recordLatestDataFilter(const typename FilterList<T>::iterator current_filter, |
There was a problem hiding this comment.
cc @soya3129 this is similar to what you had been doing in one of your metadata PRs in case you end up needing this again.
There was a problem hiding this comment.
Very nice! Thanks! Can be very useful when we allow metadata to go through downstream filters only.
| // Shared helper for recording the latest filter used. | ||
| template <class T> | ||
| void recordLatestDataFilter(const typename FilterList<T>::iterator current_filter, | ||
| T** latest_filter, const FilterList<T>& filters) { |
There was a problem hiding this comment.
nit: small preference for passing a reference to a pointer, since it can't be nullptr, and less dereferencing below, but up to you.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM with small typo
/wait
| // | ||
| // We compare against the previous filter to avoid multiple filter iterations from reseting the | ||
| // pointer: If we just set latest to current, then the first onData filter iteration would | ||
| // correctly iterate over the the filters and set latest, but on subsequent onData iterations |
alyssawilk
left a comment
There was a problem hiding this comment.
Again sorry for chiming in late. One request for test fixes and one question on the API plan
| * Allows modifying the decoding buffer. May only be called before any data has been continued | ||
| * past the calling filter. | ||
| */ | ||
| virtual void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) PURE; |
There was a problem hiding this comment.
Sorry, chiming in late.
If we're going to need a custom filter in order to buffer the whole body and then call modifyDecodingBuffer, would it be possible to refactor the buffering filter to have a stream complete callback and folks can subclass and implement the onBufferingFilterStreamComplete callback? Or better yet we could implement #5834, subclass the buffering filter and override the base class onEncode/DecodeComplete.
My concern is both adding extra complexity to the HCM for something I think we can push into an exiting filter, and I think by design it's subject to the data-with-end-stream problem called out below.
There was a problem hiding this comment.
This is a good point. @snowp if this makes sense to you I would be in favor of closing this and doing what @alyssawilk says?
| public: | ||
| Http::FilterDataStatus decodeData(Buffer::Instance&, bool end_stream) { | ||
| if (end_stream) { | ||
| decoder_callbacks_->modifyDecodingBuffer([](auto& buffer) { |
There was a problem hiding this comment.
As you call out in the integration test, this doesn't modify the entire request body when the data arrives with end stream. I suspect this generally won't be what the user wants - we've seen plenty of these bugs (e.g. #5674) where we need to
callbacks_->addDecodedData(data, true);
to capture data sent inline with end_stream
This means that the reference implementation of this function doesn't do what it says (and folks may copy it without realizing) and also that the design is subject to the data-with-final-end-stream pattern which I'd really love to avoid. Can we try to fix this?
|
|
||
| void modifyDecodingBuffer(std::function<void(Buffer::Instance&)> callback) override { | ||
| ASSERT(parent_.state_.latest_data_decoding_filter_ == this); | ||
| callback(*parent_.buffered_request_data_.get()); |
There was a problem hiding this comment.
Sorry for chiming in late, but this seems like a lot of complexity for an ASSERT. To me, this has similar functionality to just allowing raw buffer access - the filter can do arbitrary transforms on any data, and it'd be far simpler conceptually to just allow connections to access the buffer directly than add on the std::function complexity to get an ASSERT check.
I suspect Matt may disagree as he hasn't called it out so I'm fine letting it stand :-)
There was a problem hiding this comment.
Assuming we stick with this approach, I do personally think this assert is worth it, as well as the differentiation between const and non-const access, mainly because I think it would be very easy to get hard to understand behavior between filters. With that said, per your other comment, maybe we don't need this change at all?
There was a problem hiding this comment.
I think we probably need something, as the current buffering filter pushes the buffering into the HCM. We could refactor the buffering filter to do the buffering itself (which seems reasonable) and then subclass, but the extra work was why I am fine going either way.
There was a problem hiding this comment.
True, sorry, I wasn't thinking very clearly at the end of the day yesterday. As you point out the buffer filter as-is won't do it. I'm not in favor of changing how the buffer filter works mainly because we avoid double buffering in many cases. I.e., the buffer filter buffers, then some other filter buffers but it's a NOP because the HCM has already buffered the data.
I guess in thinking about it more, I'm back to being fine with this solution. Alyssa's concern about not handling end_stream is a good one, though I can't think of any elegant quick fix for that if we keep this general API flow. Any ideas?
There was a problem hiding this comment.
Yeah, sorry, I'd edited my comment in some window or other and it got eaten by GitHub.
I don't know if we can fix the end_stream thing here, but we can at least update our sample code to do the addDecodedData dance, and then do away with it if #5834 gets fixed. WDYT?
There was a problem hiding this comment.
I don't know if we can fix the end_stream thing here, but we can at least update our sample code to do the addDecodedData dance, and then do away with it if #5834 gets fixed. WDYT?
+1
There was a problem hiding this comment.
This all sounds good to me. I'll update the tests
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
| public: | ||
| Http::FilterDataStatus decodeData(Buffer::Instance&, bool end_stream) { | ||
| Http::FilterDataStatus decodeData(Buffer::Instance& data, bool end_stream) { | ||
| decoder_callbacks_->addDecodedData(data, true); |
There was a problem hiding this comment.
With this addition, I think you want to return no buffer in the response code. Before @alyssawilk yells at me about this, I agree this is too complicated, and I will continue to think about how to make this better generically. :)
Same for the encode case.
There was a problem hiding this comment.
yeah that makes sense, updated
Signed-off-by: Snow Pettersen <snowp@squareup.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM will defer to @alyssawilk for final approval.
alyssawilk
left a comment
There was a problem hiding this comment.
Tests LGTM and I think the rest are all personal style preferences so Matt, you're good to merge.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
Had to merge master so PTAL @alyssawilk @mattklein123 |
Adds a filter callback that allows modifying the encoding/decoding buffer. This is useful in allowing a filter to modify the buffer after seeing the entire buffer while still using the watermarked buffer maintained by the HCM. We only allow modifying the buffer from the latest filter that seen the request/response data. This ensures that we don't have multiple filters both making changes to the buffer at the same time. Signed-off-by: Snow Pettersen snowp@squareup.com Signed-off-by: Fred Douglas <fredlas@google.com>
Adds a filter callback that allows modifying the encoding/decoding buffer. This is useful in allowing
a filter to modify the buffer after seeing the entire buffer while still using the watermarked buffer
maintained by the HCM.
We only allow modifying the buffer from the latest filter that seen the request/response data. This
ensures that we don't have multiple filters both making changes to the buffer at the same time.
Signed-off-by: Snow Pettersen snowp@squareup.com
Risk Level: Medium due to changes to HCM
Testing: Integration tests
Docs Changes: n/a
Release Notes: Added release note
Fixes #5394