caching: Stream cached responses in chunks and handle downstream backpressure #13054

Closed
yosrym93 wants to merge 10 commits into envoyproxy:master from capoferro:downstream-backpressure

@yosrym93
Contributor

Commit Message:
The CacheFilter now streams cached responses in chunks and handles downstream backpressure.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Additional Description:
The CacheFilter now:

  • Does not stop headers encoding until the cached response is fetched.
  • Streams the cached response downstream in chunks, according to the encoding buffer size.
  • Subscribes to downstream watermark events and handles downstream backpressure by stopping fetching the response body from the cache.

Risk Level: Low
Testing: Unit / Integration
Docs Changes: N/A
Release Notes: N/A
Fixes #9835

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
…ized according to the buffer limit

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@yosrym93
Contributor Author

Background:

  • When the CacheFilter served validated cached responses during encoding, it used to stop headers iteration until the cached body was fetched. This incurred an unnecessary delay in headers encoding.
  • The CacheFilter used to fetch the whole cached response from the cache in one go and send it downstream. This may cause delays and buffer overflows if the cached response is large.
  • The CacheFilter did not handle backpressure from downstream.

In this PR:

  1. The CacheFilter does not stop headers iteration when encoding a cached response, made possible by http: Allow http filters to add a body that's not readily available to a headers-only request/response #12832.
  2. The CacheFilter fetches the body from the cache in several chunks; the chunk size is based on the encoding buffer size to avoid overflows.
  3. The CacheFilter subscribes to downstream watermark events and uses them to handle downstream backpressure, by stopping fetching body chunks when the high watermark callback is invoked, and resuming when the low watermark callback is invoked.

1 & 2 are already implemented and tested.
3 is implemented and was tested manually (by adding trace statements and running the integration test), but is still missing actual unit/integration tests. To test this, we need to make sure that when the high watermark callback is invoked we do not fetch more body chunks from the cache (maybe check that encodeData / injectEncodedDataToFilterChain were not called?), and that when the low watermark callback is invoked, we continue fetching data. This already implicitly happens in the integration test, but we need to make sure it is invoked correctly (by unit and integration tests).

@yosrym93
Contributor Author

yosrym93 commented Sep 11, 2020

To subscribe to watermark events, we need to call StreamDecoderFilterCallbacks::addDownstreamWatermarkCallbacks() and pass it a DownstreamWatermarkCallbacks object. The passed object should implement onAboveWriteBufferHighWatermark and onBelowWriteBufferLowWatermark to handle downstream watermark events (in this case, to stop and resume fetching the body from the cache).

Currently, the CacheFilter inherits from DownstreamWatermarkCallbacks, and calls addDownstreamWatermarkCallbacks(*this) in encodeCachedResponse, before it starts fetching the cached body. We should make sure this is the correct place to call it.

Alternatively, the CacheFilter could create an internal class/struct that inherits from DownstreamWatermarkCallbacks and pass it to addDownstreamWatermarkCallbacks.
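
As a rough illustration of the two options described here, this is a standalone C++ sketch. It does not use Envoy headers: WatermarkCallbacks and FakeDecoderCallbacks are hypothetical stand-ins for DownstreamWatermarkCallbacks and the decoder filter callbacks, and both filter classes are illustrative only, not the PR's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the DownstreamWatermarkCallbacks interface.
class WatermarkCallbacks {
public:
  virtual ~WatermarkCallbacks() = default;
  virtual void onAboveWriteBufferHighWatermark() = 0;
  virtual void onBelowWriteBufferLowWatermark() = 0;
};

// Hypothetical stand-in for the decoder filter callbacks that accept a subscriber.
class FakeDecoderCallbacks {
public:
  void addDownstreamWatermarkCallbacks(WatermarkCallbacks& callbacks) { subscriber_ = &callbacks; }
  void raiseHighWatermark() {
    if (subscriber_ != nullptr) {
      subscriber_->onAboveWriteBufferHighWatermark();
    }
  }
  void raiseLowWatermark() {
    if (subscriber_ != nullptr) {
      subscriber_->onBelowWriteBufferLowWatermark();
    }
  }

private:
  WatermarkCallbacks* subscriber_ = nullptr;
};

// Option 1: the filter itself implements the callbacks and passes *this.
class SketchFilter : public WatermarkCallbacks {
public:
  explicit SketchFilter(FakeDecoderCallbacks& callbacks) {
    callbacks.addDownstreamWatermarkCallbacks(*this);
  }
  void onAboveWriteBufferHighWatermark() override { ++high_watermark_calls_; }
  void onBelowWriteBufferLowWatermark() override {
    assert(high_watermark_calls_ > 0);
    --high_watermark_calls_;
  }
  bool fetchingPaused() const { return high_watermark_calls_ > 0; }

private:
  uint32_t high_watermark_calls_ = 0;
};

// Option 2: an internal helper implements the callbacks and forwards to its owner.
class SketchFilterWithHelper {
public:
  explicit SketchFilterWithHelper(FakeDecoderCallbacks& callbacks) : helper_(*this) {
    callbacks.addDownstreamWatermarkCallbacks(helper_);
  }
  bool fetchingPaused() const { return high_watermark_calls_ > 0; }

private:
  struct Helper : WatermarkCallbacks {
    explicit Helper(SketchFilterWithHelper& owner) : owner_(owner) {}
    void onAboveWriteBufferHighWatermark() override { ++owner_.high_watermark_calls_; }
    void onBelowWriteBufferLowWatermark() override {
      assert(owner_.high_watermark_calls_ > 0);
      --owner_.high_watermark_calls_;
    }
    SketchFilterWithHelper& owner_;
  };

  uint32_t high_watermark_calls_ = 0;
  Helper helper_;
};
```

Both variants track the same counter; the difference is only whether the filter's public interface exposes the watermark methods.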

@htuch htuch requested a review from toddmgreer September 11, 2020 00:20
@yosrym93
Contributor Author

The chunk size used to fetch the body from the cache is equal to the encoding buffer limit. This way, we fetch as much data in every request to the cache as we can send downstream.
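
To make the chunking concrete, here is a minimal standalone sketch of splitting a body into byte ranges no larger than the buffer limit. ByteRange and splitIntoChunks are hypothetical names used for illustration, not the PR's code, and the sketch assumes a non-zero limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open byte range [begin, end) within the cached body.
struct ByteRange {
  uint64_t begin;
  uint64_t end;
};

// Splits a body of body_size bytes into consecutive ranges, each at most
// buffer_limit bytes long. Only the last range may be shorter.
std::vector<ByteRange> splitIntoChunks(uint64_t body_size, uint64_t buffer_limit) {
  assert(buffer_limit > 0); // Assumption of this sketch: the limit is non-zero.
  std::vector<ByteRange> chunks;
  uint64_t begin = 0;
  while (begin < body_size) {
    const uint64_t length = std::min(buffer_limit, body_size - begin);
    chunks.push_back({begin, begin + length});
    begin += length;
  }
  return chunks;
}
```

With a limit of 6 and a 10-byte body, this yields two ranges: one full-sized chunk and one remainder.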

@yosrym93
Contributor Author

If the high watermark callback is invoked AFTER a body chunk has already been requested from the cache, that request will be completed, and the chunk will be sent downstream. This may result in crossing the encoding buffer limit, but it should be okay as this is only a soft limit, and this should not happen frequently.
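
The in-flight case can be sketched as a small standalone state machine. ChunkFetcherSketch and its members are hypothetical and only model the behavior described here (one request in flight at a time, a counter of outstanding high watermark calls); they are not the filter's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

class ChunkFetcherSketch {
public:
  explicit ChunkFetcherSketch(uint32_t chunks) : remaining_(chunks) {}

  // Issues the next cache request unless one is in flight, fetching is
  // paused by a high watermark, or there is nothing left to fetch.
  void maybeGetBody() {
    if (in_flight_ || high_watermark_calls_ > 0 || remaining_ == 0) {
      return;
    }
    in_flight_ = true;
    --remaining_;
  }

  // Models the cache completing the in-flight request. The chunk is sent
  // downstream even if a high watermark was raised in the meantime.
  void onBodyChunk() {
    assert(in_flight_);
    in_flight_ = false;
    ++chunks_sent_;
    maybeGetBody();
  }

  void onAboveWriteBufferHighWatermark() { ++high_watermark_calls_; }

  void onBelowWriteBufferLowWatermark() {
    assert(high_watermark_calls_ > 0);
    --high_watermark_calls_;
    maybeGetBody(); // Resume fetching if possible.
  }

  bool inFlight() const { return in_flight_; }
  uint32_t chunksSent() const { return chunks_sent_; }

private:
  uint32_t remaining_;
  uint32_t chunks_sent_ = 0;
  uint32_t high_watermark_calls_ = 0;
  bool in_flight_ = false;
};
```

In this model, at most one extra chunk is sent after the high watermark is raised, which is why the limit is only overshot by a bounded amount.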

@yosrym93
Contributor Author

According to https://github.com/envoyproxy/envoy/blob/master/test/integration/protocol_integration_test.cc#L1036 we don't need to handle non-compliant 304 bodies. Envoy already handles them.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@yosrym93
Contributor Author

@toddmgreer I think this addresses all your comments.

dir
dirname
djb
dont
Contributor

Why is "dont" added?

Contributor Author

It is for the comment that mentions ContinueAndDontEndStream. "Dont" is flagged as a spelling error.

Member

This is a spelling error. "Don't"

@mattklein123 mattklein123 self-assigned this Sep 11, 2020
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
toddmgreer previously approved these changes Sep 11, 2020
@mattklein123
Member

@yosrym93 is this still a draft or is it ready for review? I can help review the watermark logic (or @alyssawilk potentially) but want to make sure this is ready for review.

/wait-any

@yosrym93
Contributor Author

@yosrym93 is this still a draft or is it ready for review? I can help review the watermark logic (or @alyssawilk potentially) but want to make sure this is ready for review.

The PR is only missing tests for the watermark logic, but the logic code should be complete.
You can review it now or wait until the tests are added, whichever you see fit.

@mattklein123
Member

I will wait for the tests, thanks.

/wait

zyshi added a commit to zyshi/envoy that referenced this pull request Oct 22, 2020
Patching envoyproxy#13054

Signed-off-by: Zhongyi Shi <zhongyi@chromium.org>
…essure

Signed-off-by: Zhongyi Shi <zhongyi@chromium.org>
@mattklein123
Member

@yosrym93 is this ready for review? If so can you mark it non-draft?

/wait-any

@zyshi
zyshi commented Oct 29, 2020

I added a test in cache_filter_test.cc which should test the downstream watermark handling. Should be ready for review now.

A side note: it looks like the test dispatcher, once run, cannot be interrupted (unless I'm missing something). It would be nice if there were a mode to run the event loop one task at a time, something similar to chromium's TestTestRunner::RunNextTask(). That made me wonder how Envoy tests complicated scheduling logic, i.e., verifying that callbacks are posted with the right delay.

@zyshi
zyshi commented Oct 29, 2020

@yanavlasov could you help set this PR ready for review? I don't have permission to do that. Besides, I might also need you to kick off the retry on presubmit to see if the infra failures go away. Thanks!

@yosrym93 If you happen to see this message early, it would be nice to set the PR ready for review. Thanks!

@yosrym93 yosrym93 marked this pull request as ready for review October 30, 2020 02:41
@yosrym93 yosrym93 requested a review from jmarantz as a code owner October 30, 2020 02:41
@yosrym93
Contributor Author

@yosrym93 If you happen to see this message early, it would be nice to set the PR ready for review. Thanks!

Done. Glad to see progress on this PR!

@jmarantz jmarantz self-assigned this Oct 30, 2020
@jmarantz
Contributor

Happy to review; please ping once @toddmgreer finishes review. Thanks!

@yanavlasov
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@zyshi
zyshi commented Oct 30, 2020

@toddmgreer the change is ready for review, PTAL, thanks!

toddmgreer previously approved these changes Nov 3, 2020
Contributor

@jmarantz jmarantz left a comment

basically looks great; need to do another quick pass over the tests.

Some minor nits.


// Make sure validation conditional headers are added
const Http::TestRequestHeaderMapImpl injected_headers = {
{"if-none-match", etag_}, {"if-modified-since", response_last_modified_}};
Contributor

ditto

To keep it simple and explicit, is it okay to keep it as is rather than using the values in headers.h? The dict needs std::string pairs, so we would need to use something like Http::CustomHeaders.get().ifNoneMatch.get(), which is not clean.

Besides, I checked other places where TestRequestHeaderMapImpl is used, and it seems we generally set explicit key-value pairs. WDYT?

Contributor

I'm OK either way. One way to make this a little less verbose than what you suggest is to make a temp for Http::CustomHeaders.get() and use that throughout this function. But it's up to you.

Signed-off-by: Zhongyi Shi <zhongyi@chromium.org>
@zyshi
zyshi commented Nov 4, 2020

Comments addressed, PTAL, thanks!

Sorry if I haven't figured out the best way to use the tool (bear with me if the comment notifications are not bundled). I am still getting familiar with this git-native PR process (compared to the extremely convenient Gerrit workflow).

@zyshi
zyshi commented Nov 4, 2020

@jmarantz This change is ready for review. BTW, I don't have permission to resolve conversations, please do resolve if you have that button. Thanks!


EXPECT_CALL(encoder_callbacks_,
injectEncodedDataToFilterChain(
testing::Property(&Buffer::Instance::toString,
testing::Eq(std::string(buffer_limit_, 'a'))),
Contributor

I was thinking about this testcase. I think it would give us more confidence in the correctness of the system if you used a buffer_limit of something much smaller, say 6. Then you can use a payload of "1234567890" and make sure the first chunk is "123456" and the second is "7890".

Right now you are comparing against a bunch of 'a's and I'm wondering if a bug could creep in where we take the wrong span of that string in the logic, but your test doesn't care.

WDYT?
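
A small standalone sketch of this concern: chunkBody is a hypothetical reference chunker and chunkBodyBuggy deliberately always reads from offset 0. Neither is the filter's code. With a body of repeated 'a's the two are indistinguishable, while "1234567890" with a limit of 6 exposes the bug.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reference behavior: consecutive slices of at most buffer_limit bytes.
std::vector<std::string> chunkBody(const std::string& body, size_t buffer_limit) {
  assert(buffer_limit > 0);
  std::vector<std::string> chunks;
  for (size_t offset = 0; offset < body.size(); offset += buffer_limit) {
    chunks.push_back(body.substr(offset, buffer_limit));
  }
  return chunks;
}

// Deliberately buggy: chunk lengths are right, but every chunk is read
// from the start of the body instead of from the current offset.
std::vector<std::string> chunkBodyBuggy(const std::string& body, size_t buffer_limit) {
  assert(buffer_limit > 0);
  std::vector<std::string> chunks;
  for (size_t offset = 0; offset < body.size(); offset += buffer_limit) {
    const size_t length = std::min(buffer_limit, body.size() - offset);
    chunks.push_back(body.substr(0, length));
  }
  return chunks;
}
```

A test that compares against distinct characters fails for the buggy version, whereas a uniform payload lets it pass unnoticed.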

Good point! Thanks for doing a thorough review. I didn't pay much attention to the existing code and focused on the downstream watermark handling test :P

// Test that a body with size exactly equal to the buffer limit will be encoded in 1 chunk.
TEST_F(CacheChunkSizeTest, EqualBufferLimit) {
request_headers_.setHost("EqualBufferLimit");
const std::string body = std::string(buffer_limit_, 'a');
Contributor

again I'd consider using smaller limits and more interesting strings to compare.

Done.

Contributor

@jmarantz jmarantz left a comment

btw this is great code, super clean and easy to read. Thanks for doing this!

I'm really just having nits and testing questions that are probably academic.

Signed-off-by: Zhongyi Shi <zhongyi@chromium.org>
@zyshi
zyshi commented Nov 5, 2020

Thanks for the thorough review, really appreciate it! It was @yosrym93 who did the great work; I only added one additional test to ensure there's test coverage for downstream watermark handling. LMK if there are any other concerns. Thanks!

@yosrym93
Contributor Author

yosrym93 commented Nov 5, 2020

Thanks @zyshi for completing this and @jmarantz for reviewing it. Happy to see the Cache Filter one step closer to being production-ready!

request_headers_.setHost("DownstreamPressureHandling");
const int chunks_count = 3;
- const uint64_t body_size = buffer_limit_ * chunks_count;
+ const uint64_t body_size = getBufferLimit() * chunks_count;
Contributor

We are still just testing a bunch of "a"s below; should we change these too?

Contributor

@jmarantz jmarantz left a comment

Modulo my previous comment I'll pass this to @envoyproxy/senior-maintainers in parallel with possibly addressing that.

Nice job (all who worked on it)!

Member

@mattklein123 mattklein123 left a comment

Thanks for this. A few questions/comments to get started.

/wait

Comment on lines +487 to +488
// TODO(yosrym93): Make sure this is the right place to add the callbacks.
decoder_callbacks_->addDownstreamWatermarkCallbacks(*this);
Member

I would recommend initializing this as early as possible (in setEncoderFilterCallbacks)

I assume you really meant setEncoderFilterCallbacks and that it is not a typo here. Since I'm ramping up on the implementation code, I did get confused about why the registration is only available via StreamDecoderFilterCallbacks. What if a pure encoder filter (which doesn't have decoder_callbacks) wants to listen to downstream watermark changes? In other words, there's no way you could call: encoder_callbacks_->addDownstreamWatermarkCallbacks(*this).

Member

I think it's just an implementation oversight because no one has needed it before.

bool request_allows_inserts_ = false;

// These are used to keep track of whether we should fetch more data from the cache.
int high_watermark_calls_ = 0;
Member

nit: unsigned

Comment on lines +93 to +95
// Continue encoding the headers but do not end the stream as the response body is yet to be
// injected.
return Http::FilterHeadersStatus::ContinueAndDontEndStream;
Member

Does this correctly handle the case where the cached response is a header only response?

Comment on lines -116 to -119
if (filter_state_ == FilterState::EncodeServingFromCache) {
  // Stop the encoding stream until the cached response is fetched & added to the encoding stream.
  return Http::FilterDataStatus::StopIterationAndBuffer;
}
Member

Why is this removed? Can this no longer happen? Should the enum be removed?

Comment on lines +128 to +135
void CacheFilter::onBelowWriteBufferLowWatermark() {
  ASSERT(high_watermark_calls_ > 0);
  --high_watermark_calls_;
  if (!remaining_ranges_.empty()) {
    // Fetching the cached response body was stopped, continue if possible.
    maybeGetBody();
  }
}
Member

Are there cases in which we stream through a router response and this might get triggered? I don't see any guards in maybeGetBody() to confirm that we are actually doing a fetch?
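
One possible shape for such a guard, as a standalone sketch only. FilterStateSketch, GuardedFetcherSketch, and all of their members are hypothetical and do not reflect what the PR implements. The idea is that the fetch path checks explicitly that the filter is serving from the cache before issuing a request.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical filter states for this sketch.
enum class FilterStateSketch { Initial, ServingFromCache, PassingThrough };

class GuardedFetcherSketch {
public:
  void setState(FilterStateSketch state) { state_ = state; }
  void setRemainingChunks(uint32_t chunks) { remaining_chunks_ = chunks; }

  void onAboveWriteBufferHighWatermark() { ++high_watermark_calls_; }

  void onBelowWriteBufferLowWatermark() {
    assert(high_watermark_calls_ > 0);
    --high_watermark_calls_;
    maybeGetBody();
  }

  void maybeGetBody() {
    // Guard: watermark events also fire while a response from upstream is
    // being streamed through; only fetch when actually serving from cache.
    if (state_ != FilterStateSketch::ServingFromCache) {
      return;
    }
    if (high_watermark_calls_ > 0 || remaining_chunks_ == 0) {
      return;
    }
    --remaining_chunks_;
    ++fetches_issued_;
  }

  uint32_t fetchesIssued() const { return fetches_issued_; }

private:
  FilterStateSketch state_ = FilterStateSketch::Initial;
  uint32_t remaining_chunks_ = 0;
  uint32_t high_watermark_calls_ = 0;
  uint32_t fetches_issued_ = 0;
};
```

The counter is unsigned here, in line with the earlier nit about high_watermark_calls_.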

@github-actions
github-actions bot commented Dec 9, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 9, 2020
@github-actions
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Dec 16, 2020
Labels

stale stalebot believes this issue/PR has not been touched recently waiting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CacheFilter: Handle backpressure from downstream

7 participants