router: allow retry of streaming/incomplete requests #10725
ggreenway merged 14 commits into envoyproxy:master
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@mattklein123 We discussed this change briefly a few months ago, and I finally got around to it. Can you take a quick skim and see if I missed anything obvious, or any other high-level concerns about this?
alyssawilk left a comment
Huh, I would have thought this would be way more complicated. Would definitely like you to test the heck out of this but looks good so far!
    buffering = false;
    active_shadow_policies_.clear();

    // If we had to abandon buffering and there's no request in progress, abort the request and
Maybe add a comment explaining how we might get here? Is this just during the retry-timer interval?
Also cleanup -> clean up?
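For readers following the question above, here is a minimal sketch of the decision the diff comment describes: once the buffer limit is exceeded, buffering is abandoned, and the request is aborted only if no upstream request is in progress (for example, during the retry-timer interval). The enum, function, and parameter names are hypothetical and are not taken from router.cc.

```cpp
#include <cstddef>

// Hypothetical outcome names for illustration; not Envoy's actual API.
enum class OverflowAction { KeepBuffering, StopBufferingNoRetry, AbortRequest };

// Decide what to do after `buffered_bytes` of the request body have been
// buffered for a possible retry. All parameters are assumptions of this sketch.
constexpr OverflowAction onRequestData(std::size_t buffered_bytes, std::size_t buffer_limit,
                                       bool upstream_request_in_progress) {
  if (buffered_bytes <= buffer_limit) {
    return OverflowAction::KeepBuffering;
  }
  // Over the limit: the body can no longer be replayed, so buffering is abandoned.
  if (upstream_request_in_progress) {
    // The in-flight attempt still receives the data; it just cannot be retried.
    return OverflowAction::StopBufferingNoRetry;
  }
  // No attempt in flight (e.g. waiting on the retry timer): nothing can consume
  // the data, so the request is aborted and an error goes back to the client.
  return OverflowAction::AbortRequest;
}

// Compile-time usage example: overflow while waiting on the retry timer aborts.
static_assert(onRequestData(2048, 1024, false) == OverflowAction::AbortRequest,
              "overflow with no upstream request aborts");
```

The third branch is the case the comment in the diff is about: it can only be reached while the router holds buffered data but has no upstream request to send it to.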
source/common/router/router.cc (outdated)
    ASSERT(upstream_requests_.size() == 1);
    upstream_requests_.front()->encodeMetadata(std::move(metadata_map_ptr));
    if (!upstream_requests_.empty()) {
      // TODO: buffer this if there's no request? It doesn't look like the upstream request has
router.h has this: TODO(soya3129): Save metadata for retry, redirect and shadowing case.
Fine to duplicate it here, since it caused confusion for you.
source/common/router/router.cc (outdated)

    if (retry_status == RetryStatus::Yes && setupRetry()) {
    if (retry_status == RetryStatus::Yes) {
      setupRetry();
Curious: did we double-count stats here?
Yes, I think so; I found this while investigating this a few months ago. See the Fixed issue in description.
Also cc @AndresGuedez as I think this was on one of your milestones
Yeah, me too. I got this far and realized it seemed to mostly-work, and said to myself "That's it!?!"
Agreed, I just wanted to make sure I wasn't way off on the wrong path before writing extensive tests; I've never touched this area of the codebase before.
mattklein123 left a comment
Nice that this wasn't too hard in the end. Yes this looks good to me at a high level so let's circle back when the tests are done? Thank you!
/wait
…uest Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
FYI looks like legit CI failure. /wait
/azp run envoy-presubmit
Azure Pipelines successfully started running 1 pipeline(s).
alyssawilk left a comment
Yay tests! This PR really came out cleanly. I owe a pass on the unit tests, but I had one question on the overall flow I think I'd like us to sort out first.
source/common/router/router.cc (outdated)
    config_.stats_.rq_retry_skipped_request_not_complete_.inc();
    return false;
    }
    void Filter::setupRetry() {
I wonder if we should rename, or just do this inline - it's not really setting up anything or performing the retry.
    // Overflow the request buffer limit. Because the retry base interval is infinity, no
    // request will be in progress. This will cause the request to be aborted and an error
    // to be returned to the client.
    std::string data2(2048, 'b');
I could see this for HTTP/2, where we do flow control via the stream window and only stop acking when we're at the limit. For HTTP/1, though, we generally readDisable when we don't want more data. I'd think that if we were waiting on upstream, we'd want to readDisable (which would immediately stop incoming data), and we'd still get the retry.
I thought about this, and briefly looked into it, but I think it will be much more complicated. The problematic case is when the router has an upstream request, but we don't yet know whether it will need to be retried. The upstream may require more data than we can buffer to determine if it will return a 5xx or not.
Given that, we could stop flow control in the router when we do not have an upstream request. But then it's racy whether this behavior is in effect, because there are (at least) 3 states: no upstream request at all, an upstream request on a connection that isn't established yet, and an upstream request where we're sending the request currently.
So I think all that means it is out-of-scope for this PR. But worth pursuing at some point.
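To make the three states in the reply above concrete, here is a rough sketch of a state-dependent decision about pausing downstream reads when the buffer is full. The state names and the function are hypothetical and do not exist in the Envoy codebase; in particular, how the pending-connection state should be handled is not settled in this thread, so the sketch conservatively keeps reading there.

```cpp
// Hypothetical state names for illustration; these are not Envoy types.
enum class UpstreamState {
  NoRequest,         // no upstream request at all (e.g. waiting on the retry timer)
  ConnectionPending, // upstream request exists, connection not yet established
  SendingRequest,    // request is currently being sent upstream
};

// Returns true if this sketch would pause downstream reads (readDisable) once
// the retry buffer is full, instead of abandoning buffering.
constexpr bool pauseReadsWhenBufferFull(UpstreamState state) {
  switch (state) {
  case UpstreamState::NoRequest:
    // Nothing upstream can consume data, so pausing reads loses nothing.
    return true;
  case UpstreamState::ConnectionPending:
    // Assumption of this sketch: the thread leaves this case open, so keep reading.
    return false;
  case UpstreamState::SendingRequest:
    // The upstream may need more data than fits in the buffer before it
    // decides whether to return a 5xx, so reads must continue.
    return false;
  }
  return false;
}

// Compile-time usage example: only the no-request state pauses reads.
static_assert(pauseReadsWhenBufferFull(UpstreamState::NoRequest), "no request: pause");
```

Because the router moves between these states while data is arriving, the same downstream byte can land under different rules depending on timing, which is the race described above.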
    EXPECT_EQ(host_address_, host->address());
    expectResponseTimerCreate();

    Http::TestRequestHeaderMapImpl headers{
Do you think we should add this to the base test class, since so many tests use it?
There are a lot of variations on the request headers to configure different things in these tests.
I thought about writing a base class or helper function for the several similar tests I'm adding in this PR, but it was becoming more confusing to follow what each test did, even though it reduced duplication. So I'm inclined to leave it as-is.
Signed-off-by: Greg Greenway <ggreenway@apple.com>
alyssawilk left a comment
LGTM (I'd still mildly prefer setupRetry be renamed but it's a minor nit and I'm out tomorrow =P)
I'll push a change with it removed in a few minutes (waiting for tests to run). I also found one other minor cleanup where a comment referring to setupRetry() was outdated/wrong.
Move the debug log statement into doRetry(); it's more accurate there anyways. Minor cleanup of one function to put all stats-related code in a block together. Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
GitHub is showing one of the commits plus a merge; there's another commit that it isn't showing here for some reason (but does show up correctly in the "commits" tab).
/retest
🔨 rebuilding
Signed-off-by: Greg Greenway <ggreenway@apple.com> Signed-off-by: pengg <pengg@google.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Risk Level: Medium
Testing: New integration tests
Docs Changes:
Release Notes: added
Fixes: #10211