router: generic timeout on request start #14003
yuval-k wants to merge 3 commits into envoyproxy:master
… created. (envoyproxy#12778)" This reverts commit dfa85e8. Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
htuch
left a comment
Thanks for taking on the more general fix here. I'll defer to folks who understand router timeout details better than me on the implementation, but one thing that I think would be useful is some e2e tests that show that both Envoy and Google gRPC clients can have the same deadline set and enforced.
mattklein123
left a comment
Thanks for fixing this. At a high level this looks on the right track!
/wait
include/envoy/router/router.h
Outdated
virtual std::chrono::milliseconds timeout() const PURE;

/**
 * @return bool measure timeout() from when the request is started, instead of when it is
Can you be more specific about what "started" means? Does this mean after a connection is received from the connection pool? Or before a connection is requested from the pool?
source/common/router/router.cc
Outdated
}

void Filter::finalizeTimeoutHeaders() {
  // if we have a deadline, update the timeout headers.
All comments start with capital, proper grammar, etc. Here and elsewhere.
source/common/router/router.cc
Outdated
// TODO: I'm not sure if this check is needed. can this method still be called
// if onResponseTimeout was called?
No, once the reply is sent and/or reset this can't happen. Same below.
source/common/router/router.cc
Outdated
void Filter::onResponseTimeout() {
  ENVOY_STREAM_LOG(debug, "upstream timeout", *callbacks_);

  timedout_ = true;
source/common/router/router.cc
Outdated
// start measuring timeout.
response_timeout_ = dispatcher.createTimer([this]() -> void { onResponseTimeout(); });
response_timeout_->enableTimer(timeout_.global_timeout_);
I would make a common method for starting the response timeout timer for clarity.
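A minimal, self-contained sketch of what such a common method could look like. Everything here is hypothetical and for illustration only: `FakeTimer` is a toy stand-in for an event-loop timer (it is not Envoy's timer interface), and `SketchFilter`, `startResponseTimeout()`, and the `start_on_request_start` flag are invented names. The sketch also assumes that a zero global timeout means "no timeout".

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <utility>

using std::chrono::milliseconds;

// Toy stand-in for an event-loop timer (hypothetical, not Envoy's API).
class FakeTimer {
public:
  explicit FakeTimer(std::function<void()> cb) : cb_(std::move(cb)) {}
  void enableTimer(milliseconds duration) {
    duration_ = duration;
    enabled_ = true;
  }
  bool enabled() const { return enabled_; }
  milliseconds duration() const { return duration_; }
  // Simulates the event loop firing the timer once.
  void fire() {
    if (enabled_) {
      enabled_ = false;
      cb_();
    }
  }

private:
  std::function<void()> cb_;
  milliseconds duration_{0};
  bool enabled_{false};
};

// Hypothetical filter with two call sites that both need to arm the timer.
class SketchFilter {
public:
  SketchFilter(milliseconds global_timeout, bool start_on_request_start)
      : global_timeout_(global_timeout), start_on_request_start_(start_on_request_start) {}

  // Call site 1: opt-in, measure from request start.
  void onRequestStart() {
    if (start_on_request_start_) {
      startResponseTimeout();
    }
  }
  // Call site 2: default, measure from request completion.
  void onRequestComplete() { startResponseTimeout(); }

  bool timerArmed() const { return response_timeout_ && response_timeout_->enabled(); }
  int armCount() const { return arm_count_; }
  bool timedOut() const { return timed_out_; }
  void fireTimer() {
    if (response_timeout_) {
      response_timeout_->fire();
    }
  }

private:
  // The single place where the response timeout timer is created and armed.
  // It is a no-op if the timer already exists or no timeout is configured.
  void startResponseTimeout() {
    if (response_timeout_ != nullptr || global_timeout_.count() <= 0) {
      return;
    }
    response_timeout_ = std::make_unique<FakeTimer>([this]() { timed_out_ = true; });
    response_timeout_->enableTimer(global_timeout_);
    ++arm_count_;
  }

  milliseconds global_timeout_;
  bool start_on_request_start_;
  std::unique_ptr<FakeTimer> response_timeout_;
  int arm_count_{0};
  bool timed_out_{false};
};
```

With one helper, both call sites share the "already armed" and "timeout disabled" checks, so the timer cannot be armed twice regardless of which path runs first.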
source/common/router/router.cc
Outdated
// this _should_ be positive
// TODO: what should be done if it is not positive? as i believe this can happen in rare
// cases.
ASSERT(expected_timeout.count() > 0, "expected_timeout is not positive");
Yeah I think it's possible for this to be delayed. In this case though I think the timeout timer should fire so we can just return here? Make sure this case has coverage.
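A rough sketch of the "just return" handling suggested here, written as a standalone function so the non-positive case is easy to cover in a test. The function name `computeRemainingTimeout` and its parameters are hypothetical; the real code operates on filter state rather than plain arguments.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Computes how much of the global timeout is left after `elapsed`.
// Returns false (leaving `remaining` untouched) when nothing is left. In that
// case the caller can simply return and let the already-armed timeout timer
// fire, instead of asserting.
bool computeRemainingTimeout(milliseconds global_timeout, milliseconds elapsed,
                             milliseconds& remaining) {
  const milliseconds left = global_timeout - elapsed;
  if (left.count() <= 0) {
    return false;
  }
  remaining = left;
  return true;
}
```

A caller would skip the header update whenever the function returns false, which replaces the `ASSERT` with an early return.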
source/common/router/router.cc
Outdated
// if it is less than the current per try timeout, or there is no per try timeout, update
// timeout headers
if (timeout_.per_try_timeout_.count() == 0 || expected_timeout < timeout_.per_try_timeout_) {
  FilterUtility::setTimeoutHeaders(*route_entry_, expected_timeout.count(), *downstream_headers_,
Even in cases where we start measuring in request complete, shouldn't we still adjust expected timeout HTTP and gRPC time based on how long it took to get a connection from the CP? I think this has been raised elsewhere and it would be good to fix this also?
I can adjust; my intention was to avoid a change in behavior, to reduce risk.
I think it would be fine to do this as a separate change, just something to keep in mind.
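To make the adjustment discussed in this thread concrete, here is a small sketch of the selection logic under stated assumptions: the time spent waiting for a connection is subtracted from the global timeout, and the result is advertised only if there is no per-try timeout or it is smaller than the per-try timeout (mirroring the condition in the diff). The name `timeoutForHeaders` and the `pool_wait` parameter are hypothetical, and falling back to the per-try value in the other branch is an assumption of this sketch.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Picks the timeout value to advertise upstream.
// A per_try_timeout of zero means "no per-try timeout configured".
// Precondition: the global timeout has not already been used up by pool_wait.
milliseconds timeoutForHeaders(milliseconds global_timeout, milliseconds per_try_timeout,
                               milliseconds pool_wait) {
  const milliseconds expected_timeout = global_timeout - pool_wait;
  assert(expected_timeout.count() > 0);
  if (per_try_timeout.count() == 0 || expected_timeout < per_try_timeout) {
    return expected_timeout;
  }
  return per_try_timeout;
}
```

For example, with a 1000 ms global timeout and 300 ms spent waiting for a connection, a 900 ms per-try timeout would be overridden by the 700 ms that actually remain, while a 500 ms per-try timeout would be kept.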
Signed-off-by: Yuval Kohavi <yuval.kohavi@gmail.com>
/wait
Hi @mattklein123, I think I misunderstood the original behavior of the async client. I see now that the whole request is sent at once unless the circuit breaker trips, in which case the request is rejected immediately. In other words, I didn't originally see timeouts because the request had already been rejected, not because it was pending. This means I can take this PR in one of two ways:
I apologize for this confusion, and will make sure to do better research next time.
@yuval-k I think my preference is to just revert the original PR and we can add (2) later if someone asks? Thank you! I'm going to close this out and can you open a fresh revert PR please? Thank you!
xref #14152
Commit Message:
Allow measuring request timeout on request start when using the async client.
Additional Description:
This PR fixes #13580 by first reverting #12778 and then re-adding the functionality in a generic way to the router filter.
Opening a draft PR to get early feedback that this is the right direction; once we agree I will add tests and modify the async client / ext auth client to use the new functionality (hopefully in a backwards compatible way).
For the reviewer: The first commit is just a revert, and the second commit adds the new functionality.
Risk Level: Medium (touches the router filter, but changes are opt-in)
Testing: TBD
Docs Changes: TBD
Release Notes: TBD
Runtime guard: TBD
Deprecates: TBD
Fixes: #13580
cc @mattklein123 @htuch