docs: clarify behavior of hedge_on_per_try_timeout #12983
htuch merged 11 commits into envoyproxy:master from
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to
@snowp I had a hard time understanding the original phrasing, so I took a stab at rewriting it, but I'm not really sure I'm describing how it actually works.
(force-pushed from c27e3bc to 1582210)
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
bump @mpuncel
snowp left a comment:
Thanks for improving the docs!
I think you'll also want to update the V3 docs; V2 is on its way out. The V4alpha docs should then be updated automatically once you run the proto_format.sh script.
I don't think this part is right: the retry policy doesn't matter when the per-try timeout is hit, only whether hedge_on_per_try_timeout is set.
Reference comment:
```cpp
RetryStatus RetryStateImpl::shouldHedgeRetryPerTryTimeout(DoRetryCallback callback) {
  // A hedged retry on per try timeout is always retried if there are retries
  // left. NOTE: this is a bit different than non-hedged per try timeouts which
  // are only retried if the applicable retry policy specifies either
  // RETRY_ON_5XX or RETRY_ON_GATEWAY_ERROR. This is because these types of
  // retries are associated with a stream reset which is analogous to a gateway
  // error. When hedging on per try timeout is enabled, however, there is no
  // stream reset.
  return shouldRetry(true, callback);
}
```
(the comment wording there is also confusing!)
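As a reading aid, the distinction drawn in that source comment can be modelled in a few lines. This is a hypothetical, simplified sketch rather than Envoy's implementation: the `RetryOn` flags, the `PerTryTimeoutDecision` enum and `shouldRetryOnPerTryTimeout` are invented for illustration, and the model takes `retries_remaining` as an input instead of deriving it (how that counter gets initialized is exactly what the discussion below turns on).

```cpp
#include <cstdint>

// Hypothetical flags standing in for the retry-on conditions named in the
// source comment (RETRY_ON_5XX, RETRY_ON_GATEWAY_ERROR); not Envoy's types.
enum RetryOn : std::uint32_t {
  kRetryOn5xx = 1u << 0,
  kRetryOnGatewayError = 1u << 1,
  kRetryOnReset = 1u << 2,
};

enum class PerTryTimeoutDecision { Retry, NoRetry };

// Models the behaviour described in the comment on
// RetryStateImpl::shouldHedgeRetryPerTryTimeout:
//  * with hedging, a per-try timeout is retried whenever retries remain;
//  * without hedging, the timeout resets the stream and is only retried if
//    the policy includes 5xx or gateway-error.
PerTryTimeoutDecision shouldRetryOnPerTryTimeout(bool hedge_on_per_try_timeout,
                                                 std::uint32_t retry_on,
                                                 std::uint32_t retries_remaining) {
  if (retries_remaining == 0) {
    return PerTryTimeoutDecision::NoRetry;  // retry budget exhausted
  }
  if (hedge_on_per_try_timeout) {
    return PerTryTimeoutDecision::Retry;  // no stream reset, policy not consulted
  }
  const bool policy_covers_timeout =
      (retry_on & (kRetryOn5xx | kRetryOnGatewayError)) != 0;
  return policy_covers_timeout ? PerTryTimeoutDecision::Retry
                               : PerTryTimeoutDecision::NoRetry;
}
```

In this model the policy flags only matter on the non-hedged path, which is the reading behind the review comment above; the budget check is shared by both paths.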
Interesting. I was actually working off my experience using this feature: I only started seeing (hedged) retries once I set x-envoy-retry-on.
Looking at this code, it seems that retries_remaining_ would only get initialized to a non-zero value if retry_on_ is set:
See envoy/source/common/router/retry_state_impl.cc, lines 121 to 127 (at 2709b6b).
Thus shouldRetry would end up returning RetryStatus::NoRetryLimitExceeded for the hedged requests.
I can imagine this wasn't intended, since it surprised me too, but on the other hand it's congruent with how non-hedged retrying behaves.
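The effect described here can be sketched as follows. This is a hypothetical, simplified model: lines 121 to 127 are not reproduced in this thread, so the constructor below encodes only the behaviour the comment describes (the budget stays at zero unless some retry-on condition is set). `RetryStateModel` is an invented name; `RetryStatus::NoRetryLimitExceeded` mirrors the status mentioned above.

```cpp
#include <cstdint>

enum class RetryStatus { Yes, NoRetryLimitExceeded };

// Hypothetical model of the retry budget described above; the real
// RetryStateImpl takes many more inputs.
class RetryStateModel {
 public:
  // Assumption (per the comment above): retries_remaining_ is only populated
  // when at least one retry-on condition is configured.
  RetryStateModel(std::uint32_t retry_on, std::uint32_t num_retries)
      : retries_remaining_(retry_on != 0 ? num_retries : 0) {}

  // A hedged per-try timeout goes straight to the generic retry check.
  RetryStatus shouldHedgeRetryPerTryTimeout() { return shouldRetry(); }

 private:
  // Consumes one retry from the budget, or reports that it is exhausted.
  RetryStatus shouldRetry() {
    if (retries_remaining_ == 0) {
      return RetryStatus::NoRetryLimitExceeded;
    }
    --retries_remaining_;
    return RetryStatus::Yes;
  }

  std::uint32_t retries_remaining_;
};
```

With `retry_on == 0` (no x-envoy-retry-on and no route retry policy), the very first hedge is refused even though `num_retries` is non-zero, which matches the behaviour observed in this thread.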
I think this might be clearer if stated as:
"Any response received after the timeout and subsequent hedge attempt will never be retried, no matter the RetryPolicy."
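With the grouping clarified later in the thread (the *response* is what arrives after the timeout and hedge), the proposed sentence amounts to a rule about which attempt's response can still trigger a retry. A hypothetical sketch of that rule; `UpstreamAttempt` and `responseMayBeRetried` are invented names, not Envoy's router code:

```cpp
// Hypothetical model of the proposed doc wording; names are illustrative.
struct UpstreamAttempt {
  // True once this attempt hit its per-try timeout and a hedged attempt was
  // started alongside it (the original stays in flight).
  bool superseded_by_hedge = false;
};

// Decides whether a response on `attempt` may trigger a retry, given whether
// that response matches the RetryPolicy (e.g. a 500 under retry-on: 5xx).
bool responseMayBeRetried(const UpstreamAttempt& attempt, bool matches_retry_policy) {
  if (attempt.superseded_by_hedge) {
    // Late response on an already-hedged attempt: never retried, no matter
    // the RetryPolicy; the hedged attempt carries the retry logic instead.
    return false;
  }
  return matches_retry_policy;
}
```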
Assuming a 150ms timeout, a 50ms per-try timeout, 3 retries, a retry-on: 5xx policy, and hedging enabled:
0ms: Request 1 sent.
50ms: Request 1 times out, (hedged) request 2 sent.
75ms: Request 2 (hedged) returns 500.
150ms: Request 1 times out.
Would there be a 3rd request?
(for simplicity assuming no exponential backoff in those timings)
There would be a 3rd request. Request 2 is considered a new attempt, so it will be retried if it times out or returns a 500. If request 1 comes back with a 500 after request 2 has already been sent, that response will be dropped and not retried.
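To make the answer concrete, here is a hypothetical 1 ms-tick model of the scenario above. It is a simplified sketch, not Envoy's router: `simulate`, `AttemptBehaviour` and `Result` are invented names, backoff is ignored as in the question, and the rules in the header comment are assumptions drawn from this thread (in particular, what happens once retries are exhausted is a guess).

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical behaviour of one upstream attempt; no latency = never answers.
struct AttemptBehaviour {
  std::optional<int> latency_ms;
  int status;
};

struct Result {
  int attempts_sent;
  int status;  // 504 stands in for the overall route timeout firing
  int finished_ms;
};

// Simplified model of hedge_on_per_try_timeout with retry-on: 5xx, no backoff.
// Assumptions, not Envoy code:
//  * a per-try timeout on the newest attempt starts a hedge (consuming a
//    retry) and leaves the older attempt in flight; with no retries left the
//    attempt just keeps waiting;
//  * a 5xx on the newest attempt is retried while retries remain, otherwise
//    it is returned;
//  * a 5xx on an older, already-hedged attempt is dropped;
//  * the first non-5xx response wins;
//  * a response arriving exactly at the route timeout counts as too late.
Result simulate(const std::vector<AttemptBehaviour>& behaviour, int per_try_timeout_ms,
                int global_timeout_ms, int num_retries) {
  struct Attempt { int sent_ms; bool done; };
  std::vector<Attempt> attempts{{0, false}};
  std::size_t newest = 0;
  int retries_remaining = num_retries;

  auto start_attempt = [&](int now) {
    attempts.push_back({now, false});
    newest = attempts.size() - 1;
  };

  for (int now = 0; now < global_timeout_ms; ++now) {
    const std::size_t in_tick = attempts.size();  // attempts started now wait a tick
    for (std::size_t i = 0; i < in_tick; ++i) {
      if (attempts[i].done) continue;
      const AttemptBehaviour b =
          i < behaviour.size() ? behaviour[i] : AttemptBehaviour{std::nullopt, 0};

      if (b.latency_ms && now == attempts[i].sent_ms + *b.latency_ms) {
        attempts[i].done = true;
        const int sent = static_cast<int>(attempts.size());
        if (b.status < 500) return {sent, b.status, now};
        if (i != newest) continue;  // late 5xx on a hedged attempt: dropped
        if (retries_remaining == 0) return {sent, b.status, now};
        --retries_remaining;
        start_attempt(now);  // ordinary retry of the newest attempt
      } else if (i == newest && now == attempts[i].sent_ms + per_try_timeout_ms &&
                 retries_remaining > 0) {
        --retries_remaining;
        start_attempt(now);  // hedge: the timed-out attempt stays in flight
      }
    }
  }
  return {static_cast<int>(attempts.size()), 504, global_timeout_ms};
}
```

For the question's numbers, and assuming a third attempt that answers 200 after 20 ms, the model sends three requests and finishes at 95 ms: request 2's 500 at 75 ms is retried because request 2 is the newest attempt. If instead request 1 answered 500 at 100 ms, after request 2 had been sent, that 500 would simply be dropped.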
ahh, I read (Any response received after the timeout) AND (subsequent hedge attempt) -> will never be retried 🤦
I don't think this is true: you don't need gateway-error for it to be retried. A per-try timeout is always retried when hedging is enabled.
I tested it and found it to be true. (In fact, I wasted quite some time trying to understand why hedging didn't work for me before I tried adding a simple x-envoy-retry-on header to my calls.)
I actually wrote this above in reply to a similar question you raised:
> Looking at this code, it seems that retries_remaining_ would only get initialized to a non-zero value if retry_on_ is set:
> See envoy/source/common/router/retry_state_impl.cc, lines 121 to 127 (at 2709b6b).
> Thus shouldRetry would end up returning RetryStatus::NoRetryLimitExceeded for the hedged requests.
> I can imagine this wasn't intended, since it surprised me too, but on the other hand it's congruent with how non-hedged retrying behaves.
Interesting! I do think that isn't intentional. In that case I'm not really sure how to word the comment; maybe you could say "you must have a RetryPolicy that retries at least one error code and specify the max number of retries". From that code snippet, though, it looks like you don't need gateway-error specifically.
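Following that suggested wording, a configuration check could flag routes where hedging on per-try timeout would silently never fire. This is a hypothetical sketch: `RouteRetrySettings` and `checkHedging` are invented names. The field names echo Envoy's route configuration (`hedge_on_per_try_timeout`, `retry_on`, `num_retries`), but `retry_on` is modelled here as a bit set rather than Envoy's comma-separated string, and the rules encode only what this thread observed.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical summary of the settings discussed in this thread.
struct RouteRetrySettings {
  bool hedge_on_per_try_timeout = false;
  std::uint32_t retry_on = 0;     // bit set of retry-on conditions; 0 = none
  std::uint32_t num_retries = 0;  // maximum number of retries
};

// Returns a description of each reason hedging on per-try timeout would
// never trigger: the RetryPolicy must retry at least one condition (not
// necessarily gateway-error) and allow at least one retry.
std::vector<std::string> checkHedging(const RouteRetrySettings& s) {
  std::vector<std::string> problems;
  if (!s.hedge_on_per_try_timeout) {
    return problems;  // hedging disabled: nothing to check
  }
  if (s.retry_on == 0) {
    problems.push_back("hedge_on_per_try_timeout set but no retry-on condition configured");
  }
  if (s.num_retries == 0) {
    problems.push_back("hedge_on_per_try_timeout set but num_retries is 0");
  }
  return problems;
}
```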
Signed-off-by: Ilya Konstantinov <ilya.konstantinov@gmail.com>
@ikonst I think this change looks good now. Could you a) merge master, b) fix the formatting issue, and c) apply the same change to the V3 docs (then run …)? Also, a friendly reminder that force pushing breaks the reviewing flow for many people, so please avoid it if you can. Thanks!
I'm not a fan of rebases either, since they break the separation between already-reviewed and new commits. (Perhaps I did this one because I forgot some sign-offs midway?)
Signed-off-by: Ilya Konstantinov <ilya.konstantinov@gmail.com>
/retest
Retrying Azure Pipelines:
Signed-off-by: Ilya Konstantinov <ilya.konstantinov@gmail.com>
@snowp ^
snowp left a comment:
LGTM, thanks!
@envoyproxy/api-shepherds for API review and V2 sign-off
/lgtm v2-freeze
* master: (70 commits)
  upstream: avoid reset after end_stream in TCP HTTP upstream (envoyproxy#14106)
  bazelci: add fuzz coverage (envoyproxy#14179)
  dependencies: allowlist CVE-2020-8277 to prevent false positives. (envoyproxy#14228)
  cleanup: replace ad-hoc [0, 1] value types with UnitFloat (envoyproxy#14081)
  Update docs for skywalking tracer (envoyproxy#14210)
  Fix some errors in the switch statement when decode dubbo response (envoyproxy#14207)
  Windows: enable tests and envoy-static.exe pdb file (envoyproxy#13688)
  http: add Kill Request HTTP filter (envoyproxy#14170)
  dependencies: fix release_dates error behavior. (envoyproxy#14216)
  thrift filter: support skip decoding data after metadata in the thrift message (envoyproxy#13592)
  update cares (envoyproxy#14213)
  docs: clarify behavior of hedge_on_per_try_timeout (envoyproxy#12983)
  repokitteh: add support for randomized auto-assign. (envoyproxy#14185)
  [grpc] validate grpc config for illegal characters (envoyproxy#14129)
  server: Return nullopt when process_context is nullptr (envoyproxy#14181)
  [Windows] Fix thrift proxy tests (envoyproxy#13220)
  kafka: add missing unit tests (envoyproxy#14195)
  doc: mention gperftools explicitly in PPROF.md (envoyproxy#14199)
  Removed `--use-fake-symbol-table` option. (envoyproxy#14178)
  filter contract: clarification around local replies (envoyproxy#14193)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
No description provided.