config: backoff strategy implementation #3758
Signed-off-by: Rama <rama.rao@salesforce.com>
@htuch @mattklein123 can you PTAL?
htuch left a comment
Thanks @ramaraochavali for adding this!
      return current_interval_;
    }

    uint64_t ExponentialBackOffStrategy::multiplyInterval() {
Does this need to be a standalone method? Seems simple enough to inline. OTOH, if you want to later make this more sophisticated (see comment below), it should be standalone (just not named multiply...).
I will inline it for now and refactor it based on how random exponential is implemented in my follow-up PR
    uint64_t ExponentialBackOffStrategy::multiplyInterval() {
      uint64_t temp_interval = current_interval_;
      temp_interval *= multiplier_;
      return (temp_interval > max_interval_ ? max_interval_ : temp_interval);
Do we want any randomness in the backoff to desynchronize retries?
I will follow up with a PR to introduce randomness-based backoff, then see whether a new class makes sense or whether this one should be refactored. Until then I will keep this as is.
                                                           const std::uint32_t multiplier)
        : initial_interval_(initial_interval), max_interval_(max_interval), multiplier_(multiplier),
          current_interval_(0) {
      ASSERT(!(multiplier_ <= 0));
Should multiplier be a double? I.e. do we want to force the minimum backoff to be 2x?
We could change it to double. In that case I think the minimum multiplier can be 1.5, so I will make that change. Is that fine?
    /**
     * Generic interface for all backoff strategy implementations.
     */
    class BackOffStrategy {
This can probably live in include/envoy since it is a pure interface.
      /**
       * Returns the next backoff interval.
       */
      virtual std::uint64_t nextBackOff() PURE;
Just uint64_t rather than std::uint64_t.
    namespace Envoy {

    ExponentialBackOffStrategy::ExponentialBackOffStrategy(const std::uint64_t initial_interval,
                                                           const std::uint64_t max_interval,
I think it would be good to remove the std:: prefix on all the numeric types in this PR.
source/common/config/grpc_mux_impl.h
Outdated
    // TODO(htuch): Make this configurable or some static.
    const uint32_t RETRY_DELAY_MS = 5000;
    const uint32_t RETRY_INITIAL_DELAY_MS = 5000;
Now that we have sensible backoff, I think you should shrink this down to something very small for the initial retry, e.g. 500ms. This will fix the problem that Istio are seeing (@costinm) in some cases where Pilot isn't ready. The exponential backoff will take us to a large number pretty quickly.
Yes, I will change it.
    RetryStateImpl::~RetryStateImpl() { resetRetry(); }

    void RetryStateImpl::enableBackoffTimer() {
      // TODO(ramaraochavali): Implement JitteredExponentialBackOff and refactor this.
Ah, I see, you have thought about jitter here..
+1, and, honestly, you should be using jittered backoff for the management connection also. I would not support any backoff policy that is not jittered...
@mattklein123 I am OK with that. Let us review the PR for jittered backoff #3791 and I can move the management connection to use it. @htuch thoughts?
Yeah, I think having jitter as the default (and only built-in) scheme makes sense; best not to provide sharp objects that aren't needed.
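To make the jitter discussion concrete, here is a minimal sketch of one common scheme ("full jitter"), where each delay is drawn uniformly from zero up to an exponentially growing cap. The class name `JitteredBackOffSketch`, its methods, and the seed parameter are hypothetical and chosen only for illustration; the implementation proposed in #3791 may well differ.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical sketch, not the code from this PR or from #3791.
// Each call returns a delay drawn uniformly from [0, cap), and the cap
// doubles after every call until it reaches max_interval_ms.
class JitteredBackOffSketch {
public:
  JitteredBackOffSketch(uint64_t base_interval_ms, uint64_t max_interval_ms, uint32_t seed)
      : base_interval_ms_(base_interval_ms), max_interval_ms_(max_interval_ms),
        current_cap_ms_(base_interval_ms), rng_(seed) {
    assert(base_interval_ms_ > 0);
    assert(base_interval_ms_ <= max_interval_ms_);
  }

  // Returns the next delay in milliseconds, randomized below the current cap.
  uint64_t nextBackoffMs() {
    std::uniform_int_distribution<uint64_t> dist(0, current_cap_ms_ - 1);
    const uint64_t delay = dist(rng_);
    // Double the cap without overflowing, clamping it to the maximum.
    current_cap_ms_ =
        current_cap_ms_ > max_interval_ms_ / 2 ? max_interval_ms_ : current_cap_ms_ * 2;
    return delay;
  }

  // Starts the ramp over, for example after a successful connection.
  void reset() { current_cap_ms_ = base_interval_ms_; }

  uint64_t currentCapMs() const { return current_cap_ms_; }

private:
  const uint64_t base_interval_ms_;
  const uint64_t max_interval_ms_;
  uint64_t current_cap_ms_;
  std::mt19937 rng_;
};
```

Because the delays are randomized, clients that lose the management connection at the same moment would be unlikely to reconnect in lockstep, which is the desynchronization asked about earlier in the thread.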
source/common/config/grpc_mux_impl.h
Outdated
    // TODO(htuch): Make this configurable or some static.
    const uint32_t RETRY_DELAY_MS = 5000;
    const uint32_t RETRY_INITIAL_DELAY_MS = 5000;
    const uint32_t RETRY_MAX_DELAY_MS = 120000; // Do not cross more than 2 mins
Seems too large IMHO for default, 2 minutes is a long time.. How about 30s?
I am fine with that. I just put in some number so that we could discuss it.
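To put numbers on how quickly the ramp reaches the cap, here is a small sketch comparing the suggested 500 ms / 30 s defaults with the original 5000 ms / 120000 ms constants. The helper `summarizeRamp` and the `RampSummary` struct are hypothetical, and the multiplier of 2 is an assumption; the thread does not state which multiplier the management connection uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only.
struct RampSummary {
  uint32_t retries_before_cap; // Retries whose delay is still below the cap.
  uint64_t total_wait_ms;      // Sum of those delays.
};

// Walks the un-jittered ramp initial, initial*m, initial*m^2, ... and reports
// how many retries happen, and how long they take, before the delay hits the cap.
// Assumes small inputs, so the multiplication cannot overflow.
RampSummary summarizeRamp(uint64_t initial_ms, uint64_t max_ms, uint64_t multiplier) {
  assert(initial_ms > 0);
  assert(multiplier > 1);
  RampSummary summary{0, 0};
  uint64_t delay = initial_ms;
  while (delay < max_ms) {
    ++summary.retries_before_cap;
    summary.total_wait_ms += delay;
    delay *= multiplier;
  }
  return summary;
}
```

Under these assumptions, `summarizeRamp(500, 30000, 2)` reports six retries (500, 1000, 2000, 4000, 8000, 16000 ms) totalling 31500 ms before every further retry waits the full 30 s, while `summarizeRamp(5000, 120000, 2)` reports five retries totalling 155000 ms.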
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@htuch addressed all the comments. PTAL.
@htuch when you get chance can you PTAL?
htuch left a comment
Looks good, thanks, a few more comments/questions.
                                                           const double multiplier)
        : initial_interval_(initial_interval), max_interval_(max_interval), multiplier_(multiplier),
          current_interval_(0) {
      ASSERT(multiplier_ >= 1.5);
Do you want it to be more than 2? I thought people could use 1.5 as a multiplier as well, which is why I modified this.
I would suggest >= 1.0; 1.125 would be a valid value, for example.
OK. I think we discussed earlier that it should be more than 1.0, so I will validate against > 1.0.
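For reference, a minimal sketch of the validation settled on here, using a hypothetical class `ExponentialBackOffSketch` rather than the code in this PR. It assumes the multiplier is a double that must be strictly greater than 1.0 and that intervals are truncated to whole milliseconds and clamped to the maximum. The `+ 1` floor is an addition of this sketch, so that small intervals still grow after truncation with multipliers such as 1.125.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the class under review; names and exact
// semantics are assumptions made for illustration.
class ExponentialBackOffSketch {
public:
  ExponentialBackOffSketch(uint64_t initial_interval_ms, uint64_t max_interval_ms,
                           double multiplier)
      : initial_interval_ms_(initial_interval_ms), max_interval_ms_(max_interval_ms),
        multiplier_(multiplier), current_interval_ms_(0) {
    // A multiplier of exactly 1.0 would never grow the interval.
    assert(multiplier_ > 1.0);
    assert(initial_interval_ms_ > 0);
    assert(initial_interval_ms_ <= max_interval_ms_);
  }

  // Returns the next interval in milliseconds, clamped to the maximum.
  uint64_t nextBackoffMs() {
    if (current_interval_ms_ == 0) {
      current_interval_ms_ = initial_interval_ms_;
    } else {
      const double scaled = static_cast<double>(current_interval_ms_) * multiplier_;
      if (scaled >= static_cast<double>(max_interval_ms_)) {
        current_interval_ms_ = max_interval_ms_;
      } else {
        // Truncate to whole milliseconds, but always grow by at least 1 ms.
        current_interval_ms_ =
            std::max<uint64_t>(static_cast<uint64_t>(scaled), current_interval_ms_ + 1);
      }
    }
    return current_interval_ms_;
  }

  // Restarts the sequence from the initial interval.
  void reset() { current_interval_ms_ = 0; }

private:
  const uint64_t initial_interval_ms_;
  const uint64_t max_interval_ms_;
  const double multiplier_;
  uint64_t current_interval_ms_;
};
```

With `ExponentialBackOffSketch(10, 100, 1.5)`, successive calls would yield 10, 15, 22, 33, 49, 73, 100, 100 under these truncation rules.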
    namespace Envoy {

    ExponentialBackOffStrategy::ExponentialBackOffStrategy(const uint64_t initial_interval,
We generally don't make scalar args const in Envoy (although it's not a terrible idea).
    }

    TEST(BackOffStrategyTest, ExponentialBackOfReset) {
      ExponentialBackOffStrategy exponential_back_off(10, 100, 2);
Can you do some tests with a fractional backoff multiplier?
        : node_(node), async_client_(std::move(async_client)), service_method_(service_method),
          time_source_(time_source) {
      retry_timer_ = dispatcher.createTimer([this]() -> void { establishNewStream(); });
      backoff_strategy_ptr_ = std::make_unique<ExponentialBackOffStrategy>(
Does this affect any existing tests? If not, how come?
It did not actually.
Signed-off-by: Rama <rama.rao@salesforce.com>
@htuch addressed the changes. PTAL.
htuch left a comment
Looks good modulo final comments.
      uint64_t initial_interval_;
      uint64_t max_interval_;
      double multiplier_;
Nit: a bunch of these should be const as they are only set in the constructor.
      virtual ~BackOffStrategy() {}

      /**
       * Returns the next backoff interval.
Please use Doxygen @return here. Also, please express the units.
      /**
       * Returns the next backoff interval.
       */
      virtual uint64_t nextBackOff() PURE;
Suggest renaming to nextBackoffMilliseconds or nextBackoffMs.
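Taken together, the last two comments suggest an interface along the following lines. This is only a sketch of what the requested Doxygen `@return` and the unit-bearing name might look like; `FixedBackOffSketch` is a hypothetical implementation added so the snippet is self-contained, and `= 0` stands in for Envoy's `PURE` macro.

```cpp
#include <cassert>
#include <cstdint>

/**
 * Generic interface for all backoff strategy implementations.
 */
class BackOffStrategy {
public:
  virtual ~BackOffStrategy() = default;

  /**
   * @return uint64_t the next backoff interval in milliseconds.
   */
  virtual uint64_t nextBackoffMs() = 0; // Envoy would write PURE instead of "= 0".
};

// Hypothetical implementation, present only to exercise the interface.
class FixedBackOffSketch : public BackOffStrategy {
public:
  explicit FixedBackOffSketch(uint64_t interval_ms) : interval_ms_(interval_ms) {}

  uint64_t nextBackoffMs() override { return interval_ms_; }

private:
  const uint64_t interval_ms_;
};
```

Putting the unit in the method name means call sites such as a timer's enable call do not have to consult the header to know what the returned number means.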
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@htuch addressed. PTAL.
Signed-off-by: Rama rama.rao@salesforce.com
Description:
Implements an ExponentialBackOffStrategy that is used when Envoy retries the management server connection.
Risk Level: Low
Testing: Added automated tests
Docs Changes: N/A
Release Notes: N/A
Fixes #3737