[core-amqp] Implement exponential retry mechanism #4175

Merged
ramya0820 merged 14 commits into Azure:master from ramya0820:issue-3331-p2
Jul 11, 2019

Conversation

Member

@ramya0820 ramya0820 commented Jul 3, 2019

For more context, refer to #3331.

Specifically, please see this comment #3331 (comment) for the assumptions on which this PR is based.
Listing them here as well for reference:

  • When an operation fails, the delay is computed as (2^retry-attempts) milliseconds and thus increases with each failed attempt (1 ms, 3 ms, 7 ms, and so on). This is the same as what we have in core-http.
  • The retries stop at attempt n, where n is computed per the equation
    minimum-retry-delay + 2^(n) <= maximum-allowed-retry-delay
    since the delay before attempt n would have reached the maximum allowed delay, which is governed by the configured maxExponentialRetryDelayInMilliseconds.
    In core-http, the retries continue with the delay capped at the configured maximum, which turns into a linear retry until either the target retry count is reached or an abort signal is detected.
  • Overall, based on offline discussions and decisions, the computation of exponential retry delays is to mirror what we have in core-http, and we will introduce this new type of retry policy using a boolean flag on RetryConfig rather than defining a new interface/type for RetryConfig.
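
A minimal sketch of the stopping rule in the second assumption, for illustration only. The helper name `plannedDelaysInMs` is hypothetical and not part of core-amqp; it assumes attempts are numbered from 1 and that the maximum delay is finite.

```typescript
// Hypothetical helper, not part of core-amqp: lists the delays implied by the
// rule "stop at attempt n once min + 2^n would exceed the configured maximum".
function plannedDelaysInMs(minDelayInMs: number, maxDelayInMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; ; attempt++) {
    const delayInMs = minDelayInMs + Math.pow(2, attempt);
    if (delayInMs > maxDelayInMs) {
      break; // the next delay would exceed the cap, so retries stop here
    }
    delays.push(delayInMs);
  }
  return delays;
}

// With a 1 ms minimum and a 50000 ms maximum, retries stop after 15 attempts.
const exampleDelays = plannedDelaysInMs(1, 50000);
console.log(exampleDelays.length, exampleDelays[0], exampleDelays[exampleDelays.length - 1]);
```

By contrast, the core-http behaviour described in the same bullet would keep retrying at the capped delay rather than stopping.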

@ramya0820 ramya0820 self-assigned this Jul 3, 2019
@ramya0820 ramya0820 changed the title Implement exponential retry mechanism [amqp-common] Implement exponential retry mechanism Jul 3, 2019
@ramya0820 ramya0820 changed the title [amqp-common] Implement exponential retry mechanism [core-amqp] Implement exponential retry mechanism Jul 5, 2019
@ramya0820
Member Author

I plan to add some tests to this PR soon, once we confirm that the general assumptions and approach seem okay.
@bterlson @ramya-rao-a

const targetDelayInMs = Math.min(
  config.minExponentialRetryDelayInMs + Math.pow(2, i),
  config.maxExponentialRetryDelayInMs
);
Contributor

With default values for min and max delays here, this results in a progression of 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025, 2049 etc.

Questions:

  • Shouldn't we be starting from the min value instead of min value + 2?
  • This method differs from what is in both core-http and storage. Which of the two did we take as reference here?
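
The progression quoted in this comment can be reproduced with a short sketch. It assumes the defaults shown elsewhere in this PR (minimum 1 ms, maximum 50000 ms) and that `i` starts at 1; the standalone function is only for illustration.

```typescript
// Defaults assumed from the constants in this PR; `i` is assumed to start at 1.
const minExponentialRetryDelayInMs = 1;
const maxExponentialRetryDelayInMs = 50000;

// Same expression as the hunk under review, wrapped in a function for illustration.
function targetDelayInMs(i: number): number {
  return Math.min(
    minExponentialRetryDelayInMs + Math.pow(2, i),
    maxExponentialRetryDelayInMs
  );
}

const progression: number[] = [];
for (let i = 1; i <= 11; i++) {
  progression.push(targetDelayInMs(i));
}
console.log(progression.join(", "));
// prints "3, 5, 9, 17, 33, 65, 129, 257, 513, 1025, 2049"
```

The first value is min + 2 because 2^1 is added on the first attempt, which is what the first question is about.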

Member Author

@ramya0820 ramya0820 Jul 5, 2019

  • Shouldn't we be starting from the min value instead of min value + 2 ?

This mirrors what we have in core-http and depends on the configured value. Do we want to set the default min to 0?

  • This method differs from both what is in core-http and in storage. Which of the two did we take as reference here?

We had decided to use what we have in core-http, as also captured in #3331 (comment). Do we have consensus to go with the one in storage instead? @bterlson @daviwil @sadasant

Member Author

Overall, if we were to investigate the best approach going forward rather than mirror core-http as decided, the following findings may help.

Based on discussions in online cloud computing communities:

Slowing clients down may help, and the classic way to slow clients down is capped exponential backoff. Capped exponential backoff means that clients multiply their backoff by a constant after each attempt, up to some maximum value. In our case, after each unsuccessful attempt, clients sleep for:
[Figure: exponential-backoff-and-jitter-blog-figure-3]

This seems to mirror what we have in storage. We'll likely need to update core-http as well,
and also decide whether we want to do away with the concept of a minimum delay and use a maximum cap value along with retry attempts instead.

This would also address and simplify the other comment: instead of delayInSeconds and min/max exponential-retry-specific config, we can have one maxRetryDelay and maxRetries that would apply to both types of retry.
Additionally, the number 2 can be made configurable as well, calling it the backoffFactor. (This seems to be an established concept; search for exponential time delay calculator / "online-backoff-calculator" online.)
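
A minimal sketch of capped exponential backoff with a configurable factor, as the quoted passage describes it in words. The figure itself is not reproduced in this thread, so the exact formula here is an assumption: the delay is a base value multiplied by `backoffFactor` on each attempt, capped at a maximum. The function name, `baseDelayInMs`, and the zero-based `attempt` are hypothetical.

```typescript
// Hypothetical sketch: capped exponential backoff with a configurable factor.
// Assumes `attempt` is zero-based, so the first delay equals the base delay.
function cappedBackoffInMs(
  attempt: number,
  baseDelayInMs: number,
  maxRetryDelayInMs: number,
  backoffFactor: number = 2
): number {
  return Math.min(maxRetryDelayInMs, baseDelayInMs * Math.pow(backoffFactor, attempt));
}

const schedule = [0, 1, 2, 3, 4, 5, 6].map((attempt) => cappedBackoffInMs(attempt, 100, 5000));
console.log(schedule.join(", "));
// prints "100, 200, 400, 800, 1600, 3200, 5000"
```

With this shape there is no separate minimum: the base delay and the cap, together with a maximum retry count, describe the whole schedule.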

Contributor

From #3331 (comment)

When an operation fails, the delay is computed as (2^retry-attempts) milliseconds and thus increases with each failed attempt (1 ms, 3 ms, 7 ms, and so on). This is the same as what we have in core-http.

This is not what core-http is doing.

In core-http, (2^retry-attempts) is not used as the delay itself. Instead, it is multiplied by the delay provided by the user, after a randomization factor has been applied to that delay. Please review https://github.com/Azure/azure-sdk-for-js/blob/%40azure/core-http_1.0.0-preview.1/sdk/core/core-http/lib/policies/exponentialRetryPolicy.ts#L131-L137 one more time.
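
A hedged sketch of the computation as described in this comment. The linked lines are not reproduced in this thread, so this follows the description here and the expressions quoted elsewhere in this review; the interface, field names, and the injectable `random` parameter are hypothetical.

```typescript
// Hypothetical settings shape; the names are illustrative, not core-http's API.
interface ExponentialRetrySettings {
  retryIntervalInMs: number;    // user-provided base delay
  minRetryIntervalInMs: number; // lower bound added to every delay
  maxRetryIntervalInMs: number; // upper bound (cap) on every delay
}

// Sketch: (2^retryCount - 1) scales a randomized copy of the base delay,
// rather than being used as the delay itself.
function coreHttpStyleDelayInMs(
  retryCount: number,
  settings: ExponentialRetrySettings,
  random: () => number = Math.random
): number {
  let incrementDelta = Math.pow(2, retryCount) - 1;
  // Randomized base delay in the range [0.8 * interval, 1.2 * interval).
  const boundedRandDelta =
    settings.retryIntervalInMs * 0.8 +
    Math.floor(random() * (settings.retryIntervalInMs * 1.2 - settings.retryIntervalInMs * 0.8));
  incrementDelta *= boundedRandDelta;
  return Math.min(settings.minRetryIntervalInMs + incrementDelta, settings.maxRetryIntervalInMs);
}

const settings: ExponentialRetrySettings = {
  retryIntervalInMs: 1000,
  minRetryIntervalInMs: 100,
  maxRetryIntervalInMs: 100000
};
for (let retryCount = 0; retryCount <= 4; retryCount++) {
  console.log(retryCount, coreHttpStyleDelayInMs(retryCount, settings));
}
```

Passing a fixed `random` function makes the schedule deterministic, which is convenient in tests.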

Member Author

@ramya0820 ramya0820 Jul 5, 2019

Ah, yes, I missed the * in line 135.
This makes it similar overall to what we have in storage, except for the use of the minimum retry delay and the boundedRandDelta computation.

Member Author

@ramya0820 ramya0820 Jul 5, 2019

@ramya-rao-a @bterlson
I've updated the implementation.

Moreover, @sadasant and I just discussed offline having a single copy of the retry policy implementation where possible.

Should we house the exponential delay computation in one place, export it, and use it here?
It would be good to clarify which computation to use, so we can consider modularizing and exporting it, say from core-http.

Member Author

@ramya0820 ramya0820 Jul 5, 2019

Actually, it doesn't currently make sense to take on core-http as a dependency. But eventually we could have a separate module for retry, I think.

Member Author

Update: For clarity, I have now made sure it mirrors core-http much more closely, @ramya-rao-a. Please check the updated PR.

Contributor

Latest update looks much better and closer to what core-http is doing.

Sharing the retry code between core-http and core-amqp is not trivial at the moment. core-http operates on the model of a "pipeline". The pipeline consists of "policies". Each policy does what it needs to do and then passes the "request" to the next policy in the pipeline.

core-amqp does not have this concept of "pipeline". There are talks about making it have a pipeline, but that is not going to happen any time soon.

So, I am all in favor of aligning the retry logic between core-http and core-amqp, but not sharing actual re-usable code yet.

/**
 * @property {number} [delayInSeconds] Amount of time to wait in seconds before making the
- * next attempt. Default: 15.
+ * next attempt. Applicable only when performing linear retry. Default: 15.
Contributor

Can we not use this instead of introducing minExponentialRetryDelayInMs?

Member Author

What would be the benefits?
One is intended for linear retry, where min/max don't carry any meaning.
For exponential retry, we have the notion of min and max, so mixing these into one would likely be a con, as it may be confusing. Unless maybe the naming is clear enough?
@bterlson thoughts on usage/name?

Contributor

@ramya-rao-a ramya-rao-a Jul 6, 2019

Ah, never mind. I was looking at the storage implementation, where there is no "min" value and the first delay equals delayInSeconds, and was thinking that maybe we don't need to configure the minimum value.

I see that we do let the user configure the min value, and that is fine by me.

Member

@bterlson bterlson left a comment

Approved modulo @ramya-rao-a's concerns.

/**
 * @property {string} connectionHost The host "<yournamespace>.servicebus.windows.net".
- * Used to check network connectivity.
+ * Used to check network connectivity. Applicable only when performing linear retry.
Contributor

connectionHost gets used regardless of linear or exponential retry.

* @property {boolean} [exponentialRetry] Flag to denote if we want to perform exponential retry and not
* the default, which is linear.
*/
exponentialRetry?: boolean;
Contributor

@ramya-rao-a ramya-rao-a Jul 6, 2019

I have generally found it helpful to name boolean properties/variables/arguments with a preceding verb like is or has. In this case, I would suggest isExponential.

@bterlson Since the RetryOptions passed when creating an EventHubClient would need this property too, and it is therefore exposed to our users, any thoughts on the name?

Member Author

Agree, updated to isExponentialRetry for now

i,
err
);
let targetDelayInMs: number;
Contributor

We can initialize targetDelayInMs with config.delayInSeconds to avoid the else block

export const defaultDelayBetweenOperationRetriesInSeconds = 5;
export const defaultDelayBetweenRetriesInSeconds = 15;
export const defaultMaxDelayForExponentialRetryInMs = 50000;
export const defaultMinDelayForExponentialRetryInMs = 1;
Contributor

How did we arrive at these defaults? Assuming that we want to mirror the retry policy in core-http, these should be different.

Member Author

Oh, the defaults seemed to differ in general across these two, including the delay between retries (5 vs. 15 vs. 30 seconds, depending on the type of retry).
Fixed this as well to use 30 seconds as the default.

targetDelayInMs = Math.min(
config.minExponentialRetryDelayInMs + incrementDelta,
config.delayInSeconds
Contributor

We should be choosing between config.minExponentialRetryDelayInMs + incrementDelta and maxExponentialRetryDelayInMs, not config.delayInSeconds.

Member Author

ah, yes - fixed

* If `exponentialRetry` option is set, then the delay between retries is adjusted to increase
* exponentially with each attempt using back-off factor of power 2.
*
* @param {RetryConfig<T>} config Parameters that define what type of retry will be performed
Contributor

The parameters define more than just the type of retry. The previous statement was good enough, i.e. "Parameters to configure retry operation".

);
let targetDelayInMs: number;
if (config.exponentialRetry) {
let incrementDelta = Math.pow(2, i) - 1;
Contributor

@ramya-rao-a ramya-rao-a Jul 6, 2019

At a quick glance, the words minExponentialRetryDelayInMs and maxExponentialRetryDelayInMs tell me that when using the exponential retry policy, the delay between retries starts from minExponentialRetryDelayInMs and is incremented exponentially until maxExponentialRetryDelayInMs.

This understanding holds true only if incrementDelta is initialized to Math.pow(2, i - 1) - 1 instead of Math.pow(2, i) - 1.

But I am not sure if this is what we want it to mean :)
i.e., I am not sure if the first delay interval should be minExponentialRetryDelayInMs or minExponentialRetryDelayInMs + (a random value between 0 and 1 * delayInSeconds)

Contributor

Resolution to this is not a blocker for merging this PR, but is a blocker in closing the linked issue

Member

I agree with your intuition here - the first time we retry, incrementDelta should be zero, and since we start on "try 1", Math.pow(2, i - 1) - 1 makes sense to me.
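
The difference between the two expressions can be seen in a small sketch. It assumes `i` is the 1-based attempt number, as stated in this comment; the two function names are hypothetical.

```typescript
// `i` is assumed to be the 1-based attempt number, as in the hunk under review.
function incrementDeltaCurrent(i: number): number {
  return Math.pow(2, i) - 1; // expression in the hunk under review
}

function incrementDeltaProposed(i: number): number {
  return Math.pow(2, i - 1) - 1; // expression suggested in this thread
}

const attempts = [1, 2, 3, 4];
console.log(attempts.map(incrementDeltaCurrent).join(", "));  // prints "1, 3, 7, 15"
console.log(attempts.map(incrementDeltaProposed).join(", ")); // prints "0, 1, 3, 7"
```

Only the proposed form yields 0 on the first attempt, so only then does the first delay equal minExponentialRetryDelayInMs exactly.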

Contributor

In that case minExponentialRetryDelayInMs would need a better default ... Logged #4201 to discuss this

/**
* @property {number} [delayInSeconds] Amount of time to wait in seconds before making the
* next attempt. Default: 15.
* When `exponentialRetry` is set to `true`, this is used to compute the exponentially increasing delays between retries.
Contributor

exponentialRetry -> isExponentialRetry

Same in a few other places where the old name is being used in comments

Member Author

Addressed, re-addressed :)

* @property {boolean} [exponentialRetry] Flag to denote if we want to perform exponential retry and not
* the default, which is linear.
*/
isExponentialRetry?: boolean;
Contributor

@bterlson Do you foresee any other kind of retry we would be supporting? If so, then we should probably have this as an enum

Member

Probably, given that there are 3 or more supported retry schemes in core-http (linear, exponential, logarithmic, also see Fibonacci and a few others looking around outside Azure).

Contributor

How can a single interface RetryOptions cater to the different kinds of inputs required for the different retry mechanisms?
In case of libraries using the http pipeline, each retry mechanism is a different "policy", so they won't have this problem.
Logged #4199 to continue this discussion

Member Author

Updated to use enum as agreed

How can a single interface RetryOptions cater to the different kinds of inputs required for the different retry mechanisms?

Agreed. I recall touching on this point during the initial discussion and had captured it in the assumptions as well.
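
A rough sketch of what selecting the retry type through an enum could look like. The member names `ExponentialRetryPolicy` and `LinearRetryPolicy` appear in the test hunk of this PR; the member order, the trimmed-down `RetryConfigSketch` interface, its `retryPolicy` field name, and `describeRetry` are assumptions for illustration.

```typescript
// Member names taken from the test hunk in this PR; the order is an assumption.
enum RetryPolicy {
  ExponentialRetryPolicy,
  LinearRetryPolicy
}

// Hypothetical, trimmed-down config; the real RetryConfig has more fields.
interface RetryConfigSketch {
  retryPolicy?: RetryPolicy;
  delayInSeconds: number;
}

function describeRetry(config: RetryConfigSketch): string {
  // Linear is treated as the default when no policy is given, as stated in this thread.
  const policy = config.retryPolicy === undefined ? RetryPolicy.LinearRetryPolicy : config.retryPolicy;
  switch (policy) {
    case RetryPolicy.ExponentialRetryPolicy:
      return "exponential";
    case RetryPolicy.LinearRetryPolicy:
      return "linear";
    default:
      // A new enum member would land here until it is handled explicitly.
      throw new Error("Unsupported retry policy");
  }
}

console.log(describeRetry({ delayInSeconds: 15 }));
console.log(describeRetry({ delayInSeconds: 15, retryPolicy: RetryPolicy.ExponentialRetryPolicy }));
```

An enum leaves room for further retry schemes without adding one boolean flag per scheme, though it does not by itself answer the question of policy-specific inputs raised above.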

let incrementDelta = Math.pow(2, i) - 1;
const boundedRandDelta =
  config.delayInSeconds * 0.8 +
  Math.floor(Math.random() * (config.delayInSeconds * 1.2 - config.delayInSeconds * 0.8));
Member

Isn't (config.delayInSeconds * 1.2 - config.delayInSeconds * 0.8) the same as config.delayInSeconds * 0.4?

Member Author

Right, maybe the magic numbers/factors have a certain meaning? For instance we have time in milliseconds (say for 30 sec) being expressed as 1000 * 30 in some places for readability.
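
One way to read the two factors is as the bounds of the randomized delay: the result lies between 0.8 and 1.2 times the configured delay. The sketch below wraps the quoted expression in a hypothetical helper with an injectable random source so the bounds can be checked; the helper name and parameters are not from the PR.

```typescript
// Hypothetical helper mirroring the quoted expression, with an injectable random source.
function boundedRandDelta(delay: number, random: () => number = Math.random): number {
  return delay * 0.8 + Math.floor(random() * (delay * 1.2 - delay * 0.8));
}

const delay = 30;
const lowest = boundedRandDelta(delay, () => 0);         // random() at its minimum
const highest = boundedRandDelta(delay, () => 0.999999); // random() just below 1
console.log(lowest, highest);

// The width of the range is mathematically 0.4 * delay; the two spellings may
// differ in the last floating-point bits for some inputs.
console.log(delay * 1.2 - delay * 0.8, delay * 0.4);
```

Writing the width as `1.2 - 0.8` keeps both bounds visible, while `0.4` is shorter; which reads better is a style choice.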

Contributor

@ramya-rao-a ramya-rao-a left a comment

@ramya0820 The changes look good. Can we also update the tests in retry.spec.ts to get coverage for the exponential retry? I believe all the tests in there would be applicable to the exponential case.

Refer to the use of forEach in errors.spec.ts, where the same test is run for different inputs. We can use this model to update the current tests to run for both retry types.

@ramya0820
Member Author

Can we also update the tests in retry.spec.ts to get coverage for the exponential retry? I believe all the tests in there would be applicable to the exponential case.

Refer to the use of forEach in errors.spec.ts, where the same test is run for different inputs. We can use this model to update the current tests to run for both retry types.

@ramya-rao-a
I've updated the tests to be extended to exponential retry.
Overall, other things we could improve maybe are

[RetryPolicy.ExponentialRetryPolicy, RetryPolicy.LinearRetryPolicy].forEach(
() =>
function(retryPolicy) {
describe(`retry function for ${retryPolicy}`, function() {
Contributor

@ramya-rao-a ramya-rao-a Jul 9, 2019

Won't this print 0 and 1 because retry policy is an enum? That wouldn't help much. The words "fixed"/"linear" and "exponential" are preferable.
I would suggest having a constant with the required value and then using it in the describe title.
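
A small sketch of the suggestion, assuming `RetryPolicy` is a numeric enum with the two members used in the test hunk (the member order is an assumption). The `retryPolicyLabels` table is hypothetical and exists only to build readable test titles.

```typescript
// Assumed shape of the enum; member names come from the test hunk in this PR.
enum RetryPolicy {
  ExponentialRetryPolicy,
  LinearRetryPolicy
}

// Interpolating a numeric enum member yields its number, not its name.
const rawTitle = `retry function for ${RetryPolicy.ExponentialRetryPolicy}`;
console.log(rawTitle); // prints "retry function for 0"

// Hypothetical label table used only for test titles.
const retryPolicyLabels: Record<RetryPolicy, string> = {
  [RetryPolicy.ExponentialRetryPolicy]: "exponential",
  [RetryPolicy.LinearRetryPolicy]: "linear"
};

const titles = [RetryPolicy.ExponentialRetryPolicy, RetryPolicy.LinearRetryPolicy].map(
  (retryPolicy) => `retry function for ${retryPolicyLabels[retryPolicy]} retry`
);
console.log(titles.join("; "));
// prints "retry function for exponential retry; retry function for linear retry"
```

TypeScript's reverse mapping (`RetryPolicy[retryPolicy]`) would also give a readable member name for numeric enums, at the cost of longer titles.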

},
connectionId: "connection-1",
operationType: RetryOperationType.cbsAuth,
delayInSeconds: retryPolicy === RetryPolicy.ExponentialRetryPolicy ? 0 : 15,
Contributor

Why 0 in the case of exponential? That will result in no exponential increase at all, right?
I would imagine that we don't have to make any changes to the delayInSeconds property in any of the tests.
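
The concern can be illustrated with a sketch of the delay computation as it is quoted in this review. The function below is a hypothetical stand-in, not the merged implementation, and it leaves unit handling aside: `delay` plays the role of delayInSeconds in the quoted expressions.

```typescript
// Hypothetical stand-in for the computation quoted in this review.
// `i` is the 1-based attempt number; `delay` plays the role of delayInSeconds.
function exponentialDelay(
  i: number,
  delay: number,
  minDelay: number,
  maxDelay: number,
  random: () => number = Math.random
): number {
  const boundedRandDelta = delay * 0.8 + Math.floor(random() * (delay * 1.2 - delay * 0.8));
  const incrementDelta = (Math.pow(2, i) - 1) * boundedRandDelta;
  return Math.min(minDelay + incrementDelta, maxDelay);
}

// With a delay of 0 the randomized factor is 0, so every attempt gets the minimum.
const withZeroDelay = [1, 2, 3, 4].map((i) => exponentialDelay(i, 0, 1, 50000));
console.log(withZeroDelay.join(", ")); // prints "1, 1, 1, 1"
```

Under this reading, a test that sets the delay to 0 exercises the exponential code path without ever observing a growing delay.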

@ramya0820 ramya0820 merged commit 489fd56 into Azure:master Jul 11, 2019

3 participants