[core-amqp] Implement exponential retry mechanism by ramya0820 · Pull Request #4175 · Azure/azure-sdk-for-js

ramya0820 · 2019-07-03T20:33:05Z

For more context refer to #3331

Specifically, please see this comment, #3331 (comment), for the assumptions on which this PR is based.
Listing them here as well for reference:

When an operation fails, the delay is computed as (2^retry-attempts) milliseconds and thus increases with each failed attempt (1 ms, 3 ms, 7 ms, and so on). This is the same as what we have in core-http.
The retries stop at attempt n, where n is given by
minimum-retry-delay + 2^(n) <= maximum-allowed-retry-delay
because the delay before attempt n would have reached the maximum allowed delay, which is governed by the configured maxExponentialRetryDelayInMilliseconds.
In core-http, the retries continue with the delay capped at the configured maximum, which effectively turns them into a linear retry until either the target retry attempt count is reached or an abort signal is detected.
Overall, based on offline discussion and decisions, the computation of exponential retry delays is to mirror what we have in core-http, and we will introduce this new type of retry policy through a boolean flag on RetryConfig rather than by defining a new interface/type for RetryConfig.


ramya0820 · 2019-07-05T17:43:02Z

I'm looking to add some tests soon to this PR if we can confirm that general assumptions and approach seem okay
@bterlson @ramya-rao-a

ramya-rao-a · 2019-07-05T19:24:49Z

sdk/core/core-amqp/src/retry.ts

+      const targetDelayInMs = Math.min(
+        config.minExponentialRetryDelayInMs + Math.pow(2, i),
+        config.maxExponentialRetryDelayInMs
+      );


With default values for min and max delays here, this results in a progression of 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025, 2049 etc.
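The progression quoted above can be reproduced with a small sketch. It assumes the loop index `i` starts at 1 and that the defaults are a 1 ms minimum and a 50000 ms maximum, as in the constants added by this PR. The helper name `additiveDelays` is hypothetical and not part of core-amqp.

```typescript
// Hypothetical helper: reproduces the delay progression of the expression under review.
// Assumes attempts are numbered from 1 and all delays are in milliseconds.
function additiveDelays(minDelayInMs: number, maxDelayInMs: number, attempts: number): number[] {
  const delays: number[] = [];
  for (let i = 1; i <= attempts; i++) {
    // Same expression as in the diff: min + 2^i, capped at max.
    delays.push(Math.min(minDelayInMs + Math.pow(2, i), maxDelayInMs));
  }
  return delays;
}

// With min = 1 and max = 50000 the first five delays are 3, 5, 9, 17, 33.
const firstFiveDelays = additiveDelays(1, 50000, 5);
```

Because the minimum is added to `2^i` rather than used as the starting point, the first delay is never the minimum itself, which is what the first question below is about.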

Questions:

Shouldn't we be starting from the min value instead of min value + 2 ?

This method differs from both what is in core-http and in storage. Which of the two did we take as reference here?

In the exponential retry policy in storage, Math.pow(2, i) is multiplied by the delay provided by the user instead of adding to it.

In the exponential retry policy in core-http, some other convoluted way is used

Shouldn't we be starting from the min value instead of min value + 2 ?

This mirrors what we have in core-http and would depend on the configured value - did we want to set the default min to 0 ?

This method differs from both what is in core-http and in storage. Which of the two did we take as reference here?

We had decided to use what we have in core-http as captured in #3331 (comment) as well, do we have consensus to go with one in storage instead? @bterlson @daviwil @sadasant

Overall, if we were to investigate the best approach going forward rather than mirror core-http as decided, the following findings may help:

Based on discussion from cloud computing communities online,

Slowing clients down may help, and the classic way to slow clients down is capped exponential backoff. Capped exponential backoff means that clients multiply their backoff by a constant after each attempt, up to some maximum value. In our case, after each unsuccessful attempt, clients sleep for:

This seems to mirror what we have in storage. We'll likely need to update core-http as well.
We also need to decide whether to do away with the concept of a minimum delay and instead use a maximum cap value along with the number of retry attempts.

This would also address and simplify the other comment: instead of delayInSeconds and the min/max exponential-retry-specific config, we could have one maxRetryDelay and one maxRetries that apply to both types of retry.
Additionally, the number 2 could be made configurable as well, calling it the backoffFactor. (This seems to be an established concept; search online for "exponential time delay calculator" or "online-backoff-calculator".)
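For reference, capped exponential backoff with a configurable factor can be sketched as follows. This is only an illustration of the idea discussed in this comment: the option names `baseDelayInMs`, `maxRetryDelayInMs` and `backoffFactor` and the function `cappedBackoffDelay` are hypothetical, and this is not the code in storage, core-http or core-amqp.

```typescript
// Hypothetical option names that follow the proposal in this comment; not an existing API.
interface BackoffOptions {
  baseDelayInMs: number; // delay before the first retry
  maxRetryDelayInMs: number; // cap applied to every computed delay
  backoffFactor: number; // constant the delay is multiplied by after each attempt
}

// attempt is zero-based: attempt 0 is the first retry.
function cappedBackoffDelay(attempt: number, options: BackoffOptions): number {
  const uncapped = options.baseDelayInMs * Math.pow(options.backoffFactor, attempt);
  return Math.min(options.maxRetryDelayInMs, uncapped);
}

// With base 100 ms, factor 2 and cap 1000 ms: 100, 200, 400, 800, 1000.
const sampleOptions: BackoffOptions = { baseDelayInMs: 100, maxRetryDelayInMs: 1000, backoffFactor: 2 };
const sampleDelays = [0, 1, 2, 3, 4].map((attempt) => cappedBackoffDelay(attempt, sampleOptions));
```

In this form no separate minimum is needed: the base delay is the first delay, and the cap plus a retry count bound the total wait.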

From #3331 (comment)

When an operation fails, the delay is computed as (2^retry-attempts) milliseconds and thus increases with each failed attempt (1 ms, 3 ms, 7 ms, and so on). This is the same as what we have in core-http.

This is not what core-http is doing.

In core-http, (2^retry-attempts) is not used as the time itself. This value is used to multiply with the delay (after introducing a randomization factor) provided by the user. Please review https://github.com/Azure/azure-sdk-for-js/blob/%40azure/core-http_1.0.0-preview.1/sdk/core/core-http/lib/policies/exponentialRetryPolicy.ts#L131-L137 one more time.

Ah, yes missed the * in line 135.
This makes it overall similar to what we have in storage except for the use of minimum retry delay and boundedRandDelta computation.

@ramya-rao-a @bterlson
I've updated the implementation.

Moreover, @sadasant and I just discussed offline about having one copy of retry policy implementation where possible.

Should we house the exponential delay computation in one place, export it, and use it here?
It would be good to clarify which computation to use, so we can consider modularizing and exporting it, say from core-http.

Actually, it currently doesn't make sense to take on core-http as a dependency. But eventually we could have a separate module for retry, I think.

Update: For clarity, I have now made sure it mirrors core-http much more closely, @ramya-rao-a. Please check the updated PR.

Latest update looks much better and closer to what core-http is doing.

Sharing the retry code between core-http and core-amqp is not trivial at the moment. core-http operates on the model of a "pipeline". The pipeline consists of "policies". Each policy does what it needs to do and then passes the "request" to the next policy in the pipeline.

core-amqp does not have this concept of "pipeline". There are talks about making it have a pipeline, but that is not going to happen any time soon.

So, I am all in favor of aligning the retry logic between core-http and core-amqp, but not sharing actual re-usable code yet.

ramya-rao-a · 2019-07-05T19:25:47Z

sdk/core/core-amqp/src/retry.ts

  /**
   * @property {number} [delayInSeconds] Amount of time to wait in seconds before making the
-   * next attempt. Default: 15.
+   * next attempt. Applicable only when performing linear retry. Default: 15.


Can we not use this instead of introducing minExponentialRetryDelayInMs?

What would be the benefits?
One is intended for linear retry, where min/max don't carry any meaning.
For exponential retry, we have a notion of min and max, so that would likely be a con, as it may be confusing to mix these into one. Unless maybe the naming is clear enough?
@bterlson thoughts on usage/name?

Ah, never mind. I was looking at the storage implementation where there is no "min" value and the delay for the first time is the same as delayInSeconds and was thinking that maybe we don't need to configure the minimum value.

I see that we do let user configure the min value, and that is fine by me

bterlson

Approved modulo @ramya-rao-a's concerns.

ramya-rao-a · 2019-07-06T00:58:03Z

sdk/core/core-amqp/src/retry.ts

  /**
   * @property {string} connectionHost The host "<yournamespace>.servicebus.windows.net".
-   * Used to check network connectivity.
+   * Used to check network connectivity. Applicable only when performing linear retry.


connectionHost gets used regardless of linear or exponential retry.

ramya-rao-a · 2019-07-06T01:00:28Z

sdk/core/core-amqp/src/retry.ts

+   * @property {boolean} [exponentialRetry] Flag to denote if we want to perform exponential retry and not
+   * the default, which is linear.
+   */
+  exponentialRetry?: boolean;


I have generally found it helpful to name properties/variables/arguments that are boolean to have a preceding verb like is or has. In this case, I would suggest isExponential.

@bterlson Since the RetryOptions passed when creating an EventHubClient would need this property too, and so this is exposed to our users, any thoughts on the name?

Agree, updated to isExponentialRetry for now

ramya-rao-a · 2019-07-06T01:07:12Z

sdk/core/core-amqp/src/retry.ts

        i,
        err
      );
+      let targetDelayInMs: number;


We can initialize targetDelayInMs with config.delayInSeconds to avoid the else block

ramya-rao-a · 2019-07-06T01:10:15Z

sdk/core/core-amqp/src/util/constants.ts

 export const defaultDelayBetweenOperationRetriesInSeconds = 5;
 export const defaultDelayBetweenRetriesInSeconds = 15;
+export const defaultMaxDelayForExponentialRetryInMs = 50000;
+export const defaultMinDelayForExponentialRetryInMs = 1;


How did we arrive at these defaults? Assuming that we want to mirror the retry policy in core-http, these should be different.

Oh, the defaults differed in general between these two, including the delay between retries (5 vs. 15 vs. 30 seconds depending on the type of retry).
Fixed this as well to use a default of 30 sec.

ramya-rao-a · 2019-07-06T01:14:03Z

sdk/core/core-amqp/src/retry.ts

+
+        targetDelayInMs = Math.min(
+          config.minExponentialRetryDelayInMs + incrementDelta,
+          config.delayInSeconds


We should be choosing between config.minExponentialRetryDelayInMs + incrementDelta and maxExponentialRetryDelayInMs not config.delayInSeconds.

ah, yes - fixed

ramya-rao-a · 2019-07-06T01:17:48Z

sdk/core/core-amqp/src/retry.ts

+ * If `exponentialRetry` option is set, then the delay between retries is adjusted to increase
+ * exponentially with each attempt using back-off factor of power 2.
+ *
+ * @param {RetryConfig<T>} config Parameters that define what type of retry will be performed


The parameters define more than just the type of retry. The previous statement was good enough, i.e. "Parameters to configure retry operation".

ramya-rao-a · 2019-07-06T01:22:13Z

sdk/core/core-amqp/src/retry.ts

      );
+      let targetDelayInMs: number;
+      if (config.exponentialRetry) {
+        let incrementDelta = Math.pow(2, i) - 1;


At a quick glance, the words minExponentialRetryDelayInMs and maxExponentialRetryDelayInMs tell me that when using exponential retry policy, the delay between retries starts from the minExponentialRetryDelayInMs and is incremented exponentially till maxExponentialRetryDelayInMs

This understanding holds true only if incrementDelta is initialized to Math.pow(2, i - 1) - 1 instead of Math.pow(2, i) - 1

But I am not sure if this is what we want it to mean :)
i.e., I am not sure if the first delay interval should be minExponentialRetryDelayInMs or minExponentialRetryDelayInMs + (a random value between 0 and 1 * delayInSeconds)

Resolution to this is not a blocker for merging this PR, but is a blocker in closing the linked issue

@bterlson ?

I agree with your intuition here - the first time we retry, incrementDelta should be zero, and since we start on "try 1", Math.pow(2, i - 1) - 1 makes sense to me.

In that case minExponentialRetryDelayInMs would need a better default ... Logged #4201 to discuss this
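The difference between the two exponents can be made concrete with a simplified sketch. The config shape and the function `exponentialDelayInMs` below are hypothetical; `baseDelayInMs` stands in for the user-provided delay and is assumed to be in milliseconds, and the randomization is reduced to an optional `jitterFactor` so that the results are deterministic.

```typescript
// Hypothetical, simplified config; the min/max names follow the diff under review.
interface ExponentialDelayConfig {
  baseDelayInMs: number; // assumption: the user-provided delay, expressed in milliseconds
  minExponentialRetryDelayInMs: number;
  maxExponentialRetryDelayInMs: number;
}

// attempt is 1-based, matching the loop index i in the diff.
// exponentOffset 0 gives 2^i - 1 (as in the diff); 1 gives 2^(i - 1) - 1 (as suggested).
// jitterFactor stands in for the randomized multiplier; 1 means no jitter.
function exponentialDelayInMs(
  attempt: number,
  config: ExponentialDelayConfig,
  exponentOffset: 0 | 1,
  jitterFactor: number = 1
): number {
  const incrementDelta =
    (Math.pow(2, attempt - exponentOffset) - 1) * config.baseDelayInMs * jitterFactor;
  return Math.min(
    config.minExponentialRetryDelayInMs + incrementDelta,
    config.maxExponentialRetryDelayInMs
  );
}

// Illustrative values only.
const illustrativeConfig: ExponentialDelayConfig = {
  baseDelayInMs: 30000,
  minExponentialRetryDelayInMs: 3000,
  maxExponentialRetryDelayInMs: 90000
};
// 2^i - 1: the first delay is min + base = 33000.
const firstDelayAsInDiff = exponentialDelayInMs(1, illustrativeConfig, 0);
// 2^(i - 1) - 1: the first delay is exactly the minimum, 3000.
const firstDelayAsSuggested = exponentialDelayInMs(1, illustrativeConfig, 1);
```

With the suggested exponent the sequence starts at the configured minimum, which matches what the property names imply.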

ramya-rao-a · 2019-07-08T19:36:19Z

sdk/core/core-amqp/src/retry.ts

  /**
   * @property {number} [delayInSeconds] Amount of time to wait in seconds before making the
   * next attempt. Default: 15.
+   * When `exponentialRetry` is set to `true`, this is used to compute the exponentially increasing delays between retries.


exponentialRetry -> isExponentialRetry

Same in a few other places where the old name is being used in comments

Addressed, re-addressed :)

ramya-rao-a · 2019-07-08T19:37:05Z

sdk/core/core-amqp/src/retry.ts

+   * @property {boolean} [exponentialRetry] Flag to denote if we want to perform exponential retry and not
+   * the default, which is linear.
+   */
+  isExponentialRetry?: boolean;


@bterlson Do you foresee any other kind of retry we would be supporting? If so, then we should probably have this as an enum

Probably, given that there are 3 or more supported retry schemes in core-http (linear, exponential, logarithmic, also see Fibonacci and a few others looking around outside Azure).

How can a single interface RetryOptions cater to the different kinds of inputs required for the different retry mechanisms?
In case of libraries using the http pipeline, each retry mechanism is a different "policy", so they won't have this problem.
Logged #4199 to continue this discussion

Updated to use enum as agreed

How can a single interface RetryOptions cater to the different kinds of inputs required for the different retry mechanisms?

Agreed. I recall touching upon this point during the initial discussion and had captured it in the assumptions as well.

ramya-rao-a · 2019-07-08T19:38:20Z

sdk/core/core-amqp/src/retry.ts

      );
+      let targetDelayInMs: number;
+      if (config.exponentialRetry) {
+        let incrementDelta = Math.pow(2, i) - 1;


@bterlson ?

bterlson · 2019-07-08T19:56:02Z

sdk/core/core-amqp/src/retry.ts

+        let incrementDelta = Math.pow(2, i) - 1;
+        const boundedRandDelta =
+          config.delayInSeconds * 0.8 +
+          Math.floor(Math.random() * (config.delayInSeconds * 1.2 - config.delayInSeconds * 0.8));


Isn't (config.delayInSeconds * 1.2 - config.delayInSeconds * 0.8) the same as config.delayInSeconds * 0.4?

Right, maybe the magic numbers/factors have a certain meaning? For instance we have time in milliseconds (say for 30 sec) being expressed as 1000 * 30 in some places for readability.
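A minimal sketch of the simplification raised here, assuming the intent of the expression is a value between 80% and 120% of the configured delay. The name `boundedRandDelta` comes from the diff; wrapping it in a function and the injectable `random` parameter are assumptions made so the bounds can be checked deterministically.

```typescript
// Returns a randomized value in the range [0.8 * delay, 1.2 * delay).
// random defaults to Math.random and can be replaced to make the result deterministic.
function boundedRandDelta(delay: number, random: () => number = Math.random): number {
  // Mathematically the same as
  //   delay * 0.8 + Math.floor(random() * (delay * 1.2 - delay * 0.8))
  // up to floating-point rounding, since 1.2 - 0.8 = 0.4.
  return delay * 0.8 + Math.floor(random() * delay * 0.4);
}

// Lower bound of the range: random() returning 0 gives 0.8 * delay.
const lowestDelta = boundedRandDelta(1000, () => 0);
```

Keeping 0.8 and 1.2 visible, for example as named constants, would preserve the readability point made in the reply above while still avoiding the repeated multiplication.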

bterlson · 2019-07-08T20:01:09Z

sdk/core/core-amqp/src/retry.ts

      );
+      let targetDelayInMs: number;
+      if (config.exponentialRetry) {
+        let incrementDelta = Math.pow(2, i) - 1;


I agree with your intuition here - the first time we retry, incrementDelta should be zero, and since we start on "try 1", Math.pow(2, i - 1) - 1 makes sense to me.

ramya-rao-a

@ramya0820 The changes look good. Can we also update the tests in retry.spec.ts to get coverage for the exponential retry? I believe all the tests in there would be applicable for the exponential case.

Refer to the use of forEach in errors.spec.ts, where the same test is run for different inputs. We can use this model to update the current tests to run for both retry types.

ramya0820 · 2019-07-09T23:39:23Z

Can we also update the tests in retry.spec.ts to get coverage for the exponential retry? I believe all the tests in there would be applicable for the exponential case.

Refer to the use of forEach in errors.spec.ts, where the same test is run for different inputs. We can use this model to update the current tests to run for both retry types.

@ramya-rao-a
I've updated the tests to be extended to exponential retry.
Overall, other things we could perhaps improve are:

Revisit the current set of tests to see whether each test case closely models its intended description.
Using a single retryConfig for multiple types of retry does seem sketchy, as discussed initially. Since we have "[core-amqp] Should retry mechanisms other than linear and exponential be supported? #4199", perhaps this can be improved after that concludes.
Additional tests specific to exponential retry could be added to test the delay computation, but this may be overkill and result in flakiness. Since we have "[core-http] Delay for first retry in exponential retry policy should be after the configured min value #4201" open, these can be added after that concludes.

ramya-rao-a · 2019-07-09T23:54:55Z

sdk/core/core-amqp/test/retry.spec.ts

+[RetryPolicy.ExponentialRetryPolicy, RetryPolicy.LinearRetryPolicy].forEach(
+  () =>
+    function(retryPolicy) {
+      describe(`retry function for ${retryPolicy}`, function() {


Won't this print 0 and 1 because the retry policy is an enum? That wouldn't help much. The words "fixed"/"linear" and "exponential" are preferable.
I would suggest having a constant with the required value and then using it in the describe title.
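One way to get readable titles is sketched below. It assumes `RetryPolicy` is a numeric enum with the two members used in the diff; the enum declaration, the `retryPolicyLabels` map and the `suiteTitle` helper are illustrative and not taken from the PR.

```typescript
// Assumed shape of the enum; numeric enum members stringify to their ordinal ("0", "1").
enum RetryPolicy {
  ExponentialRetryPolicy,
  LinearRetryPolicy
}

// Hypothetical label map used only for test titles.
const retryPolicyLabels: Record<RetryPolicy, string> = {
  [RetryPolicy.ExponentialRetryPolicy]: "exponential",
  [RetryPolicy.LinearRetryPolicy]: "linear"
};

// Builds the describe() title from the label instead of the raw enum value.
function suiteTitle(retryPolicy: RetryPolicy): string {
  return `retry function for ${retryPolicyLabels[retryPolicy]} retry`;
}

// "retry function for exponential retry", "retry function for linear retry"
const suiteTitles = [RetryPolicy.ExponentialRetryPolicy, RetryPolicy.LinearRetryPolicy].map(
  suiteTitle
);
```

For a numeric enum, TypeScript's reverse mapping (`RetryPolicy[retryPolicy]`) would also yield the member name, at the cost of longer titles such as "ExponentialRetryPolicy".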

ramya-rao-a · 2019-07-10T00:18:01Z

sdk/core/core-amqp/test/retry.spec.ts

+              },
+              connectionId: "connection-1",
+              operationType: RetryOperationType.cbsAuth,
+              delayInSeconds: retryPolicy === RetryPolicy.ExponentialRetryPolicy ? 0 : 15,


Why 0 in case of exponential? That will result in no exponential increase at all right?
I would imagine that we don't have to make any changes to the delayInSeconds property in any of the tests

Implement exponential retry mechanism

926be21

ramya0820 self-assigned this Jul 3, 2019

ramya0820 changed the title ~~Implement exponential retry mechanism~~ [amqp-common] Implement exponential retry mechanism Jul 3, 2019

ramya0820 requested review from ShivangiReja, bterlson, daviwil, ramya-rao-a and sadasant July 3, 2019 20:41

Add parameter validation

b49cbd3

ramya0820 mentioned this pull request Jul 3, 2019

[core-amqp] Support exponential retry policy #3331

Closed

Update suffix to -InMs

75c8caa

ramya0820 changed the title ~~[amqp-common] Implement exponential retry mechanism~~ [core-amqp] Implement exponential retry mechanism Jul 5, 2019

Merge branch 'master' of https://github.com/Azure/azure-sdk-for-js in…

bb95538

…to issue-3331-p2

ramya-rao-a suggested changes Jul 5, 2019


bterlson approved these changes Jul 5, 2019


ramya0820 added 2 commits July 5, 2019 13:14

Update approach to mimic storage

24ba04b

Remove unused references

e72d3a2

sadasant mentioned this pull request Jul 5, 2019

JS Test Guidelines Proposal #4087

Closed

ramya0820 added 2 commits July 5, 2019 14:37

Update to mirror policy in core-http

2c02e5c

Update docs

3540491

ramya-rao-a suggested changes Jul 6, 2019


Address comments

324a868

ramya-rao-a suggested changes Jul 8, 2019


bterlson reviewed Jul 8, 2019


ramya-rao-a mentioned this pull request Jul 8, 2019

[core-amqp] Should retry mechanisms other than linear and exponential be supported? #4199

Closed

ramya0820 added 2 commits July 8, 2019 13:51

Update boolean flag references

e65ea10

Update to use enum

033be9a

ramya-rao-a suggested changes Jul 9, 2019


Extend tests

493da07

Fix build error

aa0d384

ramya-rao-a suggested changes Jul 10, 2019


Adjust delays

8af8f8d

ramya-rao-a approved these changes Jul 11, 2019


ramya0820 merged commit 489fd56 into Azure:master Jul 11, 2019

mikeharder mentioned this pull request Jul 20, 2019

Increment client preview versions #4365

Merged
