Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auto-retry logic for configurable list of error codes #94

Merged
merged 1 commit into from
Jan 23, 2018

Conversation

HowardWolosky
Copy link
Member

There is a set of known error codes where users can usually achieve success if they simply retry enough times:

  • 429 - The Submission API limits a Tenant to 20 requests per minute. Any requests exceeding
    that will receive this error code until that minute has expired.

  • 503 - An underlying service component was unavailable, and the Submission API suggests that
    you try again.

This change modifies the default execution behavior of StoreBroker to auto-retry up to 5 times with exponential back-off for these two error codes. The implementation adds two new global variables to permit users to further configure how this works:

$global:SBAutoRetryErrorCodes - an int list of status codes that we should retry for
$global:SBMaxAutoRetries - an int indicating the maximum number of times we should retry before failing.

We will only report back telemetry when the retry count is reached (and we finally error out), or when the result is finally successful. We are adding two new telemetry properties that will be reported for both (eventual) successful requests,, as well as for eventual exceptions:

  • NumRetries - to track the number of times we had to attempt a retry
  • RetryStatusCode - to track the last status code that was hit that caused us to need to do the retry.

Resolves #92

@@ -90,6 +90,16 @@ function Initialize-StoreIngestionApiGlobalVariables
{
$global:SBDefaultProxyEndpoint = $null
}

if (!(Get-Variable -Name SBAutoRetryErrorCodes -Scope Global -ValueOnly -ErrorAction Ignore))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the place for it, but our handling of global variables could easily be implemented generically, with a hashtable of variable names and default values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. That would work for most of them (immediate exception that I see is how $SBLogPath and $SBLoggingEnabled in Helpers.ps1 have some interaction during initialization). I've opened #95 to track this suggestion for future consideration.

{
$headers.Add("UseINT", "true")
$AccessToken = Get-AccessToken -NoStatus:$NoStatus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for this being in the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AccessTokens have a limited lifetime. If we didn't get a new one each time, it's possible that it would be invalid during one of the retry attempts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you grab your comment and add it to the script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good feedback. Will do.

}

# Since we have retry logic, we won't create a new stopwatch every time,
# we'll continue the existing one if it already exists...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd remove the 'if' because $stopwatch always exists

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. That comment was from a previous iteration

There is a set of known error codes where users can usually achieve success if they simply retry enough times:

  * `429` - The Submission API limits a Tenant to 20 requests per minute.  Any requests exceeding
       that will receive this error code until that minute has expired.

  * `503` - An underlying service component was unavailable, and the Submission API suggests that
       you try again.

This change modifies the default execution behavior of StoreBroker to auto-retry up to **5** times with exponential back-off for these two error codes.  The implementation adds two new global variables to permit users to further configure how this works:

 **`$global:SBAutoRetryErrorCodes`** - an `int` list of status codes that we should retry for
**`$global:SBMaxAutoRetries`** - an `int` indicating the maximum number of times we should retry before failing.

We will only report back telemetry when the retry count is reached (and we finally error out), or when the result is finally successful.  We are adding two new telemetry properties that will be reported for both (eventual) successful requests,, as well as for eventual exceptions:
   * `NumRetries` - to track the number of times we had to attempt a retry
   * `RetryStatusCode` - to track the last status code that was hit that caused us to need to do the retry.

Resolves microsoft#92
@HowardWolosky HowardWolosky merged commit d13cc9d into microsoft:master Jan 23, 2018
@HowardWolosky HowardWolosky deleted the 429retry branch January 23, 2018 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants