Skip to content

router: support customizable retry back-off intervals#6568

Merged
htuch merged 2 commits intoenvoyproxy:masterfrom
zuercher:zuercher_retry_policy_interval
Apr 18, 2019
Merged

router: support customizable retry back-off intervals#6568
htuch merged 2 commits intoenvoyproxy:masterfrom
zuercher:zuercher_retry_policy_interval

Conversation

@zuercher
Copy link
Copy Markdown
Member

Default behavior remains unchanged: retries will use the runtime parameter
defaulted to 25ms as the base interval and 250ms as the maximum. Allows
routes to customize the base and maximum intervals.

Risk Level: low (no change to default behavior)
Testing: unit tests
Doc Changes: included, plus updated description of back-off algorithm
Release Notes: added

Signed-off-by: Stephan Zuercher zuercher@gmail.com

@zuercher zuercher changed the title Default behavior remains unchanged: retries will use the runtime para… router: support customizable retry back-off intervals Apr 12, 2019
Default behavior remains unchanged: retries will use the runtime parameter
defaulted to 25ms as the base interval and 250ms as the maximum. Allows
routes to customize the base and maximum intervals.

Risk Level: low (no change to default behavior)
Testing: unit tests
Doc Changes: included, plus updated description of back-off algorithm
Release Notes: added

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
@zuercher zuercher force-pushed the zuercher_retry_policy_interval branch from f8b12d1 to e963e18 Compare April 12, 2019 20:03
snowp
snowp previously approved these changes Apr 15, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, just one question about the design

std::chrono::milliseconds base_interval(
runtime_.snapshot().getInteger("upstream.base_retry_backoff_ms", 25));
if (route_policy.baseInterval()) {
base_interval = *route_policy.baseInterval();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this means that it's not possible to override the interval with runtime anymore once it specified in the route config. Did you consider keeping the runtime value as an override for retries specified in the config?

Not sure which option is better, I can see both sides.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could imagine adding a config point for a runtime key to use for overriding a route's back-off interval, but it doesn't feel right to me to have the runtime key override the interval for all the routes.

Given that there's no runtime key override for the per-try timeout, I think it makes sense to do it this way. I think it could even be argued that I should deprecate the runtime key.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo minor comments.
/wait

base_interval_ = std::chrono::milliseconds(
PROTOBUF_GET_MS_REQUIRED(retry_policy.retry_back_off(), base_interval));
if ((*base_interval_).count() < 1) {
base_interval_ = std::chrono::milliseconds(1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the floor function needed? Won't the backoff eventually (or quite quickly) raise this to a useful value?

Copy link
Copy Markdown
Member Author

@zuercher zuercher Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation takes an integer and assumes it's milliseconds. It also asserts that the value is >= 1. I think I could convert it to handle arbitrary std::chrono::duration types, but I wanted to limit the blast radius of this change.

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@htuch htuch merged commit 5ea103e into envoyproxy:master Apr 18, 2019
@zuercher zuercher deleted the zuercher_retry_policy_interval branch April 18, 2019 16:22
mpuncel added a commit to mpuncel/envoy that referenced this pull request Apr 19, 2019
* master: (26 commits)
  docs: update docs to recommend /retest repokitteh command (envoyproxy#6655)
  http timeout integration test: wait for 15s for upstream reset (envoyproxy#6646)
  access log: add response code details to the access log formatter (envoyproxy#6626)
  build: add ppc build badge to README (envoyproxy#6629)
  Revert dispatcher stats (envoyproxy#6649)
  Batch implementation with timer (envoyproxy#6452)
  fault filter: reset token bucket on data start (envoyproxy#6627)
  event: update libevent dependency to fix race condition (envoyproxy#6637)
  examples: standardize docker-compose version and yaml extension (envoyproxy#6613)
  quiche: Implement SpdyUnsafeArena using SpdySimpleArena (envoyproxy#6612)
  router: support customizable retry back-off intervals (envoyproxy#6568)
  api: create OpenRCA service proto file (envoyproxy#6497)
  ext_authz: option for clearing route cache of authorized requests (envoyproxy#6503)
  build: update jinja to 2.10.1. (envoyproxy#6623)
  tools: check spelling in pre-push hook (envoyproxy#6631)
  security: blameless postmortem template. (envoyproxy#6553)
  Implementing Endpoint lease for ClusterLoadAssigment (envoyproxy#6477)
  add HTTP integration tests exercising timeouts (envoyproxy#6621)
  event: fix DispatcherImplTest::InitializeStats flake (envoyproxy#6619)
  Add tag extractor for RDS route config name (envoyproxy#6618)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants