In worker, retry forever with an exponential back off when Redis interactions time out #22

nathansobo · 2019-11-07T01:01:59Z

Currently, when we time out talking to Redis, we reconnect and retry the operation. For a fail-over scenario where the Redis server has moved to a new host, this behavior works. For scenarios in which Redis is still available but overwhelmed with load, repeatedly reconnecting and retrying operations has the potential to make the situation worse.

In this PR, I introduce Worker#with_exponential_backoff and use it in the Worker instead of with_retries.

When retrying, exponentially back off by powers of 2, up to a maximum of 60 seconds, with 5 seconds of random jitter.
Continue retrying forever until the worker is explicitly shut down. This prevents a scenario where the worker process dies after N attempts only to be restarted by Resqued. This ensures that we continue to retry at a reduced frequency until Redis service health recovers. Restarting the process would cause us to start retrying at a faster rate.
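For illustration, the back-off schedule described above could be sketched roughly as follows. This is not the code from this PR: `RedisTimeout`, `backoff_delay`, the `shutdown:`, `sleeper:` and `rng:` parameters, and the choice to return `nil` on shutdown are all hypothetical, and I am assuming the jitter is added on top of the 60-second cap.

```ruby
# Hypothetical stand-in for whatever timeout error the Redis client raises.
RedisTimeout = Class.new(StandardError)

MAX_BACKOFF_SECONDS = 60 # cap stated in the PR description
MAX_JITTER_SECONDS  = 5  # jitter stated in the PR description

# Delay before retry number `attempt` (1-based): 2, 4, 8, ... capped at 60,
# plus up to 5 seconds of random jitter (assumed to be added after the cap).
def backoff_delay(attempt, rng: Random.new)
  # Clamp the exponent so the power stays small even after many retries.
  base = [2**[attempt, 10].min, MAX_BACKOFF_SECONDS].min
  base + rng.rand * MAX_JITTER_SECONDS
end

# Retries the block until it succeeds or `shutdown` reports true.
# `shutdown` and `sleeper` are injected callables so the loop can be
# exercised without real sleeping.
def with_exponential_backoff(shutdown: -> { false },
                             sleeper: ->(seconds) { sleep(seconds) },
                             rng: Random.new)
  attempt = 0
  begin
    yield
  rescue RedisTimeout
    # Assumption: give up quietly once the worker has been told to stop.
    return nil if shutdown.call
    attempt += 1
    sleeper.call(backoff_delay(attempt, rng: rng))
    retry
  end
end
```

Because the attempt counter lives in the running process, the delay keeps growing for as long as Redis stays unhealthy, which is the property that restarting the process would throw away.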

I limit these changes to the worker because backing off and retrying forever in Unicorn processes when enqueuing jobs could cause request timeouts.

I also change the behavior of with_retries slightly so that each attempt to reconnect also counts as a retry attempt. The existing logic can end up trying to reconnect up to 9 times in certain scenarios.

This allows for multiple reconnect attempts before raising, and each one counts as an attempt.
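A minimal sketch of that counting rule, under stated assumptions: `ConnectionTimeout`, the `max_attempts:` keyword, and the `reconnect:` callable are hypothetical names, the sleep before reconnecting is left out, and later commits on this branch rework with_retries, so this shows the idea rather than the final code.

```ruby
# Hypothetical stand-in for the client's timeout/connection error.
ConnectionTimeout = Class.new(StandardError)

# Runs the block up to `max_attempts` times. A failure uses up one attempt
# whether it comes from the operation itself or from the reconnect that
# precedes a retry, so reconnects can never exceed max_attempts - 1.
def with_retries(max_attempts:, reconnect:)
  attempts = 0
  needs_reconnect = false
  begin
    attempts += 1
    reconnect.call if needs_reconnect
    yield
  rescue ConnectionTimeout
    # Out of attempts: re-raise the error that was just rescued.
    raise if attempts >= max_attempts
    needs_reconnect = true
    retry
  end
end
```

With a single counter covering both kinds of failure, the total work is bounded by `max_attempts` rather than by the product of operation retries and reconnect retries.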

dbussink · 2019-11-07T13:28:17Z

Sorry, I missed this PR when opening #23, and after @nronas approved that one I merged it ahead of this change.

Feel free to incorporate some of the further changes here, though; #23 was aiming at the most minimal fix I could come up with.

Nathan Sobo and others added 4 commits November 6, 2019 16:17

Honor retries parameter when interacting with Redis

ce89808

Only sleep if we actually want to reconnect

0bb79a3

Retry even if we fail to reconnect

0e2abe2

This allows for multiple reconnect attempts before raising, and each one counts as an attempt.

Rename retries parameter to be more explicit

fc772ae

Exponentially back-off when retrying Redis operations

e6cfc05

Co-Authored-By: Nathan Witmer <[email protected]>

nathansobo force-pushed the fix-retries branch from ee6c29a to e6cfc05 Compare November 7, 2019 15:08

Nathan Sobo added 4 commits November 7, 2019 08:37

Merge remote-tracking branch 'origin/github' into fix-retries

e15027e

Avoid delaying test with exponential back-off

754517e

Only in worker: enable infinite retries with exponential back-off

4df2639

Make variable name more precise

845cda9

nathansobo changed the title ~~Avoid infinite loop in retry logic when exceptions occur talking to Redis~~ In worker, retry forever with an exponential back off when Redis interactions time out Nov 7, 2019

nathansobo marked this pull request as ready for review November 7, 2019 16:56

Nathan Sobo added 4 commits November 7, 2019 11:12

Test shutdown during a retry/backoff loop

4e16da1

Fix comment

b8c5164

Make with_retries simpler again

d294c44

Use with_exponential_backoff explicitly in Worker

8416d86
