Retry with backoff on cluster connection failures #2358
No behavior changes, just a refactoring. Changes:
* Replaces recursion with a for loop
* Extract redirection handling into its own method
* Extract connection-failed handling into its own method

Note that `tryWithRandomNode` is gone, it was never `true` so it and its code didn't survive the refactoring.
Inspired by redis#1334 where this went real easy :). Would have made redis#2355 shorter.

Free public updates for JDK 7 ended in 2015: <https://en.wikipedia.org/wiki/Java_version_history> For JDK 8, free public support is available from non-Oracle vendors until at least 2026 according to the same table. And JDK 8 is what Jedis is being tested on anyway: <https://github.com/redis/jedis/blob/ac0969315655180c09b8139c16bded09c068d498/.circleci/config.yml#L67-L74>
✅ 👀 Ready for review!
This PR breaks backward compatibility. Breaking backward compatibility means it won't be released until the next major release. As of this moment, the next major release for Jedis is 4.0.0 which, you can imagine, is a long way away.
Try to find a backward compatible solution. Don't make the code too ugly for that purpose though :)
Thank you for your short turnaround time in reviewing, I really appreciate that @sazzad16!
Another constructor is needed either way (I think). But if #2364 would get merged before this PR, that constructor could be made
/**
 * Default timeout in milliseconds.
 */
public static final int DEFAULT_TIMEOUT = 2000;
`public` makes these reachable from `JedisClusterCommand.java` for its default timeout.
* consider connection exceptions and disregard random nodes
* reset redirection
@sazzad16 Okay, if we have an improvement plan, then I agree to continue, but I still think the default value of
This is the responsibility of this PR, and `maxTotalRetriesDuration` should be added to the new command before it is merged.
agreed
We can do this after the PR is approved.
@gkorland @yangbodong22011 Please check #2490. Hopefully that PR addresses your concerns.
Conflicts:
src/main/java/redis/clients/jedis/BinaryJedisCluster.java
src/main/java/redis/clients/jedis/JedisCluster.java
🥳
@walles we are seeing similar for jedis version:
Hi @nitinware, I took a brief look at the commit history and it shows that this PR is already part of 5.1.0. "No more cluster attempts left." is a pretty generic error, thrown when an error persists even after the retries are exhausted. Looking at the Jedis code, the actual cause is stored as a suppressed exception inside `JedisClusterOperationException` here. I see you are using the Spring Framework, which wraps the original Jedis exception, so somewhere down the stack there should probably be a `JedisClusterOperationException` with the actual error that caused the failure among its suppressed exceptions. Hope it helps.
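One way to dig out the suppressed exceptions mentioned above is to walk the cause chain of whatever exception the framework hands back and collect each level's suppressed entries. This is a minimal sketch using only `Throwable.getCause()` and `Throwable.getSuppressed()` from the JDK. The class and method names are hypothetical, and it assumes the wrapping framework keeps the original exception reachable as a cause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jedis or Spring.
final class SuppressedCauses {

  // Guard against pathological (cyclic or extremely deep) cause chains.
  private static final int MAX_DEPTH = 32;

  // Walks the cause chain starting at 'top' and collects every suppressed
  // exception attached to any exception along the way.
  static List<Throwable> collectSuppressed(Throwable top) {
    List<Throwable> found = new ArrayList<>();
    int depth = 0;
    for (Throwable t = top; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
      for (Throwable suppressed : t.getSuppressed()) {
        found.add(suppressed);
      }
    }
    return found;
  }
}
```

Calling `SuppressedCauses.collectSuppressed(e)` in a `catch` block and logging each returned element should surface the per-attempt failures, provided the assumption about the cause chain holds.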
Before this change, if there were connection failures to the cluster, we did all our retries without any backoff.
With this change in place:
`maxAttempts` (see the `shouldBackOff()` method)
`getBackoffSleepMillis()` method
Additionally, this change adds unit tests for the retries / backoff logic.
This change is based on the changes in #2355 (approved, not yet merged, currently waiting for more reviewers).
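The general idea of retrying with a backoff inside a total time budget can be sketched as below. This is an illustration only, not the code from this PR: the class, `callWithRetries` and `maxSleepMillis` are hypothetical names, the parameters merely borrow the names `maxAttempts` and `maxTotalRetriesDuration` from the discussion, and the policy (a random sleep capped at an even share of the remaining time) is an assumption chosen for simplicity.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch; not the Jedis implementation.
final class RetryWithBackoffSketch {

  // Upper bound for the next sleep: the remaining time budget is shared
  // evenly among the attempts that are still left (assumed policy).
  static long maxSleepMillis(int attemptsLeft, long millisLeft) {
    if (attemptsLeft <= 0 || millisLeft <= 0) {
      return 0;
    }
    return millisLeft / attemptsLeft;
  }

  // Runs 'operation' up to maxAttempts times in a plain for loop, sleeping a
  // random, bounded amount between failed attempts. Every failure is attached
  // as a suppressed exception to the exception that is finally thrown.
  static <T> T callWithRetries(Callable<T> operation, int maxAttempts,
      Duration maxTotalRetriesDuration) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Instant deadline = Instant.now().plus(maxTotalRetriesDuration);
    Exception failure = new Exception("No more attempts left.");
    for (int attemptsLeft = maxAttempts; attemptsLeft > 0; attemptsLeft--) {
      try {
        return operation.call();
      } catch (Exception e) {
        failure.addSuppressed(e);
      }
      int remaining = attemptsLeft - 1;
      long millisLeft = Duration.between(Instant.now(), deadline).toMillis();
      if (remaining == 0 || millisLeft <= 0) {
        break; // out of attempts or out of time
      }
      long bound = maxSleepMillis(remaining, millisLeft);
      Thread.sleep(ThreadLocalRandom.current().nextLong(bound + 1));
    }
    throw failure;
  }
}
```

A call such as `RetryWithBackoffSketch.callWithRetries(() -> fetch(), 5, Duration.ofSeconds(2))`, where `fetch()` stands in for any operation that may fail, would give up after five attempts or once the two seconds are used up, whichever comes first.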