Description
Expected behavior
When the cluster node that the client is writing to goes down, JedisCluster recovers by attempting to connect to another node in the cluster, transparently to the caller. A few reconnection attempts are made, with a backoff strategy between them.
Actual behavior
When the cluster node that the client is writing to goes down, JedisCluster immediately retries the connection to the same node several times without backoff. After maxAttempts attempts, it fails with JedisClusterMaxAttemptsException: No more cluster attempts left.
Stack trace:
Exception in thread "main" redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more cluster attempts left.
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:86)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:25)
	at redis.clients.jedis.JedisCluster.incr(JedisCluster.java:443)
	at JedisClusterFailover.main(JedisClusterFailover.java:23)
One observation: runWithRetries has no backoff, so all retries occur immediately. Even if a connection to another node were attempted, the cluster would have no time to recover by electing a new master.
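To illustrate the kind of backoff meant here, the following is a minimal sketch of a retry loop with capped exponential delays. It is not Jedis code and does not reflect how runWithRetries is implemented; the class name BackoffRetry, the method names, and the delay parameters are all hypothetical, chosen only for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Jedis: retries an operation with
// exponentially growing pauses so a cluster has time to elect a new master.
final class BackoffRetry {

    // Delay before the next try after the given failed attempt (1-based):
    // baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
    public static long delayForAttempt(int attempt, long baseDelayMs, long maxDelayMs) {
        int shift = Math.min(attempt - 1, 20); // bound the shift to avoid overflow
        return Math.min(baseDelayMs << shift, maxDelayMs);
    }

    // Calls op up to maxAttempts times, sleeping between failed attempts.
    // Rethrows the last failure once the attempts are exhausted.
    public static <T> T callWithBackoff(Callable<T> op, int maxAttempts,
                                        long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayForAttempt(attempt, baseDelayMs, maxDelayMs));
                }
            }
        }
        throw last;
    }
}
```

As an application-level workaround, a call such as jedisCluster.incr(key) could be wrapped in callWithBackoff. With a base delay of 100 ms and a cap of 2000 ms, the pauses would be 100, 200, 400, 800, 1600, 2000, ... ms. Whether the total wait is long enough for a failover depends on the cluster's node timeout settings.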
Steps to reproduce:
Repro: https://gist.github.com/jensgreen/259de5e06da6e4fab348f89d4e63fab9
- Start up a Redis cluster:
  - Start 6 Redis nodes on ports 7000-7005.
  - Create the cluster: $ redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1
- Run the JedisClusterFailover main method from the link above.
- Terminate the Redis master that is being written to with ^C.
  - To find the node, kill masters one by one until the Java program crashes. Bring each killed master back online if the program did not crash.
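As an alternative to killing masters one by one, the owning master can be worked out from the key's hash slot. The sketch below computes the slot as described in the Redis Cluster specification (CRC16, XMODEM variant, modulo 16384, with hash-tag handling). The class name KeySlot and the example key "foo" are assumptions for illustration; the key actually used by the program in the gist is not shown in this issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: computes the Redis Cluster hash slot of a key.
final class KeySlot {

    // CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), bit by bit.
    public static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Slot = CRC16(key) mod 16384. If the key contains a non-empty {hash tag},
    // only the text between the first '{' and the next '}' is hashed.
    public static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        String key = args.length > 0 ? args[0] : "foo"; // example key, an assumption
        System.out.println(key + " -> slot " + slot(key));
    }
}
```

The resulting slot can be compared against the slot ranges printed by redis-cli -p 7000 cluster nodes to identify the master to terminate. Running redis-cli -p 7000 cluster keyslot with the key should report the same slot number.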
 
Redis / Jedis Configuration
Jedis configuration in Gist above.
Redis cluster on localhost with 3 masters and 3 replicas.
Cluster created with redis-cli command:
$ redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1
Contents of redis.conf:
port $PORT
dir $DIR
appendonly yes
cluster-enabled yes
always-show-logo no
Jedis version:
3.2.0
Redis version:
6.0.9
Java version:
openjdk version "11.0.8" 2020-07-14 LTS
OpenJDK Runtime Environment Corretto-11.0.8.10.1 (build 11.0.8+10-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.8.10.1 (build 11.0.8+10-LTS, mixed mode)