- 
                Notifications
    You must be signed in to change notification settings 
- Fork 3.9k
Description
Expected behavior
I have a Redis Cluster with 4 master nodes and 4 slave nodes. Each master has 1 slave. I use JedisCluster keep writing data to Redis Cluster. When I killed a redis master, the slave switch to master as expected, and I thought JedisCluster would reconnect to the new master.
Actual behavior
Actually I find JedisCluster throw JedisConnectionException.
If I want my program could tolerate 1 master node failure, I should catch the JedisConnectionException, try to create a new JedisCluster by calling new JedisCluster(nodes), and write data again. But this way isn't what I expected, or Jedis guides described.
I tried this test with Jedis 2.9.0 and 2.8.2, all happed as above. But when I run this test with Jedis 2.8.1, it works as I expected.
According to Jedis 2.9.0 sources, and code changes in Github, I thought the 131-141 lines of JedisClusterCommand.java may have a miss. As described in Issue #1252, the codes could be adding some more logic as below:
131      if (attempts == 1) {
132        //We need this because if node is not reachable anymore - we need to finally initiate slots renewing,
133        //or we can stuck with cluster state without one node in opposite case.
134        //But now if maxAttempts = 1 or 2 we will do it too often. For each time-outed request.
135        //TODO make tracking of successful/unsuccessful operations for node - do renewing only
136        //if there were no successful responses from this node last few seconds
137        this.connectionHandler.renewSlotCache();
+++      }
138
+++      if (attempts < 1) {
139        //no more redirections left, throw original exception, not JedisClusterMaxRedirectionsException, because it's not MOVED situation
140        throw jce;
141      }
or change logic as below:
131      if (attempts == 1) {
132        //We need this because if node is not reachable anymore - we need to finally initiate slots renewing,
133        //or we can stuck with cluster state without one node in opposite case.
134        //But now if maxAttempts = 1 or 2 we will do it too often. For each time-outed request.
135        //TODO make tracking of successful/unsuccessful operations for node - do renewing only
136        //if there were no successful responses from this node last few seconds
137        //this.connectionHandler.renewSlotCache();
+++      tryRandomNode = true;
+++      }
138
+++      if (attempts < 1) {
139        //no more redirections left, throw original exception, not JedisClusterMaxRedirectionsException, because it's not MOVED situation
140        throw jce;
141      }
Steps to reproduce:
- Running a Redis Cluster which contains master nodes and slave nodes.
- Keeping writing data to cluster by using JedisCluster.
- Killing one master, and the slave switching to master.
- The test throwing a JedisConnectionException.
Redis / Jedis Configuration
Redis run as a cluster with 4 master and 4 slave.
And using JedisCluster with default timeout and retry times.
Jedis version:
Jedis 2.9.0, 2.8.2
Redis version:
Redis 3.2.1
Java version:
Java 8u92