Skip to content

JedisCluster throws JedisConnectionException when a cluster master killed #1439

@skyhawk1981

Description

@skyhawk1981

Expected behavior

I have a Redis Cluster with 4 master nodes and 4 slave nodes. Each master has 1 slave. I use JedisCluster keep writing data to Redis Cluster. When I killed a redis master, the slave switch to master as expected, and I thought JedisCluster would reconnect to the new master.

Actual behavior

Actually I find JedisCluster throw JedisConnectionException.
If I want my program could tolerate 1 master node failure, I should catch the JedisConnectionException, try to create a new JedisCluster by calling new JedisCluster(nodes), and write data again. But this way isn't what I expected, or Jedis guides described.

I tried this test with Jedis 2.9.0 and 2.8.2, all happed as above. But when I run this test with Jedis 2.8.1, it works as I expected.

According to Jedis 2.9.0 sources, and code changes in Github, I thought the 131-141 lines of JedisClusterCommand.java may have a miss. As described in Issue #1252, the codes could be adding some more logic as below:

131 if (attempts == 1) {
132 //We need this because if node is not reachable anymore - we need to finally initiate slots renewing,
133 //or we can stuck with cluster state without one node in opposite case.
134 //But now if maxAttempts = 1 or 2 we will do it too often. For each time-outed request.
135 //TODO make tracking of successful/unsuccessful operations for node - do renewing only
136 //if there were no successful responses from this node last few seconds
137 this.connectionHandler.renewSlotCache();
+++ }
138
+++ if (attempts < 1) {
139 //no more redirections left, throw original exception, not JedisClusterMaxRedirectionsException, because it's not MOVED situation
140 throw jce;
141 }

or change logic as below:

131 if (attempts == 1) {
132 //We need this because if node is not reachable anymore - we need to finally initiate slots renewing,
133 //or we can stuck with cluster state without one node in opposite case.
134 //But now if maxAttempts = 1 or 2 we will do it too often. For each time-outed request.
135 //TODO make tracking of successful/unsuccessful operations for node - do renewing only
136 //if there were no successful responses from this node last few seconds
137 //this.connectionHandler.renewSlotCache();
+++ tryRandomNode = true;
+++ }
138
+++ if (attempts < 1) {
139 //no more redirections left, throw original exception, not JedisClusterMaxRedirectionsException, because it's not MOVED situation
140 throw jce;
141 }

Steps to reproduce:

  1. Running a Redis Cluster which contains master nodes and slave nodes.
  2. Keeping writing data to cluster by using JedisCluster.
  3. Killing one master, and the slave switching to master.
  4. The test throwing a JedisConnectionException.

Redis / Jedis Configuration

Redis run as a cluster with 4 master and 4 slave.
And using JedisCluster with default timeout and retry times.

Jedis version:

Jedis 2.9.0, 2.8.2

Redis version:

Redis 3.2.1

Java version:

Java 8u92

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions