
JedisCluster keeps reconnecting to the same cluster node when it has gone down #2347

@jensgreen

Description


Expected behavior

When the cluster node that the client is writing to goes down, JedisCluster recovers by connecting to another node in the cluster, transparently to the user. A few reconnection attempts are made, spaced out by a backoff strategy.

Actual behavior

When the cluster node that the client is writing to goes down, JedisCluster immediately retries the connection to the same node several times, without backoff. After maxAttempts attempts, it throws JedisClusterMaxAttemptsException ("No more cluster attempts left.") and the program crashes.

Stack trace:

Exception in thread "main" redis.clients.jedis.exceptions.JedisClusterMaxAttemptsException: No more cluster attempts left.
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:86)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:124)
	at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:25)
	at redis.clients.jedis.JedisCluster.incr(JedisCluster.java:443)
	at JedisClusterFailover.main(JedisClusterFailover.java:23)

One observation: runWithRetries has no backoff, so all retries happen immediately. Even if a connection to another node were attempted, the cluster would have no time to recover by promoting a replica to master.
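One way to work around the missing backoff at the application level is to wrap each cluster command in a retry loop that sleeps between attempts, so the cluster has time to promote a replica. The following is a minimal sketch, not a Jedis API: `BackoffRetry`, its parameters, and the delay values are hypothetical, and the loop assumes the command signals failure with an unchecked exception (Jedis exceptions such as `JedisConnectionException` extend `RuntimeException`). For a real cluster the total wait would need to exceed the failover time, which is driven by `cluster-node-timeout` (15000 ms by default; the redis.conf in this report does not change it).

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Jedis): retries an operation with
 * capped exponential backoff between attempts.
 */
public final class BackoffRetry {

    private BackoffRetry() {}

    /** Delay before retry number {@code retry} (0-based): initial * 2^retry, capped at max. */
    public static long delayMillis(int retry, long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        for (int i = 0; i < retry && delay < maxDelayMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    /**
     * Runs {@code op} up to {@code maxAttempts} times. After every failure except
     * the last, sleeps for the backoff delay; the last failure is rethrown.
     */
    public static <T> T call(Supplier<T> op, int maxAttempts,
                             long initialDelayMs, long maxDelayMs) throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) { // broad on purpose to keep the sketch short
                last = e;
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt, initialDelayMs, maxDelayMs));
                }
            }
        }
        throw last; // non-null here: the loop ran at least once and every attempt failed
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a cluster command such as cluster.incr(key): fails twice, then succeeds.
        int[] calls = {0};
        long result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("node down");
            }
            return 42L;
        }, 5, 10, 100);
        System.out.println(result + " after " + calls[0] + " calls"); // prints "42 after 3 calls"
    }
}
```

Usage would look like `BackoffRetry.call(() -> cluster.incr("counter"), 5, 200, 5000)`, where `cluster` is the existing JedisCluster instance and `"counter"` is a placeholder key. Catching every `RuntimeException` keeps the sketch short; a real caller would likely narrow it to connection-related exceptions.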

Steps to reproduce:

Repro: https://gist.github.com/jensgreen/259de5e06da6e4fab348f89d4e63fab9

  1. Start up a Redis cluster.
    1. Start 6 Redis nodes on ports 7000-7005.
    2. Create the cluster: $ redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1
  2. Run the main method of JedisClusterFailover from the gist linked above.
  3. Terminate the Redis master that is being written to with Ctrl+C.
    • To find that node, kill the masters one by one until the Java program crashes, bringing each killed master back online if the program did not crash.
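Instead of killing masters one by one, the master that owns the key can be identified up front. Redis Cluster maps every key to one of 16384 hash slots, computed as CRC16 (XMODEM variant) of the key, or of its `{hash tag}` if one is present, modulo 16384, and `redis-cli -p 7000 cluster slots` lists which node serves each slot range. The Java sketch below reproduces that calculation; the class name `KeySlot` and the default key `counter` are hypothetical placeholders, since the key used by the gist is not shown here. `redis-cli -p 7000 cluster keyslot <key>` returns the same number directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Computes the Redis Cluster hash slot of a key (hypothetical helper, not a Jedis API). */
public final class KeySlot {

    private KeySlot() {}

    /** CRC16 XMODEM: polynomial 0x1021, initial value 0, MSB first, no final XOR. */
    public static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc <<= 1;
                }
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Hash slot in [0, 16383], honoring hash tags such as "user{42}:name". */
    public static int slot(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int open = indexOf(bytes, (byte) '{', 0);
        if (open >= 0) {
            int close = indexOf(bytes, (byte) '}', open + 1);
            // Only a non-empty tag counts; "{}" or an unmatched "{" hashes the whole key.
            if (close > open + 1) {
                return crc16(Arrays.copyOfRange(bytes, open + 1, close)) & 16383;
            }
        }
        return crc16(bytes) & 16383; // & 16383 is the same as mod 16384
    }

    private static int indexOf(byte[] a, byte target, int from) {
        for (int i = from; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "counter" is a placeholder; pass the key the failover program writes to.
        String key = args.length > 0 ? args[0] : "counter";
        System.out.println(key + " -> slot " + slot(key));
    }
}
```

The printed slot can then be matched against the ranges reported by `cluster slots` to find the master to terminate in step 3.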

Redis / Jedis Configuration

The Jedis configuration is in the gist linked above.

Redis cluster on localhost with 3 masters and 3 replicas.

The cluster was created with this redis-cli command:

$ redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

Contents of redis.conf:

port $PORT
dir $DIR
appendonly yes
cluster-enabled yes
always-show-logo no

Jedis version:

3.2.0

Redis version:

6.0.9

Java version:

openjdk version "11.0.8" 2020-07-14 LTS
OpenJDK Runtime Environment Corretto-11.0.8.10.1 (build 11.0.8+10-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.8.10.1 (build 11.0.8+10-LTS, mixed mode)
