close the jedis pool of the fail node #1158
Conversation
…be connected or null
| @marcosnils Why have all checks failed? Is there an error with Oracle JDK 7? | 
| @liujg The PR seems better now. Tests sometimes fail because of timeout issues. I've re-run them, so everything looks green now. I'll look at it tomorrow and hopefully merge. | 
| @marcosnils Great. Please check this blocking bug I found, thank you. | 
| @marcosnils Have you checked this blocking problem? | 
| @liujg replicating this scenario to test your patch takes some time. I'll try to save some bandwidth during the weekend for this. | 
| @marcosnils Has the 2.8.1 milestone test passed? It has taken a month... | 
| @liujg Sorry I couldn't merge this yet. I haven't had the time to set up the scenario to reproduce the bug and check your fix. Can you please enumerate the necessary steps to reproduce the bug? That'd allow me to move forward a lot faster to validate this issue. | 
| @marcosnils What this patch does is: when discoverClusterSlots runs, it closes the jedis pool of the failed node, so the blocked threads can be released. | 
| @liujg Unfortunately, @antirez said the output of 'cluster nodes' will change frequently. Since we can't rely on 'cluster nodes', this approach is invalid. If the issue still persists after #1178, we should try to find an alternative way. | 
| @liujg @marcosnils Skimming the source code of GenericObjectPool, it seems that GenericObjectPool doesn't work properly when GenericObjectPool.create() - which calls PooledObjectFactory.makeObject() - throws an Exception. Please refer to GenericObjectPool from here. If we set … However, when we return an object to GenericObjectPool that can't go into the idle object list, GenericObjectPool destroys the object and calls ensureIdle(1, false) to ensure a new object is created and added to the idle objects when there is at least one awaiting thread. The problem is, when GenericObjectPool.create() throws an Exception, ensureIdle() cannot add a new idle object to the idle object list, which means the blocked thread doesn't wake up. In other words, PooledObjectFactory.makeObject() should ensure a new object is created without throwing an exception under any circumstances. (But JedisFactory.makeObject() does not.) | 
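For reference, here is a minimal standalone sketch (not from this thread) of the behaviour described above, written directly against the commons-pool2 API: a pool with maxTotal=1 and blockWhenExhausted=true, and a factory whose create()/makeObject() starts throwing once a simulated node goes down. Returning an object that fails validation destroys it and triggers ensureIdle(1, false), which cannot create a replacement, so the waiting thread is never woken. The class names and the failure trigger are made up for illustration.
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
public class StuckWaiterSketch {
    // Simulates a node going down: once set, create()/makeObject() throws.
    static final AtomicBoolean nodeDown = new AtomicBoolean(false);
    static class FailingFactory extends BasePooledObjectFactory<Object> {
        @Override
        public Object create() throws Exception {
            if (nodeDown.get()) {
                throw new Exception("connection refused: node is down");
            }
            return new Object();
        }
        @Override
        public PooledObject<Object> wrap(Object obj) {
            return new DefaultPooledObject<Object>(obj);
        }
        @Override
        public boolean validateObject(PooledObject<Object> p) {
            // Once the node is down, every returned object is invalid, so
            // returnObject() destroys it and calls ensureIdle(1, false).
            return !nodeDown.get();
        }
    }
    public static void main(String[] args) throws Exception {
        GenericObjectPoolConfig config = new GenericObjectPoolConfig();
        config.setMaxTotal(1);
        config.setBlockWhenExhausted(true);
        config.setMaxWaitMillis(-1);   // wait forever, like the Jedis default
        config.setTestOnReturn(true);  // force validateObject() on return
        final GenericObjectPool<Object> pool =
            new GenericObjectPool<Object>(new FailingFactory(), config);
        Object borrowed = pool.borrowObject();
        // A second thread blocks: maxTotal is 1 and the only object is borrowed.
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                try {
                    pool.borrowObject(); // parks here
                    System.out.println("waiter got an object");
                } catch (Exception e) {
                    System.out.println("waiter failed: " + e);
                }
            }
        });
        waiter.start();
        nodeDown.set(true);          // "kill" the node
        pool.returnObject(borrowed); // validation fails -> object destroyed,
                                     // ensureIdle(1, false) -> create() throws,
                                     // so the waiter is never signalled
        waiter.join(5000);
        System.out.println("waiter still parked after 5s: " + waiter.isAlive());
    }
}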
| @marcosnils As a result, creating an idle object never fails, but it may fail to activate from … | 
| Related issue found: https://issues.apache.org/jira/browse/POOL-303 | 
| @marcosnils | 
| @HeartSaVioR Seems like this is a non-blocking thing that doesn't have an impact on the release. Can we move this to 2.8.2 so we can release 2.8.1? | 
| @marcosnils This issue seems major to me, so it would be better to resolve it ASAP, but I also agree that it is not a blocking issue. We can move this to 2.8.2. | 
| @HeartSaVioR I agree that the issue is major, but non-blocking. I'll move it to 2.8.2 so we can release 2.8.1 and 2.9.0. | 
| @HeartSaVioR | 
| @liujg @HeartSaVioR Seems like https://issues.apache.org/jira/browse/POOL-303 is already fixed, which should fix this problem. Is there a chance you can try with the master version of commons-pool to validate it? | 
| @marcosnils I have tried to look at this problem and created the following unit test:
package redis.clients.jedis.tests.cluster;
import org.junit.Test;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ClusterThreadsTest {
    @Test
    public void testMultiThreaded() throws InterruptedException {
        Set<HostAndPort> nodes = new HashSet<HostAndPort>();
        nodes.add(new HostAndPort("localhost", 30001));
        nodes.add(new HostAndPort("localhost", 30002));
        nodes.add(new HostAndPort("localhost", 30003));
        nodes.add(new HostAndPort("localhost", 30004));
        nodes.add(new HostAndPort("localhost", 30005));
        nodes.add(new HostAndPort("localhost", 30006));
        final JedisCluster cluster = new JedisCluster(nodes);
        cluster.set("a", "b");
        ExecutorService executorService = Executors.newFixedThreadPool(100);
        List<Callable<Object>> tasks = new ArrayList<Callable<Object>>(100);
        for (int i = 0; i< 200; i++) {
            Callable<Object> readTask = new Callable<Object>() {
                @Override
                public Object call() throws Exception {
                    System.out.println(cluster.get("a-" + Thread.currentThread().getName()) + "-" + Thread.currentThread().getName());
                    Thread.sleep(100);
                    cluster.set("a-" + Thread.currentThread().getName(), "b-" + Thread.currentThread().getName());
                    return null;
                }
            };
            tasks.add(readTask);
        }
        CountDownLatch stop = new CountDownLatch(1000);
        while (stop.getCount() > 0L) {
            executorService.invokeAll(tasks);
            stop.countDown();
        }
        cluster.close();
    }
}
As a prerequisite a cluster should be started. The method I used was to execute:
./create-cluster create
./create-cluster start
from redis-3.2.0/utils/create-cluster with default values. Then, in order to reproduce the problem, kill processes (with kill -9) that are part of the cluster:
21321  0.2  0.1 141028  7764 ?        Ssl  13:35   0:00 ../../src/redis-server *:30001 [cluster]
21325  0.2  0.1 141028  7776 ?        Ssl  13:35   0:00 ../../src/redis-server *:30002 [cluster]
21327  0.2  0.1 141028  7780 ?        Ssl  13:35   0:00 ../../src/redis-server *:30003 [cluster]
21331  0.5  0.2 141028  9804 ?        Ssl  13:35   0:00 ../../src/redis-server *:30004 [cluster]
21337  0.7  0.2 141028  9812 ?        Ssl  13:35   0:00 ../../src/redis-server *:30005 [cluster]
21341  0.8  0.2 141028  9812 ?        Ssl  13:35   0:00 ../../src/redis-server *:30006 [cluster]
Generally, killing the first and the fourth process should cause the threads to block:
Name: pool-1-thread-58
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@288ff99f
Total blocked: 5  Total waited: 1,394
Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
I have also checked out the latest commons-pool2 and built it (mvn clean install -DskipTests=true). When referencing it from the Jedis project, the issue still seems to reproduce. | 
| @Anisotrop You're right. I just tested with the latest commons-pool version and it seems like the problem persists even though POOL-303 has been included; threads are still parked at … In the meantime, a solution is to set … Here's a sample snippet to configure JedisCluster:
        Set<HostAndPort> nodes = new HashSet<HostAndPort>();
        nodes.add(new HostAndPort("localhost", 30001));
        nodes.add(new HostAndPort("localhost", 30002));
        nodes.add(new HostAndPort("localhost", 30003));
        nodes.add(new HostAndPort("localhost", 30004));
        nodes.add(new HostAndPort("localhost", 30005));
        nodes.add(new HostAndPort("localhost", 30006));
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxWaitMillis(1000); // You can tune this to your preference.
        final JedisCluster cluster = new JedisCluster(nodes, config);
I'll keep investigating the commons-pool problem and let them know about the issue. It might be the same thing @HeartSaVioR mentioned before. | 
| @marcosnils In that case, could a release with some of the fixes be foreseen in the near future? | 
| @Anisotrop I don't think this PR is actually necessary. Commons-pool should handle these situations natively. I'd like to investigate a bit further to see if there's something we can do at the commons-pool level to avoid unnecessary changes in Jedis. | 
| @marcosnils Did you get the chance to also test with the following settings?
        JedisPoolConfig config = new JedisPoolConfig();
        config.setBlockWhenExhausted(false);
When a thread cannot obtain a connection, an exception with the message "Could not get a resource from the pool" will be thrown. When the cluster is rebalancing, an exception with the message "Too many Cluster redirections?" seems to be thrown, but when the cluster comes back the connections seem to work again. The list of nodes is not updated, however (or so it seems), as the killed node still does not appear as closed. Note, please, that in the code sample I provided, the line:
        for (int i = 0; i< 200; i++) {
should be
        for (int i = 0; i< 100; i++) { | 
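As a side note (not part of this thread), with blockWhenExhausted set to false the application has to handle the fail-fast exception itself. A rough sketch of what a caller might do, assuming the JedisCluster was built with such a config; the helper name and retry policy are illustrative only:
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.exceptions.JedisConnectionException;
public class FailFastGetSketch {
    // "cluster" is assumed to have been created with a JedisPoolConfig where
    // blockWhenExhausted is false, so an exhausted pool fails immediately
    // ("Could not get a resource from the pool") instead of parking the thread.
    static String getWithRetry(JedisCluster cluster, String key, int attempts)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                return cluster.get(key);
            } catch (JedisConnectionException e) {
                // Pool exhausted or node unreachable: back off briefly and retry.
                Thread.sleep(50L << i);
            }
        }
        throw new JedisConnectionException("gave up after " + attempts + " attempts");
    }
}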
| @marcosnils This issue bit us pretty badly: we're using Redis 3.2.5 in cluster mode, with Jedis 2.9.0 and Spring Data Redis 1.7.5. We killed one of our master nodes, and calls to Spring Data Redis' … I've been able to reproduce this and capture a stack: we seem to have 20 threads stuck in that state. I'll try setting … This feels like a pretty major issue to me: as far as I can tell, the default behaviour is to hang forever if a master goes down. | 
| @ljrmorgan This is not a Jedis-related issue but a commons-pool one. I'll try to open an issue with them to see if they can assist with this scenario. | 
https://issues.apache.org/jira/browse/POOL-303 is released and should fix #1158
There is a bug under the following conditions:
The Redis cluster contains three masters and three slaves, and the Jedis client uses JedisCluster with the default GenericObjectPoolConfig. With the defaults, the JedisPool blocks waiting for idle objects once the default limit of eight concurrent connections is exceeded. If at that point a master goes down and its slave is promoted to master, the blocked threads keep waiting for a notification from the dead master's pool that never comes, so they remain blocked even after the new master has taken over.
What this patch does: when discoverClusterSlots runs, it closes the jedis pool of the failed node, so the blocked threads can be released.
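For illustration only (this is not the actual diff of the PR), here is a sketch of a helper in the spirit of the patch: after the slot cache is rebuilt, close the pools of nodes that no longer appear in the cluster, so that threads blocked on borrowObject() are released instead of waiting on a dead node forever. The class name, method signature, and map layout are assumptions made for the example.
import java.util.Map;
import java.util.Set;
import redis.clients.jedis.JedisPool;
public class StalePoolCleanerSketch {
    // nodePools: cached pools keyed by "host:port" (as in JedisClusterInfoCache).
    // liveNodes: "host:port" keys discovered from the fresh cluster slots output.
    public static void closeStalePools(Map<String, JedisPool> nodePools,
                                       Set<String> liveNodes) {
        for (Map.Entry<String, JedisPool> entry : nodePools.entrySet()) {
            if (!liveNodes.contains(entry.getKey())) {
                // destroy() closes the underlying GenericObjectPool, so blocked
                // borrowers get an exception instead of parking indefinitely.
                entry.getValue().destroy();
            }
        }
        nodePools.keySet().retainAll(liveNodes);
    }
}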