Ready: Connection Tests #598

alexmoore · 2016-03-09T18:46:51Z

This gives a test + fix for a lurking race condition as described in #523.

On the Netty worker thread: after reading an operation's response, we would set the response on the FutureOperation (time A), and then return the connection to the pool after verifying the operation was done (time B).

When the response was set on the FutureOperation, it would notice it was done and fire any listeners, which would signal the main thread to unblock and proceed. After being signaled at time A, the main thread could race ahead and try to grab the connection before the Netty thread returned it to the pool at time B.

If the Netty thread hasn't returned the connection yet, the operation fails and is retried 2 more times. With unlucky timing, the retries can also run and fail before the connection is returned. This results in the NoNodesAvailableException.

I will look into a separate fix in the future to slow down retries, but this should fix the main issue.
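
For readers who want to see the ordering problem in isolation, here is a minimal sketch. It is not the client's code: `Pool`, `nextAcquireSucceeds`, and the use of `CompletableFuture` are stand-ins invented for illustration, and the listener runs inline on the completing thread to model the worst-case timing described above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReturnBeforeSignal {
    // Hypothetical one-connection pool: a single permit stands in for the channel.
    static final class Pool {
        private final Semaphore permits = new Semaphore(1);
        boolean tryAcquire() { return permits.tryAcquire(); }
        void release() { permits.release(); }
    }

    /**
     * Simulates one response arriving on the worker thread and reports whether
     * a caller woken by the completion could acquire the connection right away.
     */
    static boolean nextAcquireSucceeds(boolean returnConnectionFirst) {
        Pool pool = new Pool();
        pool.tryAcquire(); // the in-flight operation holds the only connection

        CompletableFuture<String> future = new CompletableFuture<>();
        AtomicBoolean acquired = new AtomicBoolean(false);
        // The listener fires inside complete(), i.e. at "time A".
        future.thenRun(() -> acquired.set(pool.tryAcquire()));

        if (returnConnectionFirst) {
            pool.release();              // return the connection first
            future.complete("response"); // then signal waiters
        } else {
            future.complete("response"); // time A: waiters wake up
            pool.release();              // time B: too late for the waiter
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("signal first: " + nextAcquireSucceeds(false));
        System.out.println("return first: " + nextAcquireSucceeds(true));
    }
}
```

With the connection returned before the future is completed, the woken caller always finds a permit; in the other order it can find none, which is the window this PR tries to close.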

srgg · 2016-04-04T10:52:04Z

@alexmoore ITestClusterLifecycle.testManyCommandsOneConnection() fails on my laptop. Here is the log:

2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakNode - Riak replied with error; 0:no_type
2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakCluster - operation failed; remaining retries: 2
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Attempting to acquire channel permit
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Operation not being executed Riaknode 127.0.0.1:10017; no connections available
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - operation failed; remaining retries: 1
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Attempting to acquire channel permit
2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakNode - Channel id:300790552 returned to pool
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Operation not being executed Riaknode 127.0.0.1:10017; no connections available
2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakNode - Released pool permit
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.FutureOperation - Setting Complete on future
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - operation failed; remaining retries: 0
2016-04-04 13:47:31 [main] DEBUG com.basho.riak.client.api.ITestClusterLifecycle - Exception occurred
java.util.concurrent.ExecutionException: com.basho.riak.client.core.NoNodesAvailableException
    at com.basho.riak.client.core.FutureOperation.get(FutureOperation.java:314)
    at com.basho.riak.client.api.commands.CoreFutureAdapter.get(CoreFutureAdapter.java:52)
    at com.basho.riak.client.api.ITestClusterLifecycle.createAndStoreObject(ITestClusterLifecycle.java:86)
    at com.basho.riak.client.api.ITestClusterLifecycle.testManyCommandsOneConnection(ITestClusterLifecycle.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
Caused by: com.basho.riak.client.core.NoNodesAvailableException
    at com.basho.riak.client.core.RiakCluster.retryOperation(RiakCluster.java:469)
    at com.basho.riak.client.core.RiakCluster.access$1000(RiakCluster.java:48)
    at com.basho.riak.client.core.RiakCluster$RetryTask.run(RiakCluster.java:554)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-04-04 13:47:31 [main] DEBUG com.basho.riak.client.api.ITestClusterLifecycle - Cluster state: RUNNING, iteration: 0
2016-04-04 13:47:31 [main] INFO com.basho.riak.client.core.RiakCluster - RiakCluster is shutting down.

alexmoore · 2016-04-04T14:12:09Z

@srgg Ahh, it assumes a bucket-type named "plain" is created / activated.

srgg · 2016-04-04T14:27:10Z

@alexmoore that's true, but the weird part for me is that I got a NoNodesAvailableException even though Riak replied with:

DEBUG com.basho.riak.client.core.RiakNode - Riak replied with error; 0:no_type

srgg · 2016-04-04T16:32:41Z

@alexmoore I've created a small PR, #608, to make FutureOperation handling a bit more universal, but I'm not sure about the results; they are a bit contradictory. So please feel free to drop my PR.

Apart from my controversial PR, there is a somewhat weird situation with error handling that I ran into when I ran ITestClusterLifecycle without creating the corresponding bucket type. Instead of surfacing the original error:

DEBUG com.basho.riak.client.core.RiakNode - Riak replied with error; 0:no_type

the Java client throws:

java.util.concurrent.ExecutionException: com.basho.riak.client.core.NoNodesAvailableException

alexmoore · 2016-04-05T20:52:27Z

@srgg Thanks for pointing that out! I found the issue: I forgot to return the connection before setting the exception in exactly 1 spot, which caused the retry storm before the connection was available. I'll patch and look at your PR as well.
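
A rough, single-threaded model of why that one spot turned a `no_type` reply into a NoNodesAvailableException. Everything here is hypothetical (`ErrorPathOrdering`, `finalError`, the error strings as return values), and it assumes a retried operation would get the same `no_type` reply from Riak; it only mirrors the sequence visible in the log above, not the client's real retry code.

```java
import java.util.concurrent.Semaphore;

public class ErrorPathOrdering {
    /**
     * Models one failed operation followed by its retries, all of which run
     * before the worker thread gets any further (worst-case timing).
     * Returns the error the caller would finally see.
     */
    static String finalError(boolean returnConnectionFirst, int retries) {
        Semaphore pool = new Semaphore(1); // hypothetical one-connection pool
        pool.tryAcquire();                 // first attempt holds the connection

        String lastError = "no_type";      // server's reply to the first attempt
        if (returnConnectionFirst) {
            pool.release();                // return the connection, then report the error
        }

        for (int remaining = retries; remaining > 0; remaining--) {
            if (pool.tryAcquire()) {
                lastError = "no_type";     // assumption: server rejects the retry the same way
                pool.release();
            } else {
                lastError = "no connections available"; // masks the original error
            }
        }

        if (!returnConnectionFirst) {
            pool.release();                // connection comes back only after retries ran out
        }
        return lastError;
    }

    public static void main(String[] args) {
        System.out.println("exception first:  " + finalError(false, 2));
        System.out.println("connection first: " + finalError(true, 2));
    }
}
```

In this model, reporting the exception first burns every retry on an empty pool, so the last error recorded is the pool's, not Riak's.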

srgg · 2016-04-07T04:33:52Z

👍

alexmoore added 6 commits March 8, 2016 13:54

Add simple cluster lifecycle test

ca80e9e

Add lifecycle test to test for simple deadlocks

d8d1c4a

Possible solution for channel permit deadlock

b920c1c

Let's not do work on the netty thread

88a66d8

Merge branch 'develop' into connection-tests

5b5a516

Cleanup setException cases

bb3bcb1

alexmoore changed the title ~~Not Ready: Connection Tests~~ Ready: Connection Tests Apr 1, 2016

Fix error case bug, add test

b8ee579

alexmoore merged commit dd4438d into develop Apr 11, 2016

alexmoore deleted the connection-tests branch May 24, 2016 14:58
