Ready: Connection Tests #598

Merged
merged 7 commits into from
Apr 11, 2016

alexmoore
Contributor

This adds a test and a fix for the lurking race condition described in #523.

On the Netty worker thread: after reading an operation's response, we would set the response on the FutureOperation (time A), and only then return the connection to the pool, after verifying the operation was done (time B).

When the response was set on the FutureOperation, it would notice it was done and fire any listeners, which would signal the main thread to unblock and proceed. After being signaled at time A, the main thread could race ahead and try to grab the connection before the Netty thread returned it to the pool at time B.

Since the Netty thread hasn't returned the connection yet, the operation will retry 2 more times. If the timing is right (or rather, bad), the retries can also run and fail before the connection is returned. This results in the NoNodesAvailableException.

I will look into a separate fix in the future to slow down retries, but this should fix the main issue.
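
The ordering problem can be illustrated with a minimal sketch. It is not the client's actual code: the class name, the single-permit `Semaphore` standing in for the connection pool, and the `CompletableFuture` standing in for FutureOperation are all assumptions made for illustration. To stay deterministic, the listener runs on the completing thread, which models the worst-case interleaving in which the caller reacts before time B.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Hypothetical model of the race; none of these names come from the Riak client.
final class ReturnBeforeCompleteSketch {
    // One permit stands in for the single pooled connection.
    private final Semaphore permits = new Semaphore(1);

    /**
     * Runs one simulated operation and reports whether a follow-up operation,
     * started from the completion listener, could obtain the connection.
     */
    boolean followUpSucceeds(boolean returnConnectionFirst) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("pool should start with a free connection");
        }
        CompletableFuture<String> future = new CompletableFuture<>();

        // The listener models a caller that issues its next operation as soon
        // as it is signaled. It runs on whichever thread completes the future.
        CompletableFuture<Boolean> followUp = future.thenApply(response -> {
            boolean acquired = permits.tryAcquire();
            if (acquired) {
                permits.release(); // hand the connection back after the check
            }
            return acquired;
        });

        // Simulated response handling on the worker thread.
        if (returnConnectionFirst) {
            permits.release();        // time B happens first
            future.complete("ok");    // then time A
        } else {
            future.complete("ok");    // time A: listeners fire, connection still held
            permits.release();        // time B: too late for the follow-up
        }
        return followUp.join();
    }

    public static void main(String[] args) {
        ReturnBeforeCompleteSketch node = new ReturnBeforeCompleteSketch();
        System.out.println("complete, then return: " + node.followUpSucceeds(false)); // false
        System.out.println("return, then complete: " + node.followUpSucceeds(true));  // true
    }
}
```

In this model, the only change between the failing and the passing case is the order of the two calls: returning the connection before completing the future means a listener can never observe a pool that is still empty.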

@alexmoore alexmoore changed the title from "Not Ready: Connection Tests" to "Ready: Connection Tests" Apr 1, 2016
@srgg
Contributor

srgg commented Apr 4, 2016

@alexmoore ITestClusterLifecycle.testManyCommandsOneConnection() fails on my laptop. Here is the log:

2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakNode - Riak replied with error; 0:no_type
2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakCluster - operation failed; remaining retries: 2
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Attempting to acquire channel permit
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Operation not being executed Riaknode 127.0.0.1:10017; no connections available
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - operation failed; remaining retries: 1
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Attempting to acquire channel permit
2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakNode - Channel id:300790552 returned to pool
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakNode - Operation not being executed Riaknode 127.0.0.1:10017; no connections available
2016-04-04 13:47:12 [nioEventLoopGroup-2-1] DEBUG com.basho.riak.client.core.RiakNode - Released pool permit
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.FutureOperation - Setting Complete on future
2016-04-04 13:47:12 [pool-1-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - operation failed; remaining retries: 0
2016-04-04 13:47:31 [main] DEBUG com.basho.riak.client.api.ITestClusterLifecycle - Exception occurred
java.util.concurrent.ExecutionException: com.basho.riak.client.core.NoNodesAvailableException
    at com.basho.riak.client.core.FutureOperation.get(FutureOperation.java:314)
    at com.basho.riak.client.api.commands.CoreFutureAdapter.get(CoreFutureAdapter.java:52)
    at com.basho.riak.client.api.ITestClusterLifecycle.createAndStoreObject(ITestClusterLifecycle.java:86)
    at com.basho.riak.client.api.ITestClusterLifecycle.testManyCommandsOneConnection(ITestClusterLifecycle.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
Caused by: com.basho.riak.client.core.NoNodesAvailableException
    at com.basho.riak.client.core.RiakCluster.retryOperation(RiakCluster.java:469)
    at com.basho.riak.client.core.RiakCluster.access$1000(RiakCluster.java:48)
    at com.basho.riak.client.core.RiakCluster$RetryTask.run(RiakCluster.java:554)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-04-04 13:47:31 [main] DEBUG com.basho.riak.client.api.ITestClusterLifecycle - Cluster state: RUNNING, iteration: 0
2016-04-04 13:47:31 [main] INFO com.basho.riak.client.core.RiakCluster - RiakCluster is shutting down.

@alexmoore
Contributor Author

@srgg Ahh, it assumes a bucket-type named "plain" is created / activated.

@srgg
Contributor

srgg commented Apr 4, 2016

@alexmoore That's true, but what puzzles me is that I got a NoNodesAvailableException even though Riak returned:

DEBUG com.basho.riak.client.core.RiakNode - Riak replied with error; 0:no_type

@srgg
Contributor

srgg commented Apr 4, 2016

@alexmoore I've created a small PR, #608, to make FutureOperation handling a bit more universal, but I'm not sure about the results; they are a bit contradictory. So please feel free to drop my PR.

Apart from my controversial PR, I ran into an odd error-handling situation when I ran ITestClusterLifecycle without creating the corresponding bucket type. Instead of surfacing the original error:

DEBUG com.basho.riak.client.core.RiakNode - Riak replied with error; 0:no_type

the Java client throws:

java.util.concurrent.ExecutionException: com.basho.riak.client.core.NoNodesAvailableException

@alexmoore
Contributor Author

@srgg Thanks for pointing that out! I found the issue: I forgot to return the connection before setting the exception in exactly one spot, which caused the retry storm before the connection was available. I'll patch it and look at your PR as well.
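
One way to guard against missing a spot like this is to route the success and the failure path through a single helper that always returns the connection before completing the future. The sketch below is an illustration under stated assumptions, not the patch in this PR: the class and method names are hypothetical, a `Semaphore` stands in for the pool, and a `CompletableFuture` stands in for FutureOperation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Hypothetical model; these names are not taken from the Riak client.
final class SingleExitPathSketch {
    // One permit stands in for the single pooled connection.
    private final Semaphore permits = new Semaphore(1);

    /** Takes the connection and hands back the future for the operation. */
    CompletableFuture<String> begin() {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("no connections available");
        }
        return new CompletableFuture<>();
    }

    void onResponse(CompletableFuture<String> future, String response) {
        finish(() -> future.complete(response));
    }

    void onError(CompletableFuture<String> future, Throwable cause) {
        finish(() -> future.completeExceptionally(cause));
    }

    // The only place that completes a future: the connection goes back first,
    // so listeners fired by the completion always see it in the pool.
    private void finish(Runnable completion) {
        permits.release();
        completion.run();
    }

    int availableConnections() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SingleExitPathSketch node = new SingleExitPathSketch();
        CompletableFuture<String> future = node.begin();
        future.whenComplete((response, error) ->
            System.out.println("listener sees " + node.availableConnections()
                + " free connection(s), error = " + error));
        // Simulate the server-side error from the log on the failure path.
        node.onError(future, new RuntimeException("0:no_type"));
    }
}
```

With both handlers funneled through `finish`, a listener that reacts to the failure finds the connection already back in the pool, so a retry in this model would not hit an empty pool and hide the original error.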

@srgg
Contributor

srgg commented Apr 7, 2016

👍

@alexmoore alexmoore merged commit dd4438d into develop Apr 11, 2016
@alexmoore alexmoore deleted the connection-tests branch May 24, 2016 14:58