Ready: Connection Tests #598
@alexmoore ITestClusterLifecycle.testManyCommandsOneConnection() fails on my laptop. Here is a log
@srgg Ahh, it assumes a bucket-type named "plain" has been created and activated.
@alexmoore That's true, but what seems odd to me is that I got a '.NoNodesAvailableException', whereas Riak returned:
@alexmoore I've created a small PR, #608, to make FutureOperation handling a bit more universal, but I'm not sure about the results; they are a bit contradictory, so please feel free to drop my PR. Apart from that controversial PR, I ran into an odd error-handling situation when running ITestClusterLifecycle without creating the corresponding bucket type. Instead of throwing the original error:
the Java client throws:
@srgg Thanks for pointing that out! I found the issue: in exactly one spot I forgot to return the connection before setting the exception, which caused a retry storm before the connection was available. I'll patch it and look at your PR as well.
👍
This PR adds a test and a fix for the lurking race condition described in #523.
On the Netty worker thread: after an operation's response was read, we would set the response on the FutureOperation class (time A), and then return the connection after verifying the operation was done (time B).
When the response was set on the FutureOperation, it would notice it was done and fire any listeners, signaling the main thread to unblock and proceed. Once signaled at time A, the main thread could race ahead and try to grab a connection before the Netty thread had returned it to the pool at time B.
Since the Netty thread hasn't returned the connection yet, the operation retries 2 more times. If the timing is right (or bad), those retries can also run and fail before the connection is returned, which results in the "NoNodesAvailable" exception.
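The ordering problem can be illustrated with a small, self-contained sketch. It is not the client's actual code: `Connection`, `ConnectionPool`, and the `onResponse*` handlers are hypothetical stand-ins, and `java.util.concurrent.CompletableFuture` plays the role of FutureOperation. Because a `thenRun` callback registered before completion runs inline on the thread that calls `complete()`, the listener here behaves like the worst case of the race: it looks for a connection at exactly time A.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ReturnBeforeComplete {
    // Hypothetical stand-in for a pooled connection.
    static final class Connection {
        final int id;
        Connection(int id) { this.id = id; }
    }

    // Hypothetical minimal pool: tryAcquire() returns null when no connection is idle.
    static final class ConnectionPool {
        private final ConcurrentLinkedQueue<Connection> idle = new ConcurrentLinkedQueue<>();
        void release(Connection c) { idle.offer(c); }
        Connection tryAcquire() { return idle.poll(); }
    }

    // Racy ordering: complete the future first (time A), return the connection second (time B).
    static void onResponseBuggy(ConnectionPool pool, Connection conn,
                                CompletableFuture<String> future, String response) {
        future.complete(response); // listeners run here, while the connection is still checked out
        pool.release(conn);
    }

    // Safe ordering: return the connection first, then complete the future.
    static void onResponseFixed(ConnectionPool pool, Connection conn,
                                CompletableFuture<String> future, String response) {
        pool.release(conn);
        future.complete(response);
    }

    // Simulates the waiting thread immediately starting its next operation once signaled.
    static boolean nextOperationFindsConnection(boolean fixed) {
        ConnectionPool pool = new ConnectionPool();
        Connection conn = new Connection(1); // checked out by the in-flight operation
        CompletableFuture<String> future = new CompletableFuture<>();
        boolean[] found = new boolean[1];
        future.thenRun(() -> {
            Connection next = pool.tryAcquire();
            found[0] = next != null;
            if (next != null) {
                pool.release(next);
            }
        });
        if (fixed) {
            onResponseFixed(pool, conn, future, "ok");
        } else {
            onResponseBuggy(pool, conn, future, "ok");
        }
        return found[0];
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + nextOperationFindsConnection(false)); // buggy: false
        System.out.println("fixed: " + nextOperationFindsConnection(true));  // fixed: true
    }
}
```

In a real client the listener usually wakes another thread rather than running inline, so the buggy ordering fails only intermittently; the sketch makes the bad interleaving deterministic.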
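How immediate retries can all fail before the connection comes back can be sketched as follows. This assumes one initial attempt plus 2 retries with no delay between them, as described above. `acquireWithRetries`, `lateReturningPool`, and the local `NoNodesAvailableException` class are hypothetical illustrations, not the Riak client's API. The simulated pool hands out its connection only after a given number of failed polls, modeling how late time B arrives.

```java
import java.util.function.Supplier;

public class RetryExhaustion {
    // Hypothetical local exception, named after the client's exception for illustration only.
    static final class NoNodesAvailableException extends RuntimeException {
        NoNodesAvailableException(String msg) { super(msg); }
    }

    // One initial attempt plus 2 retries.
    static final int MAX_ATTEMPTS = 3;

    // Polls for a connection up to MAX_ATTEMPTS times, back to back, with no delay.
    static <T> T acquireWithRetries(Supplier<T> tryAcquire) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            T conn = tryAcquire.get();
            if (conn != null) {
                return conn;
            }
        }
        throw new NoNodesAvailableException("no connection after " + MAX_ATTEMPTS + " attempts");
    }

    // Simulated pool: returns null for the first `pollsBeforeReturn` polls, then the connection.
    static Supplier<String> lateReturningPool(int pollsBeforeReturn) {
        int[] polls = {0};
        return () -> polls[0]++ < pollsBeforeReturn ? null : "conn-1";
    }

    public static void main(String[] args) {
        // Connection returned before the last retry: the operation succeeds.
        System.out.println(acquireWithRetries(lateReturningPool(2)));
        // Connection returned too late: every attempt fails.
        try {
            acquireWithRetries(lateReturningPool(3));
        } catch (NoNodesAvailableException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```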
I will look into a separate fix in the future to slow down retries, but this should fix the main issue.