Occasionally stale server selection #42

Open
netshade opened this issue Nov 15, 2012 · 2 comments

Comments

@netshade

We've been testing failover and while, by and large, the server selection works as expected, occasionally we see failover not selecting the right master server even after a node manager has correctly broadcast the changes to our client.

Taking a look at the code, it seems like the conditional server selection here (where presence in the stack overrides master selection if :master_only is supplied):

https://github.com/ryanlecompte/redis_failover/blob/master/lib/redis_failover/client.rb#L458

coupled with .pop usage on the thread local client stack here:

https://github.com/ryanlecompte/redis_failover/blob/master/lib/redis_failover/client.rb#L475

might be causing it. We replaced the .pop usage with .clear and emptied the stack entirely when the method was called, and we stopped seeing incorrect client selection. Does this seem right to you? Is that actually a problem?
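
To make the hypothesis concrete, here is a minimal sketch of how a leftover entry on a thread-local stack could keep overriding the current master. This is not the redis_failover code: `SketchClient`, `release_with_pop`, and `release_with_clear` are hypothetical names, and the selection rule is only an assumption based on the description above.

```ruby
# Hypothetical model of the suspected behaviour; not redis_failover's code.
class SketchClient
  attr_accessor :master

  def initialize(master)
    @master = master
    @key = :"sketch_client_stack_#{object_id}"
  end

  # Per-thread stack of clients that are currently "in use".
  def stack
    Thread.current[@key] ||= []
  end

  # Assumed rule: an entry already on the stack wins over the current master.
  def client_for
    client = stack.last || @master
    stack.push(client)
    client
  end

  # Removes only the most recent entry.
  def release_with_pop
    stack.pop
    nil
  end

  # Empties the stack entirely.
  def release_with_clear
    stack.clear
    nil
  end
end

c = SketchClient.new("old-master")
c.client_for             # pushes "old-master"
c.client_for             # nested call, pushes "old-master" again
c.release_with_pop       # only one pop, so one entry is left behind
c.master = "new-master"  # failover is announced
stale = c.client_for     # the leftover entry still wins
c.release_with_clear     # emptying the stack drops the leftover entry
fresh = c.client_for     # now falls through to the current master
```

In this model the stale selection only appears if pushes and pops get out of balance, which is why `clear` hides the symptom; whether that imbalance can occur in the real client is the open question.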

@ryanlecompte
Owner

Hmm, looking at the code I don't see how this could be a problem. The basic idea is that we should always #pop as many times as #client_for is called. Is there a particular redis call that's causing this for you? Is it happening in a simple call like #get or #set or in more complicated/nested calls like in #multi?
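
The invariant described here (one pop for every push) can be sketched as follows. This is only an illustration under assumed names (`BalancedStack`, `with_client`), not the library's implementation; it uses `ensure` so the pop still runs when the block raises, and this thread does not say whether the library guards its pops that way.

```ruby
# Hypothetical sketch of the push/pop invariant; names are illustrative.
class BalancedStack
  def initialize
    @stack = []
  end

  def depth
    @stack.size
  end

  # Push a client, yield it, and always pop again, even on error.
  def with_client(client)
    @stack.push(client)
    yield client
  ensure
    @stack.pop
  end
end

s = BalancedStack.new

# Nested use: two pushes are matched by two pops.
s.with_client("master") do |outer|
  s.with_client(outer) { |inner| inner }
end
depth_after_nested = s.depth

# Failing use: the error propagates, but the entry is still popped.
begin
  s.with_client("master") { raise IOError, "connection lost" }
rescue IOError
  # nothing to do; the ensure clause has already run
end
depth_after_error = s.depth
```

If a code path can push without reaching the matching pop, the stack depth stays above zero after the call returns, which would be one way to check for the imbalance in a failing run.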

@netshade
Author

It loops on #incr; that's the only command. The client is initialized with :master_only => true. The setup was three node managers, one ZooKeeper, and three instances of Redis. The test script looked like:

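expected = 0 # assumption: "key" does not exist yet, so the first INCR returns 1
loop do
  expected += 1
  actual = $client.incr("key")
  # Stop as soon as the counter diverges from the expected value.
  if expected != actual
    abort
  end
end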

The comparison stuff worked as expected, but in the course of running that script while killing random servers, the client logger would report it had received the new master, but would immediately thereafter attempt to connect to the old master and write, either causing connection errors or read-only errors.

( Ruby 1.9.3 )
