Occasionally stale server selection #42

netshade · 2012-11-15T18:44:14Z

We've been testing failover, and while server selection works as expected by and large, we occasionally see the client fail to select the new master even after a node manager has correctly broadcast the change to our client.

Looking at the code, it seems like the conditional server selection here (where a client's presence in the stack overrides master selection when :master_only is supplied):

https://github.com/ryanlecompte/redis_failover/blob/master/lib/redis_failover/client.rb#L458

coupled with the .pop usage on the thread-local client stack here:

https://github.com/ryanlecompte/redis_failover/blob/master/lib/redis_failover/client.rb#L475

might be causing it. We replaced .pop with .clear, emptying the stack entirely whenever the method is called, and stopped seeing incorrect client selection. Does this seem right to you? Is that actually a problem?
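To make the suspected failure mode concrete, here is a minimal Ruby model of a thread-local client stack in which an entry already on the stack wins over the current master. StackSelector, clear_clients, and the host strings are hypothetical, and the sketch assumes some code path (for example a retry after a connection error) calls client_for without a matching free_client. It is not redis_failover's actual code.

```ruby
# Hypothetical model of a thread-local client stack, loosely based on the
# behaviour described in this thread; NOT redis_failover's actual code.
class StackSelector
  attr_accessor :master

  def initialize(master)
    @master = master
    @key = :"stack_selector_#{object_id}"
  end

  # An entry already on this thread's stack wins over the current master.
  def client_for(_method)
    stack = (Thread.current[@key] ||= [])
    client = stack.last || @master
    stack.push(client)
    client
  end

  # Removes one entry; callers must pair this with every client_for.
  def free_client
    stack = Thread.current[@key]
    stack.pop if stack
    nil
  end

  # The .clear workaround: drop every entry at once.
  def clear_clients
    stack = Thread.current[@key]
    stack.clear if stack
    nil
  end
end

selector = StackSelector.new("old-master:6379")

# Assumed leak: some path calls client_for without a matching free_client.
selector.client_for(:incr)

# Failover: the node manager broadcasts a new master.
selector.master = "new-master:6379"

stale = selector.client_for(:incr) # the leaked entry wins
selector.free_client
puts stale # prints "old-master:6379"

selector.clear_clients
fresh = selector.client_for(:incr)
selector.free_client
puts fresh # prints "new-master:6379"
```

In this model a single leaked entry is enough: once it sits at the bottom of the stack, every later call on that thread reuses the old master regardless of what the node manager broadcast. Clearing hides the symptom, but it also discards entries that belong to an enclosing nested call (such as a #multi block), so restoring the push/pop balance is likely the safer fix.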


ryanlecompte · 2012-11-15T19:14:12Z

Hmm, looking at the code I don't see how this could be a problem. The basic idea is that we should always #pop as many times as #client_for is called. Is there a particular redis call that's causing this for you? Is it happening in a simple call like #get or #set or in more complicated/nested calls like in #multi?
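The invariant described here (one pop per client_for call) can be enforced with Ruby's begin/ensure, so the pop runs even when a command raises partway through. This is a hypothetical sketch: with_client, current_stack, STACK_KEY, and the host strings are illustrative names, not redis_failover's API.

```ruby
# Hypothetical sketch of the "pop once per client_for" invariant;
# names are illustrative, not redis_failover's API.
STACK_KEY = :client_stack

def current_stack
  Thread.current[STACK_KEY] ||= []
end

# Pins a client for the duration of the block. Nested calls reuse the
# outermost pinned client, as commands inside a #multi block would.
def with_client(master)
  stack = current_stack
  stack.push(stack.last || master)
  begin
    yield stack.last
  ensure
    stack.pop # runs even if the block raises, keeping push/pop balanced
  end
end

with_client("master-a") do |_outer|
  with_client("master-b") do |inner|
    puts inner # prints "master-a": the nested call reuses the pinned client
  end
end

begin
  with_client("master-a") { raise "connection error" }
rescue RuntimeError
  # the failed call is abandoned; only the stack state matters here
end
puts current_stack.size # prints 0: ensure popped despite the error

with_client("master-b") { |c| puts c } # prints "master-b"
```

Because the pop lives in ensure, an error raised mid-command cannot leave an entry behind, while nested calls still share the outer client.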

netshade · 2012-11-15T19:41:28Z

Just a loop on #incr; that's the only command. The client is initialized with :master_only => true. The setup was three node managers, one ZooKeeper, and three Redis instances. The test script looked like:

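```ruby
# $client is a RedisFailover::Client initialized with :master_only => true
expected = $client.get("key").to_i # start from the current value (nil.to_i == 0)
loop do
  expected += 1
  actual = $client.incr("key")
  abort "expected #{expected}, got #{actual}" if expected != actual
end
```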

The comparison logic worked as expected, but while running that script and killing random servers, the client logger would report that it had received the new master and then immediately try to connect and write to the old master, causing either connection errors or read-only errors.

(Ruby 1.9.3)

arohter mentioned this issue Oct 15, 2013:
Make sure free_client() method is called on connection retries. #57 (Merged)

arohter mentioned this issue Jan 8, 2014:
Connection handling improvements. #66 (Merged)
