I wanted to document some of the real-world test cases I've been envisioning for a test suite for this library.
The Setup
It seems like it would be pretty easy to set up a local environment to test some of this stuff:
3 zookeeper servers
2 redis servers
2 clients
2 node monitors
to give us a chance to kill or hang each component and make sure everything reacts appropriately.
Scenarios
Here is an incomplete list of tests that I think should be run against a real set of redis servers and clients.
Kill a redis server with SIGKILL (a kill -9): ensure the failover happens immediately
Pause a redis server (causing a hang) with SIGSTOP: ensure the monitor process notices the hang and starts a failover
Kill the master monitor process with SIGKILL: ensure another monitor takes over
Pause the master monitor process with SIGSTOP and then kill redis with SIGKILL: how long does this take to fail over?
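A rough sketch of how a test harness could drive these scenarios from Ruby. The helper names (crash, hang, resume, time_until) are made up for illustration and are not part of this library; the only real calls are Process.kill and Process.clock_gettime from Ruby's standard library. It assumes the harness already knows the PIDs of the redis servers and monitors it started.

```ruby
# Hypothetical harness helpers; the names are illustrative only.

# Simulate a crash: SIGKILL cannot be caught, so the process dies at once.
def crash(pid)
  Process.kill("KILL", pid)
end

# Simulate a hang: SIGSTOP freezes the process but leaves it alive,
# with its sockets still open.
def hang(pid)
  Process.kill("STOP", pid)
end

# Let a process frozen by SIGSTOP continue running.
def resume(pid)
  Process.kill("CONT", pid)
end

# Poll the block until it returns a truthy value and report how many
# seconds that took, or nil if the timeout passed first.
def time_until(timeout: 30, interval: 0.05)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    return Process.clock_gettime(Process::CLOCK_MONOTONIC) - started if yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    return nil if elapsed > timeout
    sleep interval
  end
end
```

The last scenario might then read something like hang(monitor_pid), crash(redis_pid), followed by time_until { a new master is visible }, where the check inside the block depends on however the test chooses to look up the current master.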
Monitoring
While running these tests, it would be worthwhile for the redis clients to be constantly running SET commands against redis.
Tracking the average and max times for requests would be helpful in understanding how long failover really takes. Using my metriks library may be helpful in getting those statistics easily.
I envision the redis client processes having an at_exit defined that would output statistics like the number of keys set, the number of errors, and the average and max times per SET. We could easily compare the number of keys they thought they set with the number that the final master has, to see what sort of failures happened.
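To make that concrete, here is a minimal sketch of the bookkeeping a client process could do. SetStats is a hypothetical class written for this example using plain Ruby instead of metriks, and the redis object in the usage comment is assumed to be whatever client the test process already has.

```ruby
# Hypothetical stats collector for a test client; not part of any library.
class SetStats
  attr_reader :keys_set, :errors, :max_time

  def initialize
    @keys_set = 0
    @errors = 0
    @total_time = 0.0
    @max_time = 0.0
  end

  # Times one SET attempt. The block performs the actual write; an
  # exception counts as an error, and its duration is still recorded
  # so slow failures during a failover show up in the numbers.
  def record
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    @keys_set += 1
  rescue StandardError
    @errors += 1
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @total_time += elapsed
    @max_time = elapsed if elapsed > @max_time
  end

  def attempts
    @keys_set + @errors
  end

  def average_time
    attempts.zero? ? 0.0 : @total_time / attempts
  end

  def report
    format("keys set: %d, errors: %d, avg: %.4fs, max: %.4fs",
           keys_set, errors, average_time, max_time)
  end
end

# Assumed usage inside a client process:
#   stats = SetStats.new
#   at_exit { warn stats.report }
#   loop { stats.record { redis.set("key:#{stats.attempts}", "1") } }
```

Because every attempt writes a distinct key, the keys_set total across clients can be compared with the key count on the final master (for example via DBSIZE) once the run is over.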
Nice! Thanks for putting these testing scenarios together. I have been doing similar testing locally with a 5 node Redis cluster and 5 node ZK cluster. I also have 2 node managers. All of my testing has been with SIGKILL, however. I'd love to get your help on setting this up too. You have some great ideas here.
Using SIGSTOP and SIGCONT is a great way to ensure that everything works properly with a hung process instead of just a killed one — both cases are important to handle, but the hung case can be harder.
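One reason the hung case is harder: a stopped process still holds its listening socket, so a connection attempt typically succeeds and only the reply never arrives. A health check therefore needs a timeout on the read, not just on the connect. The sketch below is an illustration of that idea and not how this library's monitor actually works; probe is a made-up name, and it relies on redis answering an inline PING with a line of text.

```ruby
require "socket"
require "timeout"

# Hypothetical health probe. Returns :ok if the server answers a PING,
# :hung if nothing comes back within the timeout, :down if the
# connection is refused, reset, or closed without a reply.
def probe(host, port, timeout: 1.0)
  Timeout.timeout(timeout) do
    socket = TCPSocket.new(host, port)
    begin
      socket.write("PING\r\n")
      socket.gets ? :ok : :down
    ensure
      socket.close
    end
  end
rescue Timeout::Error
  :hung
rescue SystemCallError, SocketError
  :down
end
```

A server killed with SIGKILL would usually show up as :down almost immediately, while one paused with SIGSTOP would only be reported as :hung after the full timeout, which puts a floor on how fast that failover can start.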