Proposal: Real-world test cases #10

eric · 2012-04-22T10:14:58Z

I wanted to document some of the real-world test cases I've been envisioning for a test suite for this library.

The Setup

It seems like it would be pretty easy to set up a local environment to test some of this stuff:

3 zookeeper servers
2 redis servers
2 clients
2 node monitors

That would give us a chance to kill or hang each component and make sure everything reacts appropriately.

Scenarios

Here is an incomplete list of tests that I think should be run against a real set of redis servers and clients.

Kill a redis server with SIGKILL (a kill -9) — ensure the failover happens immediately
Pause a redis server (causing a hang) with SIGSTOP — ensure the monitor process notices the hang and starts a failover
Kill the master monitor process with SIGKILL — ensure another monitor takes over
Pause the master monitor process with SIGSTOP and then kill redis with SIGKILL: how long does failover take in that case?
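
As a rough sketch, the kill and hang steps above could be driven from Ruby with Process.kill. The helper names here (hang_process, resume_process, hard_kill_process, with_hung_process) are made up for illustration, and the pid is assumed to come from however the test harness launched the redis server or monitor.

```ruby
# Hypothetical helpers for the scenarios above; none of these names come
# from the library itself. Unix only, since they rely on POSIX signals.

# Freeze a process in place (simulates a hang) with SIGSTOP.
def hang_process(pid)
  Process.kill("STOP", pid)
end

# Let a previously stopped process continue with SIGCONT.
def resume_process(pid)
  Process.kill("CONT", pid)
end

# Terminate a process immediately with SIGKILL (the "kill -9" case).
def hard_kill_process(pid)
  Process.kill("KILL", pid)
end

# Keep the process hung while the block runs, then always resume it,
# even if the block raises.
def with_hung_process(pid)
  hang_process(pid)
  yield
ensure
  resume_process(pid)
end
```

A test could then wrap its assertions in with_hung_process(redis_pid) { ... } to check that a failover starts while the server is unresponsive, where redis_pid is whatever pid the harness recorded.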

Monitoring

While running these tests, it would be worthwhile for the redis clients to be constantly running SET commands against redis.

Tracking the average and max times for requests would be helpful in understanding how long failover really takes. Using my metriks library may be helpful in getting those statistics easily.
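
One way to put a number on how long failover takes, sketched without metriks: poll a probe until it succeeds again and report the elapsed time. measure_recovery is a hypothetical helper, and the probe block is assumed to be something like a single SET through the client that raises or returns a falsy value while redis is unavailable.

```ruby
# Hypothetical helper: call the block repeatedly until it returns a truthy
# value, and return how many seconds that took. Exceptions raised by the
# block count as "not recovered yet". Returns nil if the timeout passes.
def measure_recovery(timeout: 30.0, interval: 0.05)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    ok = begin
      yield
    rescue StandardError
      false
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    return elapsed if ok
    return nil if elapsed >= timeout
    sleep interval
  end
end
```

Calling this right after sending the signal gives an approximate time to recover, accurate to roughly the polling interval.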

I envision the redis client processes having an at_exit defined that would output statistics like the number of keys set, the number of errors, and the average and max times per SET. We could easily compare the number of keys they thought they set with the number that the final master has, to see what sort of failures happened.
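
A minimal sketch of what such a client process might look like, using only the standard library. SetStats is a hypothetical class, and a plain Hash stands in for the redis client so the example runs on its own; a real test would issue the SET against redis inside the record block and compare keys_set with the key count on the final master.

```ruby
# Hypothetical stats collector for the client processes described above.
class SetStats
  attr_reader :keys_set, :errors, :max_seconds

  def initialize
    @keys_set = 0
    @errors = 0
    @total_seconds = 0.0
    @max_seconds = 0.0
  end

  # Time one SET attempt. A raised StandardError is counted as an error
  # instead of propagating, so the load loop keeps running during failover.
  def record
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    @keys_set += 1
  rescue StandardError
    @errors += 1
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @total_seconds += elapsed
    @max_seconds = elapsed if elapsed > @max_seconds
  end

  # Average time per attempt (successful or not), in seconds.
  def average_seconds
    attempts = @keys_set + @errors
    attempts.zero? ? 0.0 : @total_seconds / attempts
  end

  def summary
    format("keys_set=%d errors=%d avg=%.6fs max=%.6fs",
           @keys_set, @errors, average_seconds, @max_seconds)
  end
end

if $PROGRAM_NAME == __FILE__
  stats = SetStats.new
  store = {} # stand-in for the redis client (assumption for this demo)

  # Print the statistics when the process exits, however it exits.
  at_exit { warn stats.summary }

  5.times { |i| stats.record { store["key:#{i}"] = i } }
end
```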


ryanlecompte · 2012-04-22T10:21:21Z

Nice! Thanks for putting these testing scenarios together. I have been doing similar testing locally with a 5-node Redis cluster and a 5-node ZK cluster. I also have 2 node managers. All of my testing has been with SIGKILL, however. I'd love to get your help on setting this up too. You have some great ideas here.

eric · 2012-04-22T10:23:31Z

Using SIGSTOP and SIGCONT is a great way to ensure that everything works properly with a hung process instead of just a killed one — both cases are important to handle, but the hung case can be harder.

ryanlecompte mentioned this issue Oct 8, 2012

Rework specs to work against a set of real Redis/ZooKeeper nodes as opposed to stubs #6

Closed
