Locking seems not to be working #54

tsilen · 2012-11-09T22:16:56Z

Hi there,

There seems to be a problem with zk's locker. If the connection to ZooKeeper is lost, the instance that didn't hold the lock creates an additional ephemeral znode in ZooKeeper, which prevents that instance from ever getting the lock.
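To make the failure mode concrete, here is a minimal in-memory model of the usual ZooKeeper exclusive-lock recipe: each contender creates a sequential child node, the lowest child holds the lock, and every other contender watches the child immediately below its own. This is a hypothetical sketch (the `predecessor` helper is illustrative, not part of the zk gem). It assumes, as the `state=3` reconnect in the log suggests, that the session survived the outage, so the waiter's first ephemeral node was never removed.

```ruby
# Hypothetical model of the ZooKeeper exclusive-lock recipe; not zk gem code.
# Lock children are sequential znode names such as "ex0000000001".

# Returns the child a waiter should watch: the node immediately before
# +mine+ in sequence order, or nil if +mine+ is the lowest (lock is held).
def predecessor(children, mine)
  sorted = children.sort_by { |name| name[/\d+\z/].to_i }
  idx = sorted.index(mine)
  raise ArgumentError, "#{mine} is not a lock child" if idx.nil?
  idx.zero? ? nil : sorted[idx - 1]
end

children = %w[ex0000000000 ex0000000001]
p predecessor(children, "ex0000000000")  # the holder has no predecessor
p predecessor(children, "ex0000000001")  # the waiter watches ex0000000000

# After reconnecting, the waiter creates a second node while its first
# one is still alive, so it ends up watching its own earlier node.
children << "ex0000000002"
p predecessor(children, "ex0000000002")
```

Since nothing deletes `ex0000000001` while the session is alive, a waiter watching it would block indefinitely, which matches the behaviour described in this report.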

zk version 1.7.3, Ruby version 1.9.3-p0

Here's some debug output from the instance that wasn't holding the lock:

DEBUG -- : got lock path /_zklocking/test_lock/ex0000000001
DEBUG -- : ZK::Locker::ExclusiveLocker#block_until_write_lock! path="/_zklocking/test_lock/ex0000000000"
DEBUG -- : assigning the @node_deletion_watcher
DEBUG -- : broadcasting
DEBUG -- : calling block_until_deleted
DEBUG -- : ok, going to block: /_zklocking/test_lock/ex0000000000

At this point all is fine: in ZooKeeper there are lock znodes 0 and 1. Then there's a very short network outage:

DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=1>
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=1>
DEBUG -- : called #ZK::EventHandlerSubscription::Base:...] on threadpool
DEBUG -- : called #ZK::EventHandlerSubscription::Base:...] on threadpool
DEBUG -- : got result for path: /_zklocking/test_lock/ex0000000000, result: 1
DEBUG -- : wait_until_connected_or_dying @last_cnx_state: 1, time_left? false, @client_state: :running

A Zookeeper::Exceptions::NotConnected is raised. The locker instance's #lock_path is still correct at this point, showing "/_zklocking/test_lock/ex0000000001". Then the network comes back up:

DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=3>
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=3>

The locker instance's #lock_path still correctly shows "/_zklocking/test_lock/ex0000000001". Then:

DEBUG -- : got lock path /_zklocking/test_lock/ex0000000002
DEBUG -- : ZK::Locker::ExclusiveLocker#block_until_write_lock! path="/_zklocking/test_lock/ex0000000001"
DEBUG -- : assigning the @node_deletion_watcher
DEBUG -- : broadcasting
DEBUG -- : calling block_until_deleted
DEBUG -- : ok, going to block: /_zklocking/test_lock/ex0000000001

Ouch, this isn't right: something has created a new znode (ex0000000002), and now we're blocking on our own previous one (ex0000000001).
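One way to avoid the duplicate node, sketched here under assumptions rather than taken from the actual fix in 5bb2561, is to make lock-node creation idempotent across retries: before creating a new sequential node, check whether the node already recorded in `lock_path` still exists, and reuse it if so. `FakeLockDir`, `Waiter` and `ensure_lock_node` are hypothetical names standing in for the ZooKeeper lock directory and the locker; they are not zk gem classes.

```ruby
# Hypothetical in-memory stand-in for a ZooKeeper lock directory.
class FakeLockDir
  attr_reader :children

  def initialize
    @children = []
    @seq = 0
  end

  # Mimics creating an ephemeral sequential znode and returns its name.
  def create_sequential(prefix)
    name = format("%s%010d", prefix, @seq)
    @seq += 1
    @children << name
    name
  end

  def exists?(name)
    @children.include?(name)
  end
end

# Hypothetical locker that remembers the node it created.
class Waiter
  attr_reader :lock_path

  def initialize(dir)
    @dir = dir
    @lock_path = nil
  end

  # Creates a lock node only if we do not already own a live one.
  def ensure_lock_node
    @lock_path = @dir.create_sequential("ex") unless @lock_path && @dir.exists?(@lock_path)
    @lock_path
  end
end

dir = FakeLockDir.new
dir.create_sequential("ex")        # another client holds ex0000000000
waiter = Waiter.new(dir)
first = waiter.ensure_lock_node    # creates ex0000000001
retried = waiter.ensure_lock_node  # retry after reconnect reuses it
p first == retried
p dir.children
```

If the session had expired instead, the ephemeral node would be gone, `exists?` would return false, and a fresh node would be created, which is the correct behaviour in that case.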

This problem doesn't happen on the instance that was holding the lock.

Here's a test script to reproduce: http://codepad.org/eowuDVlH

Run two instances, then restart the ZooKeeper server or otherwise block access to it for a couple of seconds.

tsilen · 2012-11-09T23:41:04Z

The same thing happens if you use the locker's own with_lock method.

slyphon · 2012-11-10T01:13:52Z

closed by 5bb2561

released in 1.7.4

slyphon closed this as completed Nov 10, 2012

tsilen mentioned this issue Nov 13, 2012

One more locker issue #55

Closed
