Hi there,
There seems to be a problem with ZK's locker. If the connection to ZooKeeper is lost, the instance that didn't hold the lock creates an additional ephemeral znode in ZooKeeper, which prevents that instance from ever getting the lock.
zk version 1.7.3, Ruby version 1.9.3-p0
Here's the debug log from the instance that wasn't holding the lock:
DEBUG -- : got lock path /_zklocking/test_lock/ex0000000001
DEBUG -- : ZK::Locker::ExclusiveLocker#block_until_write_lock! path="/_zklocking/test_lock/ex0000000000"
DEBUG -- : assigning the @node_deletion_watcher
DEBUG -- : broadcasting
DEBUG -- : calling block_until_deleted
DEBUG -- : ok, going to block: /_zklocking/test_lock/ex0000000000
At this point all is fine: in ZooKeeper there are lock znodes 0 and 1. Then there's a very short network outage:
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=1>
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=1>
DEBUG -- : called #ZK::EventHandlerSubscription::Base:...] on threadpool
DEBUG -- : called #ZK::EventHandlerSubscription::Base:...] on threadpool
DEBUG -- : got result for path: /_zklocking/test_lock/ex0000000000, result: 1
DEBUG -- : wait_until_connected_or_dying @last_cnx_state: 1, time_left? false, @client_state: :running
A Zookeeper::Exceptions::NotConnected is raised. At this point the locker instance's #lock_path is still correct, showing "/_zklocking/test_lock/ex0000000001". Then the network comes back up:
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=3>
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=3>
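After the reconnect (state=3 is the connected state; the state=1 events above were the connecting state), the locker instance's #lock_path still correctly shows "/_zklocking/test_lock/ex0000000001". But then: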
DEBUG -- : got lock path /_zklocking/test_lock/ex0000000002
DEBUG -- : ZK::Locker::ExclusiveLocker#block_until_write_lock! path="/_zklocking/test_lock/ex0000000001"
DEBUG -- : assigning the @node_deletion_watcher
DEBUG -- : broadcasting
DEBUG -- : calling block_until_deleted
DEBUG -- : ok, going to block: /_zklocking/test_lock/ex0000000001
Ouch, this isn't right: something has created a new znode (ex0000000002), and now we're blocking on our own previous one (ex0000000001).
This problem doesn't happen on the instance that was holding the lock.
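To make the failure mode concrete, here is a minimal, in-memory Ruby model of the sequential-znode lock recipe, using the node names from the log above. It is a hypothetical sketch: `predecessor` and the `owners` hash are illustrative names, not part of the zk gem, and nothing here talks to ZooKeeper. It assumes the session survived the outage (the log shows a reconnect, not a session expiry), so the old ephemeral node ex0000000001 is never removed by the server.

```ruby
# Hypothetical in-memory model of the exclusive-lock recipe; not the zk gem's API.

# Returns the child node that `mine` should watch, or nil if `mine` has the
# lowest sequence number (i.e. the lock is acquired).
def predecessor(children, mine)
  sorted = children.sort # suffixes are zero-padded, so lexical order == numeric order
  idx = sorted.index(mine)
  raise ArgumentError, "#{mine} is not a child" if idx.nil?
  idx.zero? ? nil : sorted[idx - 1]
end

# Before the outage: the holder owns ex0000000000, the waiter owns ex0000000001.
owners = {
  "ex0000000000" => :holder,
  "ex0000000001" => :waiter
}
puts predecessor(owners.keys, "ex0000000001") # prints ex0000000000 (the holder's node)

# After reconnecting, the waiter creates a second node instead of reusing the first.
# The first node is ephemeral and the session is still alive, so it stays.
owners["ex0000000002"] = :waiter
watched = predecessor(owners.keys, "ex0000000002")
puts watched         # prints ex0000000001
puts owners[watched] # prints waiter: the instance is watching its own node

# Even after the holder releases its node, the waiter still watches itself.
owners.delete("ex0000000000")
puts predecessor(owners.keys, "ex0000000002") # prints ex0000000001
```

Since the only node the waiter watches belongs to the waiter itself, nothing will ever delete it, which matches the "never gets the lock" behaviour described above.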
Here's a test script to reproduce it: http://codepad.org/eowuDVlH
Run two instances, then restart the ZooKeeper server (or otherwise block access to it) for a couple of seconds.
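One way a locker could avoid this, sketched under assumptions: on retry, reuse the node created by the earlier attempt if it still exists, and create a new one only when it is gone. `FakeLockDir` and `RetrySafeLocker` below are hypothetical in-memory stand-ins, not the zk gem's API, and this is not necessarily how zk itself should or does fix it.

```ruby
# Hypothetical in-memory stand-in for a lock directory such as
# /_zklocking/test_lock; it only tracks child names and their owners.
class FakeLockDir
  def initialize
    @seq = 0
    @children = {}
  end

  # Mimics creating an ephemeral sequential znode with a 10-digit suffix.
  def create_sequential(owner)
    name = format("ex%010d", @seq)
    @seq += 1
    @children[name] = owner
    name
  end

  def exists?(name)
    @children.key?(name)
  end

  def delete(name)
    @children.delete(name)
  end

  def children
    @children.keys.sort
  end
end

# Hypothetical retry-safe locker: keeps its node across retries.
class RetrySafeLocker
  attr_reader :lock_path

  def initialize(dir, owner)
    @dir = dir
    @owner = owner
    @lock_path = nil
  end

  # Returns true if we hold the lock (our node is the lowest), false otherwise.
  def try_acquire
    # Only create a node if we have none, or the old one has disappeared.
    unless @lock_path && @dir.exists?(@lock_path)
      @lock_path = @dir.create_sequential(@owner)
    end
    @dir.children.first == @lock_path
  end

  def release
    @dir.delete(@lock_path) if @lock_path
    @lock_path = nil
  end
end

dir = FakeLockDir.new
holder = RetrySafeLocker.new(dir, :holder)
waiter = RetrySafeLocker.new(dir, :waiter)

holder.try_acquire        # holder gets ex0000000000 and the lock
waiter.try_acquire        # waiter gets ex0000000001 and must wait
waiter.try_acquire        # simulated retry after a connection loss
puts waiter.lock_path     # prints ex0000000001 (node reused)
puts dir.children.length  # prints 2 (no extra node created)

holder.release
puts waiter.try_acquire   # prints true
```

A real fix also has to handle a connection loss during the create call itself, when the client cannot tell whether its node was created. The ZooKeeper recipes documentation suggests embedding the session ID or a GUID in the node name so the client can find its own node among the children after reconnecting; this sketch does not model that case.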