Locking seems not to be working #54

Closed
tsilen opened this issue Nov 9, 2012 · 2 comments

@tsilen
Contributor

tsilen commented Nov 9, 2012

Hi there,

There seems to be a problem with zk's locker. If the connection to ZooKeeper is lost, the instance that wasn't holding the lock creates an additional ephemeral znode in ZooKeeper, which prevents that instance from ever getting the lock.

zk version 1.7.3, ruby version 1.9.3-p0

Here's some debug output from the instance that wasn't holding the lock:

DEBUG -- : got lock path /_zklocking/test_lock/ex0000000001
DEBUG -- : ZK::Locker::ExclusiveLocker#block_until_write_lock! path="/_zklocking/test_lock/ex0000000000"
DEBUG -- : assigning the @node_deletion_watcher
DEBUG -- : broadcasting
DEBUG -- : calling block_until_deleted
DEBUG -- : ok, going to block: /_zklocking/test_lock/ex0000000000

At this point all is fine: ZooKeeper contains lock znodes 0 and 1. Then there's a very short network outage:

DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=1>
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=1>
DEBUG -- : called #ZK::EventHandlerSubscription::Base:...] on threadpool
DEBUG -- : called #ZK::EventHandlerSubscription::Base:...] on threadpool
DEBUG -- : got result for path: /_zklocking/test_lock/ex0000000000, result: 1
DEBUG -- : wait_until_connected_or_dying @last_cnx_state: 1, time_left? false, @client_state: :running

Zookeeper::Exceptions::NotConnected is raised. The locker instance's #lock_path is correct at this point, showing "/_zklocking/test_lock/ex0000000001". The network comes back up:

DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=3>
DEBUG -- : EventHandler#process dispatching event: #<Zookeeper::Callbacks::WatcherCallback:... state=3>

The locker instance's #lock_path still correctly shows "/_zklocking/test_lock/ex0000000001".

DEBUG -- : got lock path /_zklocking/test_lock/ex0000000002
DEBUG -- : ZK::Locker::ExclusiveLocker#block_until_write_lock! path="/_zklocking/test_lock/ex0000000001"
DEBUG -- : assigning the @node_deletion_watcher
DEBUG -- : broadcasting
DEBUG -- : calling block_until_deleted
DEBUG -- : ok, going to block: /_zklocking/test_lock/ex0000000001

Ouch, this isn't right: something has created a new znode (ex0000000002), and now we're blocking on our own previous one (ex0000000001).

This problem doesn't happen on the instance that was holding the lock.
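
To make the failure mode concrete, here is a minimal in-memory sketch of the sequential-znode lock recipe. It is a toy model, not the zk gem's code: ToyLockDir, simulate and the cleanup_before_retry flag are hypothetical names invented for illustration. It assumes the session survives the short outage, so the waiter's first ephemeral znode is never removed, which matches the log above where the retry blocks on ex0000000001. The cleanup branch only shows the general idea of removing a stale znode before retrying; it is not taken from any commit in this thread.

```ruby
# Toy in-memory model of the sequential-znode lock recipe.
# NOT the zk gem's implementation; all names here are hypothetical.
class ToyLockDir
  def initialize
    @next_seq = 0
    @nodes = {} # sequence number => owner label
  end

  # Models creating an ephemeral sequential znode such as ex0000000001.
  def create(owner)
    seq = @next_seq
    @next_seq += 1
    @nodes[seq] = owner
    seq
  end

  def delete(seq)
    @nodes.delete(seq)
  end

  def name(seq)
    format('ex%010d', seq)
  end

  # A locker holds the lock only when its own node has the lowest sequence.
  def acquired?(seq)
    @nodes.keys.min == seq
  end

  # The node a waiter blocks on: the highest sequence below its own.
  def blocking_on(seq)
    @nodes.keys.select { |s| s < seq }.max
  end

  def children
    @nodes.keys.sort.map { |s| name(s) }
  end
end

# Replays the scenario from the log. Instance A holds the lock, instance B
# waits, a short outage interrupts B's wait, and B retries.
def simulate(cleanup_before_retry)
  dir = ToyLockDir.new
  a = dir.create('A') # ex0000000000: A holds the lock
  b = dir.create('B') # ex0000000001: B blocks on ex0000000000

  # Outage: B's wait is interrupted, but (by assumption) the session
  # survives, so B's ephemeral znode is still present.
  dir.delete(b) if cleanup_before_retry
  b_retry = dir.create('B') # ex0000000002

  waits_on = dir.name(dir.blocking_on(b_retry))
  dir.delete(a) # A releases the lock

  { waits_on: waits_on, children: dir.children, acquired: dir.acquired?(b_retry) }
end

p simulate(false) # retry without removing the stale znode
p simulate(true)  # retry after removing the stale znode
```

In this model the first call leaves B waiting on its own ex0000000001 even after A releases, so B never acquires the lock. The second call waits on ex0000000000 and acquires the lock once A releases.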

Here's a test script to reproduce: http://codepad.org/eowuDVlH

Run two instances, then restart the ZooKeeper server or otherwise block access to it for a couple of seconds.

@tsilen
Contributor Author

tsilen commented Nov 9, 2012

The same thing happens if you use the locker's own with_lock method.

@slyphon
Contributor

slyphon commented Nov 10, 2012

closed by 5bb2561

released in 1.7.4
