Add missing locks in agent and service code #1570

aboch · 2016-11-22T07:39:22Z

Related to moby/moby#28697, moby/moby#28845 and moby/moby#28712.

Signed-off-by: Alessandro Boch [email protected]

mavenugo · 2016-11-22T16:30:21Z

@aboch could you also take care of the other missing locks for moby/moby#28712?

I think we need a sweep of the swarm-related changes in libnetwork to look for missing locks and other concurrency issues. cc @sanimej

aboch · 2016-11-22T19:33:23Z

@mavenugo Sure thing, I was not aware of that issue.

aboch · 2016-11-29T17:30:24Z

ping @mavenugo @sanimej This is ready for review

sanimej · 2016-11-29T20:24:00Z

networkdb/cluster.go

@@ -44,6 +44,8 @@ func (l *logWriter) Write(p []byte) (int, error) {

 // SetKey adds a new key to the key ring
 func (nDB *NetworkDB) SetKey(key []byte) {
+	nDB.Lock()
+	defer nDB.Unlock()
 	logrus.Debugf("Adding key %s", hex.EncodeToString(key)[0:5])


Debug logging should be outside of the lock. Same comment for SetPrimaryKey and RemoveKey as well.

I intentionally put it inside the lock so that it gets printed when this function actually does the job.

That is not the purpose of a lock. If the debug message is printed first and the lock then can't be acquired, the goroutine is simply stuck. If it's a deadlock, the stack dump will capture that state when the user collects it.
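
To illustrate the suggestion in this thread, here is a minimal sketch of a setter that logs before taking the lock, so the critical section covers only the shared data. The `keyRing` type and the `shortHex` helper are hypothetical stand-ins, not the actual NetworkDB code, and the standard `log` package is used in place of logrus to keep the sketch self-contained.

```go
package main

import (
	"encoding/hex"
	"log"
	"sync"
)

// keyRing is a hypothetical stand-in for the key state guarded by a mutex.
type keyRing struct {
	sync.Mutex
	keys [][]byte
}

// shortHex returns at most the first five hex characters of key.
func shortHex(key []byte) string {
	s := hex.EncodeToString(key)
	if len(s) > 5 {
		s = s[:5]
	}
	return s
}

// SetKey logs first, then locks only around the access to the shared slice.
func (k *keyRing) SetKey(key []byte) {
	log.Printf("Adding key %s", shortHex(key))

	k.Lock()
	defer k.Unlock()
	k.keys = append(k.keys, key)
}

func main() {
	k := &keyRing{}
	k.SetKey([]byte("0123456789abcdef"))
	log.Printf("keys stored: %d", len(k.keys))
}
```

The trade-off discussed above still applies: the message is emitted before the lock is acquired, so it records the attempt rather than the completed update.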

sanimej · 2016-11-29T20:55:56Z

agent.go

@@ -136,6 +144,7 @@ func (c *controller) handleKeyChange(keys []*types.EncryptionKey) error {
 			}
 		}
 	}
+	c.Unlock()


Calling a.networkDB.SetKey(key.Key) inside the controller lock doesn't look correct; it's not necessary. Also, SetKey takes nDB.Lock, which can potentially lead to deadlocks. We should save the new key and call a.networkDB.SetKey(key.Key) outside of the controller lock.

How can it deadlock?

Two concurrent goroutines: one takes c.Lock() and tries nDB.Lock(); the second takes nDB.Lock() and tries c.Lock(). We had such deadlocks before, between network and endpoint locks IIRC.

Understood, but the nDB object does not have any reference to the agent or controller. It's a leaf object. From what I see, networkDB methods are properly coded so that they are the only ones to take the nDB lock over the nDB data.

I don't think the situation you are describing can happen.
But I can take care of moving the function call outside of the controller lock, if that makes it less confusing.

I took care of it. PTAL

Yeah, networkdb is currently a separate package without any dependency on libnetwork core. But it's safer to avoid nested locks unless they're really required. In this case there is no need to call networkDB.SetKey under the controller lock.

And yes. I agree, being conservative when it comes to holding locks is preferred. Especially when we have locks around a function call (with closely related code), it is much better to avoid it.
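
The pattern agreed on in this thread can be sketched roughly as follows: record the new key while holding the controller lock, release it, and only then call into the key store, so the two mutexes are never held at the same time. The `keyStore` and `controller` types below are simplified, hypothetical stand-ins rather than the real libnetwork types.

```go
package main

import (
	"fmt"
	"sync"
)

// keyStore is a hypothetical stand-in for networkDB with its own lock.
type keyStore struct {
	sync.Mutex
	keys [][]byte
}

// SetKey takes the store's own lock, as the real SetKey does after this PR.
func (s *keyStore) SetKey(key []byte) {
	s.Lock()
	defer s.Unlock()
	s.keys = append(s.keys, key)
}

// controller is a trimmed-down stand-in for the libnetwork controller.
type controller struct {
	sync.Mutex
	keys [][]byte
	db   *keyStore
}

// handleKeyChange updates controller state under c's lock and remembers the
// added key; the call into the key store happens after the unlock.
func (c *controller) handleKeyChange(newKey []byte) {
	var added []byte

	c.Lock()
	c.keys = append(c.keys, newKey)
	added = newKey
	c.Unlock()

	// The controller lock is released here, so no nested locking occurs.
	if len(added) > 0 {
		c.db.SetKey(added)
	}
}

func main() {
	c := &controller{db: &keyStore{}}
	c.handleKeyChange([]byte("gossip-key"))
	fmt.Println(len(c.keys), len(c.db.keys))
}
```

Because neither lock is acquired while the other is held, lock ordering between the two objects stops mattering, even if a future change gave the store a reference back to the controller.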

sanimej · 2016-11-29T21:41:08Z

agent.go

@@ -339,11 +363,13 @@ func (c *controller) agentClose() {
 		return
 	}

+	agent.Lock()


Can we get the slice of agent.driverCancelFuncs under the lock and do the actual invocation outside of the lock? Only the access to agent.driverCancelFuncs is racy here, but we are locking around the execution of those functions.

I will take care of it. Thanks.

I took care of it. PTAL
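
A rough sketch of what this request amounts to: copy the cancel functions while holding the lock, release it, then invoke the copies. The `agent` type here is a hypothetical, trimmed-down stand-in, and `driverCancelFuncs` is assumed to be a plain slice for simplicity; the real field may be shaped differently.

```go
package main

import (
	"fmt"
	"sync"
)

// agent is a hypothetical, trimmed-down stand-in for the libnetwork agent.
type agent struct {
	sync.Mutex
	driverCancelFuncs []func() // assumed shape, for illustration only
	cancelled         int
}

// closeAgent snapshots the cancel functions under the lock and runs them
// only after the lock has been released.
func (a *agent) closeAgent() {
	a.Lock()
	cancelList := append([]func(){}, a.driverCancelFuncs...)
	a.Unlock()

	for _, cancel := range cancelList {
		cancel()
	}
}

func main() {
	a := &agent{}
	for i := 0; i < 3; i++ {
		// Each cancel function takes the agent lock itself; this would
		// deadlock if closeAgent invoked it while still holding the lock.
		a.driverCancelFuncs = append(a.driverCancelFuncs, func() {
			a.Lock()
			a.cancelled++
			a.Unlock()
		})
	}
	a.closeAgent()
	fmt.Println("cancelled:", a.cancelled)
}
```

Only the read of the shared slice is protected, which matches the observation above that the slice access is the racy part, not the execution of the functions.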

Signed-off-by: Alessandro Boch <[email protected]>

sanimej · 2016-11-29T22:03:34Z

Thanks @aboch. LGTM

mavenugo · 2016-11-29T22:11:24Z

agent.go

@@ -127,7 +136,7 @@ func (c *controller) handleKeyChange(keys []*types.EncryptionKey) error {
 		if !same {
 			c.keys = append(c.keys, key)
 			if key.Subsystem == subsysGossip {
-				a.networkDB.SetKey(key.Key)
+				added = key.Key


I am not too confident about this code path, so consider this a question. It looks like SetKey can potentially be called multiple times, depending on how many unique keys of type "subsysGossip" are present?
With this change, it seems to be assumed that it will be called only once, and the key is cached in the added variable, which is later used to set the key. Is that the expectation?

Yeah, it follows the existing logic for the deleted key (look for the deleted variable).
This is the code path for the key rotation, where we know there will only be one deleted and one added key. But @sanimej can correct us here, he initially wrote this logic.

@mavenugo What aboch said is correct. The key rotation logic adds one new key to the set and removes one. So before this change, too, only one key would change on a rotation.

mavenugo · 2016-11-29T22:34:08Z

network.go

-		return n.ctrlr.agent.networkDB.Peers(n.id)
-	}
-	return []networkdb.PeerInfo{}
+	return agent.networkDB.Peers(n.ID())


Is it safe to assume that networkDB will never be nil?
The current code might behave this way, but why should we have such assumptions built in (which might cause panics in the future)? I think it is safe to add the nil check.

Yes it is safe, because the networkDB field is instantiated at agent creation, and never reset.
This is why there is no check over agent.networkDB == nil throughout the agent code.
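
For reference, the defensive variant raised in this thread might look roughly like the sketch below. All types are hypothetical stand-ins (`peerInfo` for networkdb.PeerInfo, `networkDB` for the real database), and `peersOf` is an illustrative helper, not a function in libnetwork. As noted above, the merged code relies on networkDB being set at agent creation and omits this check.

```go
package main

import "fmt"

// peerInfo is a hypothetical stand-in for networkdb.PeerInfo.
type peerInfo struct {
	Name string
	IP   string
}

// networkDB is a hypothetical stand-in exposing only a Peers lookup.
type networkDB struct {
	peers map[string][]peerInfo
}

// Peers returns the peers recorded for the given network ID.
func (db *networkDB) Peers(nid string) []peerInfo {
	return db.peers[nid]
}

// agent is a trimmed-down stand-in holding a reference to the database.
type agent struct {
	networkDB *networkDB
}

// peersOf returns an empty slice instead of panicking when either the
// agent or its networkDB is nil.
func peersOf(a *agent, nid string) []peerInfo {
	if a == nil || a.networkDB == nil {
		return []peerInfo{}
	}
	return a.networkDB.Peers(nid)
}

func main() {
	db := &networkDB{peers: map[string][]peerInfo{
		"n1": {{Name: "node-a", IP: "10.0.0.2"}},
	}}
	fmt.Println(len(peersOf(&agent{networkDB: db}, "n1")))
	fmt.Println(len(peersOf(&agent{}, "n1")))
}
```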

mavenugo · 2016-11-30T00:06:37Z

LGTM

bklau · 2016-12-16T03:56:35Z

@mavenugo: Q: Do these locking changes take effect at global/swarm scope or just local scope?
Thx

mavenugo · 2016-12-16T04:54:20Z

@bklau it should be effective mostly for global/swarm scoped networks.

bklau · 2016-12-16T15:55:24Z

@mavenugo: Is my understanding correct that if I have a globally scoped overlay network based on, say, Consul (not 1.12 Swarm mode), and I issue two overlay network commands to create/delete the SAME network "my_net" concurrently, a global lock would be contended for and held before any create/delete of "my_net" is done?

aboch changed the title ~~Add missing locks in service code~~ Add missing locks in agent and service code Nov 22, 2016

aboch added the status/2-needs-code-review label Nov 24, 2016

aboch mentioned this pull request Nov 28, 2016

Cherry picks for the 1.12.x branch #1573

Merged

sanimej reviewed Nov 29, 2016


Add missing locks in agent and service code

8dcf996

Signed-off-by: Alessandro Boch <[email protected]>

sanimej removed the status/2-needs-code-review label Nov 29, 2016

mavenugo reviewed Nov 29, 2016


mavenugo merged commit 6cfa15e into moby:master Nov 30, 2016
