-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allocator: Fix panic when allocations happen at init time #1651
Conversation
I think we should also make sure this scenario is covered in unit tests. I'm not very familiar with the tests for this part of the code, so it would be great if someone could point me in the right direction or submit a separate PR for test coverage. |
LGTM |
@@ -81,6 +81,7 @@ func (a *Allocator) doNetworkInit(ctx context.Context) error { | |||
unallocatedNetworks: make(map[string]*api.Network), | |||
ingressNetwork: newIngressNetwork(), | |||
} | |||
a.netCtx = nc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason a.netCtx
is initialized at the end is to make sure in case of failures we don't return from doNetworkInit
with a netCtx
which shouldn't be there in the Allocator
. This is because the Allocator
has a longer life time than what happens in doNetworkInit
itself.
So in all of doNetworkInit
I've just passed the netCtx
as an argument to functions that need it. May be we should do the same for taskCreateNetworkAttachments
i.e add netCtx
as an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds kind of dangerous TBH. What about a deferred closure in doNetworkInit
that clears a.netCtx
if an error is being returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think a deferred closure for error handling should satisfy that requirement as well. I am good with that.
Current coverage is 56.54% (diff: 69.04%)@@ master #1651 diff @@
==========================================
Files 90 90
Lines 14551 14552 +1
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 8248 8229 -19
- Misses 5214 5233 +19
- Partials 1089 1090 +1
|
a.netCtx is initialized too late, so if allocations happen as part of doNetworkInit, a nil pointer dereference will cause a panic. Initialize a.netCtx earlier and use a.netCtx directly in member functions instead of passing the network context separately, so there is no confusion about which to use. Also change allocator.go to have separate entries in the waitgroup for initialization and actually running the allocator, and defer `Done` for both. This should prevent a panic like this from leading to a deadlock, since the deferred `Done` will be reached. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
dab9090
to
a8066c1
Compare
@mrjana: Updated to add a closure that clears |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@aaronlehmann Adding a test case to cover this scenario would be nice. I will try to add one. |
@aaronlehmann @mrjana Please add a test case in a separate PR |
To identify the issue in allocator. Signed-off-by: Madhu Venugopal <madhu@docker.com>
Also Cherry-pick moby/swarmkit#1651 to identify the issue in allocator. Signed-off-by: Madhu Venugopal <madhu@docker.com>
a.netCtx
is initialized too late, so if allocations happen as part ofdoNetworkInit
, a nil pointer dereference will cause a panic.Initialize
a.netCtx
earlier and usea.netCtx
directly in memberfunctions instead of passing the network context separately, so there is
no confusion about which to use.
Also change allocator.go to have separate entries in the waitgroup for
initialization and actually running the allocator, and defer
Done
forboth. This should prevent a panic like this from leading to a deadlock,
since the deferred
Done
will be reached.See moby/moby#25432
cc @mrjana @LK4D4 @tonistiigi