allocator: Fix panic when allocations happen at init time #1651

Merged 1 commit on Oct 18, 2016

aaronlehmann
Collaborator

@aaronlehmann aaronlehmann commented Oct 17, 2016

a.netCtx is initialized too late, so if allocations happen as part of
doNetworkInit, a nil pointer dereference will cause a panic.

Initialize a.netCtx earlier and use a.netCtx directly in member
functions instead of passing the network context separately, so there is
no confusion about which to use.

Also change allocator.go to have separate entries in the waitgroup for
initialization and actually running the allocator, and defer Done for
both. This should prevent a panic like this from leading to a deadlock,
since the deferred Done will be reached.

See moby/moby#25432

cc @mrjana @LK4D4 @tonistiigi

@aaronlehmann
Collaborator Author

I think we should also make sure this scenario is covered in unit tests. I'm not very familiar with the tests for this part of the code, so it would be great if someone could point me in the right direction or submit a separate PR for test coverage.

@LK4D4
Contributor

LK4D4 commented Oct 17, 2016

LGTM

@@ -81,6 +81,7 @@ func (a *Allocator) doNetworkInit(ctx context.Context) error {
unallocatedNetworks: make(map[string]*api.Network),
ingressNetwork: newIngressNetwork(),
}
a.netCtx = nc
Contributor

The reason a.netCtx is initialized at the end is to make sure that, in case of failure, we don't return from doNetworkInit with a netCtx that shouldn't be in the Allocator. This matters because the Allocator has a longer lifetime than doNetworkInit itself.

So throughout doNetworkInit I've just passed the netCtx as an argument to the functions that need it. Maybe we should do the same for taskCreateNetworkAttachments, i.e., add netCtx as an argument?

Collaborator Author

Sounds kind of dangerous TBH. What about a deferred closure in doNetworkInit that clears a.netCtx if an error is being returned?

Contributor

Ok, I think a deferred closure for error handling should satisfy that requirement as well. I am good with that.
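
For readers following this thread, here is a rough sketch of the deferred-closure approach agreed on above. It is not the swarmkit code: the `Allocator` and `networkContext` types are simplified stand-ins, and the `allocate` method and `fail` parameter are hypothetical. The field is set early so member functions can use it during init, and a deferred closure clears it again when an error is returned.

```go
package main

import (
	"errors"
	"fmt"
)

// networkContext and Allocator are simplified, hypothetical stand-ins for
// the types discussed in this thread.
type networkContext struct {
	unallocated map[string]bool
}

type Allocator struct {
	netCtx *networkContext
}

// allocate is a hypothetical member function that reads a.netCtx directly.
// If it ran before a.netCtx was set, the nil pointer dereference would panic.
func (a *Allocator) allocate(id string) {
	a.netCtx.unallocated[id] = true
}

// doNetworkInit sets a.netCtx first so allocations during init can use it,
// and clears it again if init fails. The named result err lets the deferred
// closure observe the value actually being returned.
func (a *Allocator) doNetworkInit(fail bool) (err error) {
	a.netCtx = &networkContext{unallocated: make(map[string]bool)}
	defer func() {
		if err != nil {
			a.netCtx = nil // do not leave a half-initialized context behind
		}
	}()

	a.allocate("ingress") // safe: a.netCtx is already set
	if fail {
		return errors.New("init failed")
	}
	return nil
}

func main() {
	a := &Allocator{}
	fmt.Println(a.doNetworkInit(true), a.netCtx == nil)
	fmt.Println(a.doNetworkInit(false), a.netCtx != nil)
}
```

The named return value is what makes this work: a deferred closure cannot see an unnamed result, so it would have no way to tell a failed init from a successful one.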

@codecov-io

codecov-io commented Oct 17, 2016

Current coverage is 56.54% (diff: 69.04%)

Merging #1651 into master will decrease coverage by 0.13%

@@             master      #1651   diff @@
==========================================
  Files            90         90          
  Lines         14551      14552     +1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           8248       8229    -19   
- Misses         5214       5233    +19   
- Partials       1089       1090     +1   

Powered by Codecov. Last update 6179fcf...a8066c1

a.netCtx is initialized too late, so if allocations happen as part of
doNetworkInit, a nil pointer dereference will cause a panic.

Initialize a.netCtx earlier and use a.netCtx directly in member
functions instead of passing the network context separately, so there is
no confusion about which to use.

Also change allocator.go to have separate entries in the waitgroup for
initialization and actually running the allocator, and defer `Done` for
both. This should prevent a panic like this from leading to a deadlock,
since the deferred `Done` will be reached.

Signed-off-by: Aaron Lehmann <[email protected]>
@aaronlehmann
Collaborator Author

@mrjana: Updated to add a closure that clears a.netCtx in the error case, and removed networkContext parameters from the functions so there's no confusion over whether to use the argument or a.netCtx.

Contributor

@mrjana mrjana left a comment

LGTM

@mrjana
Contributor

mrjana commented Oct 17, 2016

@aaronlehmann Adding a test case to cover this scenario would be nice. I will try to add one.

@aluzzardi aluzzardi merged commit f8ec492 into moby:master Oct 18, 2016
@aluzzardi
Member

@aaronlehmann @mrjana Please add a test case in a separate PR.

@aaronlehmann aaronlehmann deleted the allocator-crash branch October 18, 2016 23:38
mavenugo added a commit to mavenugo/docker that referenced this pull request Oct 19, 2016
To identify the issue in allocator.

Signed-off-by: Madhu Venugopal <[email protected]>
mavenugo added a commit to mavenugo/docker that referenced this pull request Oct 22, 2016
Also Cherry-pick moby/swarmkit#1651
to identify the issue in allocator.

Signed-off-by: Madhu Venugopal <[email protected]>