Retry creating dynamic networks if not found
When a task fails to start, swarmkit may try to restart it on the
same node, where it may keep failing. This creates a race: while a
failed task is being removed, it may remove the associated network
just as another task (for the same service, or for a different
service connected to the same network) is starting its container on
the assumption that the network is still present. Fix this by
reacting to an `ErrNoSuchNetwork` error during container start by
trying to recreate the managed networks. If they have been removed,
they will be recreated. If they are still present, the call is
harmless.

Signed-off-by: Jana Radhakrishnan <[email protected]>
mrjana committed Aug 9, 2016
1 parent eb28dde commit 117cef5
Showing 1 changed file with 18 additions and 2 deletions.
20 changes: 18 additions & 2 deletions daemon/cluster/executor/container/controller.go
@@ -7,6 +7,7 @@ import (
 	executorpkg "github.com/docker/docker/daemon/cluster/executor"
 	"github.com/docker/engine-api/types"
 	"github.com/docker/engine-api/types/events"
+	"github.com/docker/libnetwork"
 	"github.com/docker/swarmkit/agent/exec"
 	"github.com/docker/swarmkit/api"
 	"github.com/docker/swarmkit/log"
@@ -163,8 +164,23 @@ func (r *controller) Start(ctx context.Context) error {
 		return exec.ErrTaskStarted
 	}
 
-	if err := r.adapter.start(ctx); err != nil {
-		return errors.Wrap(err, "starting container failed")
+	for {
+		if err := r.adapter.start(ctx); err != nil {
+			if _, ok := err.(libnetwork.ErrNoSuchNetwork); ok {
+				// Retry network creation again if we
+				// failed because some of the networks
+				// were not found.
+				if err := r.adapter.createNetworks(ctx); err != nil {
+					return err
+				}
+
+				continue
+			}
+
+			return errors.Wrap(err, "starting container failed")
+		}
+
+		break
 	}
 
 	// no health check
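
The control flow of the new loop can be exercised outside the daemon. The following is a minimal sketch, not the actual controller code: `errNoSuchNetwork`, `fakeAdapter`, and `startWithNetworkRetry` are hypothetical stand-ins for `libnetwork.ErrNoSuchNetwork`, the container adapter, and the body of `Start`. It uses `fmt.Errorf` with `%w` in place of `errors.Wrap` so that it needs only the standard library.

```go
package main

import "fmt"

// errNoSuchNetwork is a hypothetical stand-in for libnetwork.ErrNoSuchNetwork.
type errNoSuchNetwork string

func (e errNoSuchNetwork) Error() string {
	return "network " + string(e) + " not found"
}

// fakeAdapter is a hypothetical stub for the container adapter. It
// counts calls so the retry behaviour can be observed.
type fakeAdapter struct {
	networkPresent bool
	starts         int
	creates        int
}

// start fails with errNoSuchNetwork while the network is missing.
func (a *fakeAdapter) start() error {
	a.starts++
	if !a.networkPresent {
		return errNoSuchNetwork("ingress")
	}
	return nil
}

// createNetworks (re)creates the managed networks; calling it when
// they already exist changes nothing.
func (a *fakeAdapter) createNetworks() error {
	a.creates++
	a.networkPresent = true
	return nil
}

// startWithNetworkRetry mirrors the loop added in the diff: on a
// missing-network error it recreates the networks and tries again;
// any other error is wrapped and returned. As in the commit, the
// loop has no retry limit.
func startWithNetworkRetry(a *fakeAdapter) error {
	for {
		err := a.start()
		if err == nil {
			return nil
		}
		if _, ok := err.(errNoSuchNetwork); ok {
			if err := a.createNetworks(); err != nil {
				return err
			}
			continue
		}
		return fmt.Errorf("starting container failed: %w", err)
	}
}

func main() {
	a := &fakeAdapter{} // network was removed by a concurrent task cleanup
	if err := startWithNetworkRetry(a); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("starts=%d creates=%d\n", a.starts, a.creates)
}
```

With the network initially missing, the first `start` call fails, the networks are recreated once, and the second `start` call succeeds. If the network is already present, `createNetworks` is never called.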