Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid returning early on agent join failures #1473

Merged
merged 1 commit into from
Sep 27, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -191,8 +191,7 @@ func (c *controller) agentSetup() error {

if remoteAddr != "" {
if err := c.agentJoin(remoteAddr); err != nil {
logrus.Errorf("Error in agentJoin : %v", err)
return nil
logrus.Errorf("Error in joining gossip cluster : %v(join will be retried in background)", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not returning the error on the first place (returning nil). Am not sure how this will fix the issue mentioned in the description. The only possible effect this has is the fact that agentInitDone is not reinitialized. Is that the issue that you are intending to fix ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we return early agentInitDone is not closed. If it is not closed the go routing waiting on it to proceed to plumbing the ingress sandbox will not happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Just noticed that even though the networkDB retry logic takes effect, call to nDB.sendNodeEvent is not involved in that retry. am not sure if that is required. Just giving my observation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not strictly needed as the other node will be able to work with just getting a join notification from memberlist. But I think it is still correct to make that fix. Will update the PR.

}

Expand Down
4 changes: 4 additions & 0 deletions networkdb/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,10 @@ func (nDB *NetworkDB) retryJoin(members []string, stop <-chan struct{}) {
logrus.Errorf("Failed to join memberlist %s on retry: %v", members, err)
continue
}
if err := nDB.sendNodeEvent(NodeEventTypeJoin); err != nil {
logrus.Errorf("failed to send node join on retry: %v", err)
continue
}
return
case <-stop:
return
Expand Down