Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple fixes to improve cluster ops #3337

Merged
merged 3 commits into from
Apr 30, 2019
Merged

Conversation

manishrjain
Copy link
Contributor

@manishrjain manishrjain commented Apr 29, 2019

Multiple fixes to improve cluster ops:

  1. Add a way to mark a proposal as important, so it can bypass the pending proposal throttle mechanism. This is important to ensure that the latest updates from Zero are always proposed.
  2. Do not retry connection to a dead server in a busy-wait loop. Instead, use a ticker (500ms) to pause before a retry. On error, only log once. Return after 60s.
  3. Output all the configs on start to log file. Remove a broken dgraph_conf var.

This change is Reviewable

@manishrjain manishrjain requested a review from a team as a code owner April 29, 2019 22:13
Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)


conn/node.go, line 380 at r1 (raw file):

	// Exit after this deadline. Let BatchAndSendMessages create another goroutine, if needed.
	deadline := time.Now().Add(60 * time.Second)
	ticker := time.NewTicker(time.Second)

PR says the ticker is 500ms but this says 1 second?

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 7 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @manishrjain)


edgraph/config.go, line 20 at r1 (raw file):

import (
	"expvar"

why are the changes in this file part of this PR?

Copy link
Contributor Author

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @martinmr)


conn/node.go, line 380 at r1 (raw file):

Previously, danielmai (Daniel Mai) wrote…

PR says the ticker is 500ms but this says 1 second?

Will update the PR description.


edgraph/config.go, line 20 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

why are the changes in this file part of this PR?

No longer need to set the conf var, I just log all the options. It was broken to begin with.

Copy link
Contributor Author

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @martinmr)


conn/node.go, line 380 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Will update the PR description.

Done.


edgraph/config.go, line 20 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

No longer need to set the conf var, I just log all the options. It was broken to begin with.

Done.

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @martinmr)

@manishrjain manishrjain merged commit 26e8f0f into master Apr 30, 2019
@manishrjain manishrjain deleted the mrjn/fix-broken-cluster branch April 30, 2019 00:23
manishrjain added a commit that referenced this pull request Apr 30, 2019
Multiple fixes to improve cluster ops:

1. A delta proposal should bypass the throttle pending proposal mechanism. This is important to ensure that the latest updates from Zero are always proposed.
2. Do not retry connection to a dead server in a busy-wait loop. Instead, use a ticker (1s) to pause before a retry. On error, only log once. Return after 60s.
3. Output all the configs on start to log file. Remove a broken dgraph_conf var.

Changes:
* Fix the flakiness of blockade test. Was happening because of the 10s timeout error during conn creation on a stopped node.
* Simplify the change by removing the Important bit. Instead just check for Delta field during proposal.
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Multiple fixes to improve cluster ops:

1. A delta proposal should bypass the throttle pending proposal mechanism. This is important to ensure that the latest updates from Zero are always proposed.
2. Do not retry connection to a dead server in a busy-wait loop. Instead, use a ticker (1s) to pause before a retry. On error, only log once. Return after 60s.
3. Output all the configs on start to log file. Remove a broken dgraph_conf var.

Changes:
* Fix the flakiness of blockade test. Was happening because of the 10s timeout error during conn creation on a stopped node.
* Simplify the change by removing the Important bit. Instead just check for Delta field during proposal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants