Error while proposing node removal #4056

Closed
danielmai opened this issue Sep 25, 2019 · 2 comments · Fixed by #4254
Assignees
martinmr

Labels
area/operations: Related to operational aspects of the DB, including signals, flags, env vars, etc.
area/usability: Issues with usability and error messages
kind/bug: Something is broken.
priority/P1: Serious issue that requires eventual attention (can wait a bit)
status/accepted: We accept to investigate/work on it.

Comments

@danielmai (Contributor)

What version of Dgraph are you using?

v1.1.0

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

Ubuntu Linux

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Run a Dgraph cluster with multiple Alpha replicas in a group.

     # From dgraph-io/dgraph repo root directory
     cd ./compose
     ./run.sh

  2. Remove an Alpha. (Quote the URL so the shell does not treat & as a background operator.)

     curl "localhost:6180/removeNode?id=1&group=1"

  3. The Alpha is removed successfully according to the logs and /state, but the new Alpha leader continues to print the following log every second:
E0925 00:17:12.688470       1 groups.go:322] Error while proposing node removal: Node 0x1 not part of group
github.com/dgraph-io/dgraph/conn.(*Node).ProposePeerRemoval
	/tmp/go/src/github.com/dgraph-io/dgraph/conn/node.go:594
github.com/dgraph-io/dgraph/worker.(*groupi).applyState.func1
	/tmp/go/src/github.com/dgraph-io/dgraph/worker/groups.go:320
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1337

Expected behaviour and actual result.

A successful node removal should not cause an error to be logged repeatedly.

@danielmai danielmai added the kind/bug, priority/P1, status/accepted, area/usability, and area/operations labels on Sep 25, 2019
@raasss commented Sep 26, 2019

I'm having the same problem, but I think I rebuilt the Alpha node the wrong way:

  1. I stopped 1 of 3 Alpha nodes.
  2. Wiped the node's database data and started the node again.
  3. The node started to panic because I forgot to make the REST API call to remove the node from the cluster.
  4. Stopped the Alpha node again.
  5. Wiped the node's database data.
  6. Made the API call and removed the node with http://:6080/removeNode?id=3&group=1
  7. Started the node and it joined the cluster, but I started getting the same error log.

@martinmr martinmr self-assigned this Nov 5, 2019
@martinmr (Contributor) commented Nov 7, 2019

There doesn't appear to be an actual issue, but I agree that the logs should be cleaned up. Here's what's happening:

  1. Node is removed via the zero endpoint.
  2. A new leader is elected.
  3. The leader receives an update with the other node marked as removed. Node is removed successfully.
  4. Subsequent updates try to remove the node again. It's already removed and not a peer of the new leader so an error is returned.

The solution is to make the new leader aware that it has already removed the node, so that it doesn't try to remove it again.
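The fix described above amounts to a small dedup guard: the leader records which node IDs it has already proposed for removal and skips re-proposing them on later state updates. The sketch below is a hypothetical illustration of that idea (the names removalTracker and shouldPropose are invented here), not Dgraph's actual implementation:

```go
package main

import "fmt"

// removalTracker remembers which peers this leader has already proposed
// to remove, so repeated state updates don't re-propose the removal.
// Hypothetical sketch; not Dgraph's real code.
type removalTracker struct {
	removed map[uint64]bool
}

func newRemovalTracker() *removalTracker {
	return &removalTracker{removed: make(map[uint64]bool)}
}

// shouldPropose reports whether a removal proposal is still needed for id.
// It returns true exactly once per node ID.
func (t *removalTracker) shouldPropose(id uint64) bool {
	if t.removed[id] {
		return false
	}
	t.removed[id] = true
	return true
}

func main() {
	t := newRemovalTracker()
	// First update marking node 0x1 as removed: propose removal once.
	fmt.Println(t.shouldPropose(1)) // true
	// Subsequent updates: skip, avoiding the repeated error log.
	fmt.Println(t.shouldPropose(1)) // false
}
```

With a guard like this, the "Node 0x1 not part of group" error would not recur every second after a successful removal.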


3 participants