Dgraph unresponsive on read & mutation both #2311
Comments
Here are the debug vars and Zero state: |
Can you share logs from the other servers as well (Zero and Dgraph server)? More logs from this server, from when it was healthy just before the error logs, would also be helpful. Ideally, share the full logs, or at least mention which node ID each log file belongs to. |
@pawanrawal the above log is from node ID 2. Here are the full logs for nodes 2 and 3. Unfortunately, it looks like we lost the logs of node 1 and the Zero servers. I can see only these logs from just before the issue occurred
|
Unfortunately, it looks like you are using an older commit from 15 days ago. The fix for the "Couldn't take snapshot" issue went into f66c7df, as mentioned in #2266 (comment). |
@pawanrawal is the Dgraph hang issue also related to Also, on restarting Zero we are now getting the following error, and Zero isn't coming up.
|
Not sure. The "Couldn't take snapshot" issue could have caused your servers to run out of memory, and Zero could have been killed. To confirm that, I would need the Zero logs. I can see that the servers are not able to talk to Zero.
This is a new one and has not been reported before, can you share your |
The zw folder is 8.8G. As a tar archive it comes to 1.4G.
Let me know if you need any specific file. |
I need the whole directory to read it. Feel free to share it with me on email pawan AT dgraph DOT io. |
I have sent the whole directory to your email. |
@pawanrawal have you looked at the zw directory? Any clue? |
I did have a look at it and can reproduce this issue. It requires further investigation to understand how it might have happened. Can you please open a new GitHub issue for this bug and continue the discussion there, as this is a separate issue? |
I have created a new issue: #2327 |
Closed #2327. If you can reproduce this on master, feel free to reopen. |
If you suspect this could be a bug, follow the template.
What version of Dgraph are you using?
Dgraph version : v1.0.4-dev
Commit SHA-1 : 807976c
Commit timestamp : 2018-03-22 14:55:24 +1100
Branch : HEAD
Have you tried reproducing the issue with latest release?
I am already using the nightly build, as suggested in "Dgraph bulk setting up cluster" #2252
What is the hardware spec (RAM, OS)?
3 Dgraph data server nodes with Ubuntu 14.04 / 8 cores, 32GB
3 nodes for Zero with Ubuntu 14.04 / 1 core, 2GB
Steps to reproduce the issue (command/config used to run Dgraph).
Dgraph config -
I have 3 Dgraph servers running in a cluster with replica 3. We have been using this setup in production for more than a week. Today, all of a sudden, we started having issues with both reads and mutations. Calls just freeze (returning no response). As this problem was in production, we had no choice but to set up a new Dgraph machine and re-migrate the data from the original source.
I can see the following log on one of the nodes at roughly the same time Dgraph became unresponsive
Since then, logs like this one
raft.go:692: INFO: 2 [logterm: 2, index: 10314649, vote: 0] ignored MsgVote from 3 [logterm: 2, index: 10325133] at term 2: lease is not expired
have been flooding the log, and Dgraph has been unresponsive ever since. I have attached heap and CPU profiles.
pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
pprof.dgraph.samples.cpu.001.pb.gz
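For anyone triaging a similar flood of messages, here is a minimal Go sketch that pulls the fields out of one "ignored MsgVote" line and reports how far apart the two nodes' last log indexes are. The regular expression is an assumption based only on the single line quoted above, and the names `ignoredVote`, `parseIgnoredVote` and `indexGap` are made up for this illustration; none of this is Dgraph or etcd/raft code. As far as I understand the Raft library, this message means the receiving node still believes it has a live leader and so refuses the vote request.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// ignoredVote holds the numeric fields of one "ignored MsgVote" log line.
// The type and its field names are hypothetical, chosen for this sketch.
type ignoredVote struct {
	Receiver       uint64 // node that logged the line and ignored the request
	ReceiverIndex  uint64 // receiver's last log index
	Candidate      uint64 // node that asked for the vote
	CandidateIndex uint64 // candidate's last log index
	Term           uint64 // term at which the request was ignored
}

// Pattern derived from the one sample line in this issue (an assumption).
var ignoredVoteRe = regexp.MustCompile(
	`INFO: (\d+) \[logterm: \d+, index: (\d+), vote: \d+\] ignored MsgVote from (\d+) \[logterm: \d+, index: (\d+)\] at term (\d+): lease is not expired`)

// parseIgnoredVote returns the parsed fields, or false if the line does not match.
func parseIgnoredVote(line string) (ignoredVote, bool) {
	m := ignoredVoteRe.FindStringSubmatch(line)
	if m == nil {
		return ignoredVote{}, false
	}
	n := make([]uint64, 5)
	for i := range n {
		v, err := strconv.ParseUint(m[i+1], 10, 64)
		if err != nil {
			return ignoredVote{}, false
		}
		n[i] = v
	}
	return ignoredVote{
		Receiver:       n[0],
		ReceiverIndex:  n[1],
		Candidate:      n[2],
		CandidateIndex: n[3],
		Term:           n[4],
	}, true
}

// indexGap is the candidate's last index minus the receiver's last index.
func (v ignoredVote) indexGap() int64 {
	return int64(v.CandidateIndex) - int64(v.ReceiverIndex)
}

func main() {
	line := `raft.go:692: INFO: 2 [logterm: 2, index: 10314649, vote: 0] ignored MsgVote from 3 [logterm: 2, index: 10325133] at term 2: lease is not expired`
	v, ok := parseIgnoredVote(line)
	if !ok {
		fmt.Println("not an ignored-MsgVote line")
		return
	}
	fmt.Printf("node %d ignored a vote request from node %d at term %d; index gap: %d\n",
		v.Receiver, v.Candidate, v.Term, v.indexGap())
}
```

For the line quoted above, the gap works out to 10325133 - 10314649 = 10484 entries, with node 3 ahead of node 2.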