
[K8s / devops] Reviewing the liveness probe endpoint #4383

Closed
hackintoshrao opened this issue Dec 9, 2019 · 1 comment
Labels
area/kubernetes Related to running Dgraph on K8s area/operations Related to operational aspects of the DB, including signals, flags, env vars, etc. kind/bug Something is broken.

Comments

@hackintoshrao
Contributor

What version of Dgraph are you using?

v1.1

The kubelet uses liveness probes to know when to restart a Container.
For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.

Liveness probes automate restarting the database in all the scenarios where it becomes unresponsive.
But an incorrect implementation would restart the pods when it's not necessary to do so.

We should review the current implementation and write a report covering two things:

  • Does the current liveness probe endpoint reliably detect an unresponsive database state?
  • Are there false positives, i.e. restarts triggered while the database is still operational?
@hackintoshrao hackintoshrao added kind/enhancement Something could be better. area/kubernetes Related to running Dgraph on K8s area/devops labels Dec 9, 2019
@fristonio
Contributor

fristonio commented Dec 9, 2019

Currently, we have both a readiness probe and a liveness probe in our helm configuration for dgraph alpha. We are using the same endpoint for both checks, which is a probable bug: when a node applies a raft snapshot, it sets its health status to unhealthy (here), and if probed at that moment Kubernetes will restart the container, which should not happen because the database is still operational.

Liveness Probe

The liveness probe tells us whether the container is dead: a live container may be temporarily unable to serve our business logic, but the process itself is still operational.

For liveness probes, most implementations assume that if our HTTP server can return any response, the application is live (it might not be ready to serve traffic, but it is live). In CockroachDB's case this endpoint simply returns details about the node.

Readiness probe

The readiness probe checks whether the application can serve our business logic, which in Dgraph's case means checking that we can process database transactions. For alphas, the readiness probe defined in the helm chart does something similar by looking at a globally updatable health status.

CockroachDB operates in a similar manner and behaves differently for the liveness and readiness probes.

Further improving the readiness probe would mean integrating health checks deeper into our source code, and deciding what exactly we consider a not-ready state for Dgraph.

@hackintoshrao hackintoshrao added kind/bug Something is broken. and removed kind/enhancement Something could be better. labels Dec 9, 2019
@danielmai danielmai added area/operations Related to operational aspects of the DB, including signals, flags, env vars, etc. and removed area/devops labels Dec 9, 2019