Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib: update k8s liveness and readiness probes #4405

Merged
merged 2 commits into from
Dec 27, 2019
Merged

Conversation

fristonio
Copy link
Contributor

@fristonio fristonio commented Dec 12, 2019

  • Fix liveness probe for dgraph alpha. Earlier there was a single endpoint for both liveness and readiness probes which can lead to some problems as described in [K8s / devops] Reviewing the liveness probe endpoint #4383. This PR splits the single endpoint into two, one for readiness and one for liveness. The liveness probe does not take into consideration if the alpha is ready to take request, but ensure that the container is not in a frozen state and can serve some traffic.

The liveness probe can be triggered by providing a live=1 GET parameter to the existing /health endpoint. The /health endpoint, however, corresponds to a readiness probe. This is similar to how cockroach DB handles these probes.

  • Add liveness probe to dgraph zero.
  • Update k8s manifests to include both liveness and readiness probes.

This will require an update in the dgraph docker image dgraph/dgraph.

Fixes #4378


This change is Reviewable

* Fix liveness probe for dgraph alpha.
* Add liveness probe to dgraph zero.
* Update k8s manifests to include both liveness and readiness probes.

Signed-off-by: fristonio <[email protected]>
@fristonio fristonio requested review from danielmai, manishrjain and a team as code owners December 12, 2019 10:01
x.AddCorsHeaders(w)

w.WriteHeader(http.StatusOK)
w.Write([]byte("OK"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of w.Write is not checked (from errcheck)

Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title of this PR is a bit misleading. It adds a /health endpoint to Zero and changes the /health endpoint for Dgraph Alpha. The PR name and description should reflect that.

The k8s configs should continue working for the latest release for Dgraph Alpha. We can either split up the K8s config change to a separate PR and wait for the next release to support these new endpoints, or we update the docs to link to the K8s config that's on the current github.com/dgraph-io/dgraph/tree/release/$latestversion branch instead of github.com/dgraph-io/dgraph/tree/master.

Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @danielmai, @fristonio, and @manishrjain)


dgraph/cmd/alpha/run.go, line 280 at r2 (raw file):

	x.AddCorsHeaders(w)

	_, ok := r.URL.Query()["live"]

Why is a live query parameter necessary here? Why not have health checking as plain /health?

Copy link
Contributor Author

@fristonio fristonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title of this PR is a bit misleading. It adds a /health endpoint to Zero and changes the /health endpoint for Dgraph Alpha. The PR name and description should reflect that.

I will update the PR title and pull request description to better explain the changes.

We can either split up the K8s config change to a separate PR and wait for the next release to support these new endpoints, or we update the docs to link to the K8s config that's on the current.

I think it makes more sense if the docs point to the release version of the dgraph rather than master, for any documentation change or issue fixes we can backport them to the required version and they would still reflect in the documentation.

Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @danielmai, @fristonio, and @manishrjain)


dgraph/cmd/alpha/run.go, line 280 at r2 (raw file):

Previously, danielmai (Daniel Mai) wrote…

Why is a live query parameter necessary here? Why not have health checking as plain /health?

We need two separate endpoints, one for liveness and one for readiness. The live get query parameter describes if the request is for liveness check or readiness. For liveness check, we assume that if the webserver is able to serve any traffic it is live and for this, we make a request with live GET parameter. For readiness, however, we take into consideration the fact that the server in some cases is not able to serve traffic even if the container is live, like when bootstrapping badger, or sharing raft state snapshot, etc. This approach is similar to what cockroach DB has. You can take a look at #4383 for discussion around this.

@fristonio fristonio changed the title contrib: fix issues with k8s liveness and readiness probes. contrib: update k8s liveness and readiness probes Dec 17, 2019
Copy link
Contributor

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should make the links specific to the docs version.

:lgtm:. Make sure the PR name and description clearly mention what this change is when you merge it.

Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @fristonio and @manishrjain)


dgraph/cmd/alpha/run.go, line 280 at r2 (raw file):

Previously, fristonio (Deepesh Pathak) wrote…

We need two separate endpoints, one for liveness and one for readiness. The live get query parameter describes if the request is for liveness check or readiness. For liveness check, we assume that if the webserver is able to serve any traffic it is live and for this, we make a request with live GET parameter. For readiness, however, we take into consideration the fact that the server in some cases is not able to serve traffic even if the container is live, like when bootstrapping badger, or sharing raft state snapshot, etc. This approach is similar to what cockroach DB has. You can take a look at #4383 for discussion around this.

Understood, thanks.

@fristonio fristonio merged commit ed072d0 into master Dec 27, 2019
@fristonio fristonio deleted the fristonio/fix-4383 branch December 27, 2019 11:39
danielmai pushed a commit that referenced this pull request Jan 12, 2020
* Fix liveness probe for dgraph alpha.
* Add liveness probe to dgraph zero.
* Update k8s manifests to include both liveness and readiness probes.


(cherry picked from commit ed072d0)
prashant-shahi added a commit that referenced this pull request Jan 24, 2020
- Adding health endpoints to v1.1.x

Signed-off-by: Prashant Shahi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

[K8s] Readiness endpoint for Dgraph
3 participants