A dead simple health checker for Go services.
Checks the availability of all services your service depends on and provides /.well-known/alive and /.well-known/ready endpoints. Supports a set of probes out of the box and can be extended with your own readiness probes. See all available probes or create a custom probe.
Learn more about health checks.
Serve on the same port as your application
```go
func main() {
	checker := health.Checker{}
	// This can be any http.ServeMux
	checker.AppendHealthEndpoints(http.DefaultServeMux)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("Hello World!"))
	})

	_ = http.ListenAndServe(":8080", http.DefaultServeMux)
}
```
Serve on a separate port
```go
func main() {
	checker := &health.Checker{}
	defer checker.ServeHTTPBackground(":8080")()

	// Check an external gRPC service.
	cc, _ := grpc.Dial(...)
	checker.AddReadinessProbe("my-grpc-service", health.GrpcProbe(cc))
}
```
A health.Probe is just a plain function that returns an error if the service cannot be reached. The probe is called every time the readiness endpoint is called, so use the simplest check possible to determine whether the service you depend on is up and running.
```go
// MyCustomServiceProbe checks whether the custom service can be reached.
func MyCustomServiceProbe(srv *customService) health.Probe {
	return func() error {
		// Check your service for availability.
		available, err := srv.Ping()
		if !available || err != nil {
			// The service is not available, so we return an error.
			return fmt.Errorf("service is unavailable: %v", err)
		}
		// The service we depend on is up and running.
		return nil
	}
}
```
Usage
```go
checker := &health.Checker{}
srv := NewCustomService()
checker.AddReadinessProbe("my-service", MyCustomServiceProbe(srv))
```
This package is designed to integrate seamlessly with Kubernetes. Let's assume we have a container image named company/my-service which uses this package. Then you can add the following lines to your deployment to enable monitoring by Kubernetes. Kubernetes will now automatically restart your service if it does not report itself as alive for three consecutive requests, and the pod is skipped by the load balancer as long as it isn't ready. Learn more about health checks.
```diff
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: service1
 spec:
   selector:
     matchLabels:
       app: service1
   template:
     metadata:
       labels:
         app: service1
     spec:
       containers:
         - name: service1
           image: company/my-service:latest
           ports:
             - containerPort: 80
+          livenessProbe:
+            httpGet:
+              path: /.well-known/alive
+              port: 80
+          readinessProbe:
+            httpGet:
+              path: /.well-known/ready
+              port: 80
```
Kubernetes distinguishes between liveness and readiness checks. Thus, our services should provide two endpoints: one to check whether the service is alive and one to check whether it is ready.
Alive
A service is defined as alive if it started correctly and accepts incoming requests. A service that is not alive for more than three consecutive checks will be killed and restarted automatically.
Ready
A service is defined as ready if all mandatory dependencies, for example a database, can be reached and the service can work as expected. A service that is not ready for more than three consecutive checks will be skipped by the internal load balancer.
A service that is alive but not ready has to recover on its own.
ℹ By default, both states are checked every 2 seconds after an initial delay of 10 seconds. If a service needs more than 5 seconds to come up (alive=true), you should increase the initial delay to twice the mean startup time.
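As an illustration, these timings can be set explicitly on the probes in the deployment above. The sketch below simply mirrors the defaults described here for the readiness probe; the liveness probe takes the same fields, and the values should be adjusted to your service.

```yaml
readinessProbe:
  httpGet:
    path: /.well-known/ready
    port: 80
  initialDelaySeconds: 10  # raise this for slow-starting services
  periodSeconds: 2         # check every 2 seconds
  failureThreshold: 3      # three consecutive failures mark the pod as not ready
```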
The following table contains a set of common states / events and the expected health report.
State | Alive | Ready |
---|---|---|
Startup phase | false | false |
Ready | true | true |
Deadlock | false | false |
Heavy load due to processing lots of data | true | false |
Database cannot be reached | true | false |
Mandatory Service cannot be reached | true | false |
Volume has insufficient space | true | false |
Slow response from service / database | true | true |
Leader cannot be reached | true | false |
A service must implement health endpoints to check whether it is alive and ready. Both have to be served via HTTP/1.1 on the same port on all interfaces. The routes should be /.well-known/alive and /.well-known/ready. These endpoints must not require any authentication or additional headers. The response should be either 200 OK or 503 Service Unavailable with a minimal JSON body.
Both endpoints should be served independently of the main application, on a separate port.
The response for the liveness probe should be a simple true or false.
/.well-known/alive: success

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "alive": true
}
```
/.well-known/alive: failure

```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "alive": false
}
```
The response for the readiness probe should be a simple true or false. For debugging purposes, the failure response can contain a list of short reasons why the service is unhealthy. Detailed information should be reported via the metrics / telemetry endpoints.
/.well-known/ready: success

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "ready": true
}
```
/.well-known/ready: failure

```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
  "ready": false,
  "reasons": [
    "dgraph: Service unreachable"
  ]
}
```