Zero downtime AspNet Core and Kubernetes rolling updates
This article assumes that the reader is acquainted with the POD, Service, ReplicaSet and Deployment concepts of Kubernetes. The Ingress concept will also be discussed, but it is not a requirement. If you are new to Kubernetes, the official Kubernetes tutorial is a good starting point.
One of the greatest features of Kubernetes is the ability to perform updates in production environments with zero downtime.
This article explains how to get this feature working properly with AspNet Core applications. It is split into two sections. The first one, named Configuring readinessProbe, is not AspNet Core specific and explains how to make the Kubernetes Service aware of which PODs are ready to receive traffic and which are not. The next section, named Graceful termination, explains how to make the app terminate gracefully, so that no in-progress request is interrupted before it completes.
Configuring the readinessProbe is pretty simple and takes two steps:
- Make the app expose a URL that accepts GET requests and responds with HTTP 200.
- Configure the POD's readinessProbe to point to the URL defined in the previous step.
The code snippet below (myapp-deployment.yaml) is an example of a Deployment specification containing a single-container POD with a readinessProbe pointing to "/healthz":
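The first step can be sketched as follows. This is only a minimal illustration, assuming a Startup-based AspNet Core app and the "/healthz" path used throughout this article; the endpoint body ("OK") is an arbitrary choice, and Map matches every HTTP method rather than GET only.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

public class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        // Answer requests to /healthz with HTTP 200 so that the probe succeeds.
        // The path is an assumption; it only has to match the probe's httpGet.path.
        app.Map("/healthz", healthz => healthz.Run(async context =>
        {
            context.Response.StatusCode = StatusCodes.Status200OK;
            await context.Response.WriteAsync("OK");
        }));

        // The rest of the request pipeline goes here.
    }
}
```

A real application may prefer to return a non-200 status while it is still warming up, so that the POD only receives traffic once it can actually serve it.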
myapp-deployment.yaml:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: myapp
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: myapp
        image: myimage:myimagetag
        ports:
        - containerPort: 80
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 80
            scheme: HTTP
          periodSeconds: 5
          initialDelaySeconds: 5
          successThreshold: 1
          timeoutSeconds: 30
The basic concept is: the kubelet sends requests to /healthz periodically to determine which PODs are ready to receive traffic and which are not. The Service related to this Deployment chooses the POD to send traffic to based on this information, and it does not include a POD in the load balancing until that POD is ready. This is a basic premise for getting zero downtime rolling updates to work properly.
Important note: Traffic should not be sent directly to the PODs. It must be sent to the Service, which then proxies it to a ready POD.
The code snippet below (myapp-service.yaml) is an example of a Service specification written to proxy myapp:
myapp-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
  labels:
    app: myapp
spec:
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: myapp
Important note: I did not succeed with a NodePort Service; zero downtime rolling updates work for me only if the Service type is ClusterIP. With NodePort, while simulating a high-traffic scenario (with a lot of parallel requests), the Service kept sending new requests to the old POD after the ReplicaSet asked it to terminate.
If you're using the NGINX Ingress Controller to expose your application outside the cluster, you need to add the ingress.kubernetes.io/service-upstream: "true" annotation to the Ingress definition to make NGINX send traffic to the Service instead of sending it directly to the PODs, which is its default behavior.
The code snippet below (myapp-ingress.yaml) is an example of an Ingress specification written to expose myapp:
myapp-ingress.yaml:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
  labels:
    app: myapp
  annotations:
    kubernetes.io/ingress.class: "nginx"
    ingress.kubernetes.io/service-upstream: "true"
spec:
  rules:
  - host: "myapp.mycluster.com"
    http:
      paths:
      - path: /
        backend:
          serviceName: "myapp-service"
          servicePort: 80
Kubernetes provides two kinds of probes: livenessProbe and readinessProbe. The first one, the livenessProbe, is used by the kubelet to decide whether to restart the POD's container when it is unhealthy, while the readinessProbe tells kube-proxy (used by the Service concept) which PODs are ready to receive traffic and which are not.
Note that a livenessProbe was not included in the Deployment example above. If you want to configure one (recommended), the same properties of the readinessProbe are valid for the livenessProbe.
You can read more about Kubernetes probes at: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
When a rolling update occurs, the ReplicaSet starts new PODs. Once the new PODs are ready to accept requests, the Service stops sending new traffic to the old ones, and the ReplicaSet asks them to terminate by sending a SIGTERM (Unix kill signal). The point here is that the old PODs may still be processing requests that started before the Service stopped sending traffic to them and the ReplicaSet asked them to terminate. The key to ensuring zero downtime is that the app needs to stay alive until all of those requests have completed.
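To make the idea concrete, here is a minimal sketch of the technique: count in-flight requests and, when shutdown begins, wait until the count reaches zero or a timeout elapses. It is an illustration only, not how any particular library is implemented. The names DrainingMiddlewareExtensions and UseRequestDraining are hypothetical, and the code assumes an AspNet Core 2.x app, where IApplicationLifetime is available and a blocking ApplicationStopping callback delays the shutdown (newer versions expose IHostApplicationLifetime instead).

```csharp
using System;
using System.Threading;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;

// Hypothetical helper for illustration; not part of AspNet Core or any library.
public static class DrainingMiddlewareExtensions
{
    // Number of requests currently being processed.
    private static int _inFlight;

    public static IApplicationBuilder UseRequestDraining(
        this IApplicationBuilder app, IApplicationLifetime lifetime, TimeSpan timeout)
    {
        // Runs when the host starts shutting down (for example after SIGTERM).
        // Blocking here is assumed to delay the shutdown until pending
        // requests finish or the timeout elapses.
        lifetime.ApplicationStopping.Register(() =>
        {
            var deadline = DateTime.UtcNow + timeout;
            while (Volatile.Read(ref _inFlight) > 0 && DateTime.UtcNow < deadline)
            {
                Thread.Sleep(100);
            }
        });

        // Count every request for as long as it is being processed.
        return app.Use(async (context, next) =>
        {
            Interlocked.Increment(ref _inFlight);
            try
            {
                await next();
            }
            finally
            {
                Interlocked.Decrement(ref _inFlight);
            }
        });
    }
}
```

Such a helper would be registered first in the pipeline, for example app.UseRequestDraining(lifetime, TimeSpan.FromSeconds(60)) with IApplicationLifetime injected into Configure. The timeout should not exceed the POD's terminationGracePeriodSeconds, because Kubernetes kills the container once that period is over.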
The Graceterm library takes care of keeping the app alive so that pending requests can complete.
Installing and configuring Graceterm in your app is very easy:
Install the NuGet package: Graceterm
Considering a standard AspNet Core application, edit the Configure method of Startup by adding an app.UseGraceterm() invocation. Graceterm must be at the top of the request pipeline to work properly, which means that you must add the app.UseGraceterm() invocation before any other app.UseSomething().
If you are using a custom logging configuration, put the code that configures logging before app.UseGraceterm() so that Graceterm generates logs according to your preferences.
using Graceterm;

public class Startup
{
    ...

    public void Configure(IApplicationBuilder app, IHostingEnvironment env, ...)
    {
        //
        // Custom logging configuration goes here.
        //

        // Add graceterm just after logging configuration and before
        // any other middleware.
        app.UseGraceterm(options => options.IgnorePath("/healthz"));

        app.UseOtherMiddleware();

        ...
    }

    ...
}
Graceterm keeps the app alive until all requests have completed or a timeout occurs; the default timeout is 60 seconds. You may modify the timeout and other Graceterm behavior to suit your needs. All Graceterm options are described at https://github.com/mnconsulting/graceterm/blob/master/README.md.