- Current status
- We can only check whether the k8s-scheduled container is running or not; we cannot tell whether the real supervisor service inside the container is actually healthy. K8s needs a probe configuration to know the state inside the container.
- Problems to be resolved
- The health check should determine whether the real supervisor service is healthy or not
- If it is not healthy, distinct error codes should be returned to make debugging easier
- Leverage the k8s probe mechanism
- K8s has three kinds of probes, liveness, readiness, and startup, which can probe state inside the container. The readiness probe fits our scenario.
- Every probe supports four check types (HTTP, TCP, gRPC, command); the command type is the right fit for our scenario.
- With the command type we deploy a script inside the container, and k8s calls this script and records its exit code throughout the container's lifecycle. Different non-zero exit codes let us recognize what the issue is when the probe fails.
- Command probe example:
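- A minimal sketch of the container spec fragment (the timing values below are placeholders to be tuned per service, not final settings):

```yaml
# Readiness probe (command/exec type): k8s runs the health-check script
# periodically; exit code 0 marks the container Ready, non-zero not Ready.
readinessProbe:
  exec:
    command: ["/usr/bin/readiness_probe.sh"]
  initialDelaySeconds: 10   # placeholder: give the supervisor time to start
  periodSeconds: 10         # placeholder: probe interval
  timeoutSeconds: 5         # placeholder: per-probe timeout
  failureThreshold: 3       # placeholder: failures before marking not Ready
```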
- Controller actions on the k8s side
- During the container upgrade process, our k8s controller uses the readiness probe result. When the controller is in the post-check stage, it checks the readiness probe exit code: if the exit code is zero, the controller considers the container healthy and the post-check proceeds; if the exit code is non-zero, the controller considers it unhealthy and re-checks repeatedly until the timeout. If the timeout is hit, the controller performs a fallback, as sketched below.
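- The controller itself is out of scope here, but its post-check loop can be sketched schematically (the function, timeout, and interval below are illustrative, not the real controller code):

```python
import logging
import time

def post_check(probe, timeout_s=300, interval_s=10):
    """Re-check the readiness probe until it passes or the timeout expires.

    `probe` is any callable returning the probe script's exit code.
    Returns True when healthy; False means the caller should fall back.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        exit_code = probe()
        if exit_code == 0:
            return True  # healthy: post-check proceeds
        # The specific non-zero code tells us which check failed.
        logging.warning("readiness probe failed with exit code %d", exit_code)
        time.sleep(interval_s)
    return False  # timed out: the controller performs the fallback
```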
- Why not the startup and liveness probes
- For the startup probe, k8s restarts the container once the startup probe failure count reaches its limit. We don't want k8s to do that; we handle this case with a fallback rather than a restart.
- For the liveness probe, the existing code already implements equivalent behavior: the supervisor exit-listener service inside the container kills its own container when a critical service exits unexpectedly.
- Script path and name
- /usr/bin/readiness_probe.sh
- Health-check logic in the script (two steps for now); a full sketch of the script appears at the end of this section
- Step 1: do common checks that are the same for all containers
- Currently the one common check is: if the supervisor start.sh exists, ensure that it exited with code 0.
- Critical services are not checked here, because they are managed by the exit-listener supervisor service.
- Step 2: call the container-specific executable hook script, if it exists
- Executable script path and name
- /usr/bin/readiness_probe_hook
- The feature owner should implement this Python script if needed; a skeleton sketch follows below.
- Note that the exit code should be 0 if everything is good. Otherwise, the exit codes must be clearly defined so that we can figure out the issue once it happens.
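- For illustration, a possible skeleton for such a hook (the checks and exit codes are placeholders that each feature owner would replace):

```python
#!/usr/bin/env python3
"""Container-specific readiness hook: /usr/bin/readiness_probe_hook."""
import sys

# Illustrative exit-code table; each feature owner defines their own,
# with one documented meaning per non-zero code.
EXIT_OK = 0
EXIT_SERVICE_NOT_READY = 10  # placeholder: main service not serving yet
EXIT_CONFIG_MISSING = 11     # placeholder: required config file absent

def service_is_ready() -> bool:
    # Placeholder check: replace with a real container-specific probe
    # (port open, pid file present, internal status endpoint, etc.).
    return True

def main() -> int:
    if not service_is_ready():
        return EXIT_SERVICE_NOT_READY
    return EXIT_OK

if __name__ == "__main__":
    sys.exit(main())
```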
- More details on the common start.sh check
- The start.sh script performs the real service initialization; the probe checks whether that initialization has finished.
- This includes calling the k8s startup script, which participates in the upgrade process. Proceeding to upgrade the container must wait until the k8s startup script has finished, so we need to check whether the start service exited as expected.
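- Putting both steps together, a minimal sketch of the probe script (the start.sh path, the way its exit status is recorded, and the exit codes are all assumptions to adapt to the actual supervisor setup):

```bash
#!/bin/bash
# /usr/bin/readiness_probe.sh
# Exit 0 = healthy; each non-zero code identifies one failed check.

START_SH="/path/to/start.sh"           # assumption: actual start.sh location
START_EXIT_FILE="/run/start.exitcode"  # assumption: start.sh records its exit code here
HOOK="/usr/bin/readiness_probe_hook"

EXIT_START_NOT_FINISHED=1              # placeholder exit codes
EXIT_START_FAILED=2

# Step 1: common checks, same for all containers.
# If start.sh exists, the initialization it drives (including the k8s
# startup script) must have finished with exit code 0.
if [ -x "$START_SH" ]; then
    [ -f "$START_EXIT_FILE" ] || exit "$EXIT_START_NOT_FINISHED"
    [ "$(cat "$START_EXIT_FILE")" = "0" ] || exit "$EXIT_START_FAILED"
fi

# Step 2: call the container-specific hook, if present, and propagate
# its exit code so the failing check can be identified.
if [ -x "$HOOK" ]; then
    exec "$HOOK"
fi

exit 0
```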