Add health check on API server #522
merenbach merged 26 commits into argoproj:master from merenbach:520-add-health-probes
A 3-second interval is a bit too aggressive. Let's raise this to 30 seconds. Also, it doesn't make sense for the liveness initialDelaySeconds to be less than the readiness probe's delay. Let's start liveness at 30.
We can be more aggressive for readiness than liveness since it only happens during server startup. Plus, our server comes up very quickly. We should use the same healthz endpoint to verify the pod can talk to k8s before claiming ready:
httpGet:
  path: /healthz
  port: 8080
initialDelaySeconds: 2
periodSeconds: 1
failureThreshold: 30
This reverts commit 40f490797645ed0f30d05785748e3919dea31b7f.
This reverts commit 650688dd2ee4a533e29b7df69e0bbb2436eead6b.
@merenbach did you test this end-to-end? How does this handle the case where the API server is served over HTTPS vs. HTTP?
@jessesuen this is now tested with Docker images locally. The probes are in place and come up as intended in I'm able to visit in a browser and the proxy seems to be handling this fine. Please let me know if this is what we were looking for.
port: 8080
initialDelaySeconds: 2
periodSeconds: 1
failureThreshold: 30
My understanding of readiness was wrong. Readiness is not just checked during server startup; it applies throughout the lifetime of the pod, so we cannot be so aggressive. Let's remove liveness entirely and simply have readiness with the following settings:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 30
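To put rough numbers on the trade-off between the two proposals, here is a minimal Go sketch. It assumes the standard Kubernetes behavior that a ready pod is marked unready after failureThreshold consecutive failed probes, with the documented default of 3 when the field is omitted. The probe struct and helper names are hypothetical and exist only for illustration; probe timeouts and scheduling jitter are ignored, so treat the results as approximations.

```go
package main

import "fmt"

// probe holds the readiness-probe timing fields from the YAML above, in seconds.
// This is a hypothetical illustration type, not a Kubernetes API type.
type probe struct {
	initialDelaySeconds int
	periodSeconds       int
	failureThreshold    int // 0 means "omitted", so the default applies
}

// defaultFailureThreshold is the Kubernetes default when failureThreshold is omitted.
const defaultFailureThreshold = 3

// unreadyAfterSeconds estimates the upper bound on how long a ready pod can
// keep failing /healthz before it is marked unready: one probe per period,
// and failureThreshold consecutive failures are required.
func unreadyAfterSeconds(p probe) int {
	threshold := p.failureThreshold
	if threshold == 0 {
		threshold = defaultFailureThreshold
	}
	return p.periodSeconds * threshold
}

// probesPerHour counts how many requests the kubelet sends to /healthz per hour.
func probesPerHour(p probe) int {
	return 3600 / p.periodSeconds
}

func main() {
	aggressive := probe{initialDelaySeconds: 2, periodSeconds: 1, failureThreshold: 30}
	relaxed := probe{initialDelaySeconds: 3, periodSeconds: 30}

	for name, p := range map[string]probe{"aggressive": aggressive, "relaxed": relaxed} {
		fmt.Printf("%s: first probe at %ds, unready after ~%ds of failures, %d probes/hour\n",
			name, p.initialDelaySeconds, unreadyAfterSeconds(p), probesPerHour(p))
	}
}
```

Under these assumptions the earlier settings would flag an unhealthy pod after roughly 30 seconds at 3600 probes per hour, while the settings above do so after roughly 90 seconds at 120 probes per hour.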
Can you paste the
@jessesuen Here's a Bourne script I threw together to do an e2e test on this feature:
Here's the output:
@jessesuen also adding a pod describe:
This reverts commit 8d9e4fa.
* feat: Implement Server-Side Diffs
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* trigger build
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* chore: remove unused function
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* make HasAnnotationOption more generic
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* add server-side-diff printer option
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* remove managedFields during server-side-diff
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* add ignore mutation webhook logic
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* fix configSet
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* Fix comparison
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* merge typedconfig in typedpredictedlive
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* handle webhook diff conflicts
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* Fix webhook normalization logic
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* address review comments 1/2
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* address review comments 2/2
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* fix lint
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* remove kubectl getter from cluster-cache
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
* fix query param verifier instantiation
  Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
* Add server-side-diff unit tests
  Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
---------
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Closes #520. Leaving out repo server health checks since we may need to do a SPIKE to determine scope.
Visit 127.0.0.1:8080/healthz (or any production /healthz endpoint) to check the health of the API server. The endpoint will return the text ok and a 200 status code if all is well:
...and a 503 status code otherwise:
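For readers who want to see the contract described above in code, here is a minimal Go sketch of a handler with the same behavior: the text ok with a 200 when a check passes, and a 503 otherwise. It is not the implementation from this PR. The names healthResponse, healthzHandler, and errUnreachable are hypothetical, and returning the error message as the 503 body is an assumption, since the description does not say what the failing response contains.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errUnreachable is a hypothetical error standing in for
// "the server cannot talk to Kubernetes".
var errUnreachable = errors.New("cannot reach Kubernetes API")

// healthResponse maps the result of a health check to a status code and body.
// Assumption: the 503 body carries the error message.
func healthResponse(err error) (int, string) {
	if err != nil {
		return http.StatusServiceUnavailable, err.Error()
	}
	return http.StatusOK, "ok"
}

// healthzHandler wraps a caller-supplied check function in an http.Handler.
func healthzHandler(check func() error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		code, body := healthResponse(check())
		w.WriteHeader(code)
		fmt.Fprint(w, body)
	})
}

func main() {
	checks := []func() error{
		func() error { return nil },            // healthy
		func() error { return errUnreachable }, // unhealthy
	}
	// Exercise the handler in-process instead of binding a real port.
	for _, check := range checks {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
		healthzHandler(check).ServeHTTP(rec, req)
		fmt.Println(rec.Code, rec.Body.String())
	}
}
```

In a real server the handler would be registered with something like http.Handle("/healthz", healthzHandler(check)), where check performs whatever dependency test is appropriate.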