
Add health check on API server#522

Merged
merenbach merged 26 commits into argoproj:master from merenbach:520-add-health-probes on Aug 23, 2018

Conversation

@merenbach (Contributor) commented Aug 20, 2018

Closes #520. Leaving out repo server health checks since we may need to do a SPIKE to determine scope.

Visit 127.0.0.1:8080/healthz (or any production /healthz endpoint) to check the health of the API server. The endpoint returns the text ok and a 200 status code if all is well:

$ curl -i 127.0.0.1:8080/healthz
HTTP/1.1 200 OK
Date: Mon, 20 Aug 2018 22:22:46 GMT
Content-Length: 3
Content-Type: text/plain; charset=utf-8

ok

...and a 503 status code otherwise:

$ curl -i 127.0.0.1:8080/healthz
HTTP/1.1 503 Service Unavailable
Date: Mon, 20 Aug 2018 22:26:51 GMT
Content-Length: 0

@merenbach merenbach requested a review from jessesuen August 20, 2018 23:12
@merenbach merenbach changed the title from "[WIP] Add health check on API server" to "Add health check on API server" Aug 20, 2018
Member


A 3-second interval is a bit too aggressive. Let's raise this to 30 seconds. Also, it doesn't make sense for the liveness initialDelaySeconds to be less than the readiness delay. Let's start liveness at 30.

Member


We can be more aggressive for readiness than liveness since it only happens during server startup, and our server comes up very quickly. We should use the same healthz endpoint to verify the pod can talk to k8s before claiming ready:

          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 2
          periodSeconds: 1
          failureThreshold: 30

@jessesuen
Member

@merenbach did you test this end-to-end? How does this handle the case where the API server is served over HTTPS vs. HTTP?

@merenbach
Contributor Author

@jessesuen this is now tested with Docker images locally. The probes are in place and come up as intended in minikube when the argocd-server deployment is created and the argocd-server service starts. I believe it's working fine with HTTPS, per the following output based on a service created with an HTTPS-only manifest:

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
argocd-repo-server   ClusterIP   10.110.146.142   <none>        8081/TCP        18m
argocd-server        NodePort    10.111.16.67     <none>        443:31797/TCP   5s
$ kubectl proxy -n argocd argocd-server
Starting to serve on 127.0.0.1:8001

I'm able to visit in a browser and the proxy seems to be handling this fine. Please let me know if this is what we were looking for.
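
For reference, if /healthz were served only over TLS, the kubelet's httpGet probe supports an explicit scheme field; a hedged sketch (this PR appears to serve /healthz over plain HTTP on port 8080, so the default scheme suffices):

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTPS   # kubelet skips certificate verification for HTTPS probes
```

The kubelet does not validate the server certificate for HTTPS probes, so a self-signed cert would not cause probe failures.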

port: 8080
initialDelaySeconds: 2
periodSeconds: 1
failureThreshold: 30
Member


My understanding of readiness was wrong. Readiness is not just for server startup: it applies throughout the lifetime of the pod, so we cannot be so aggressive. Let's remove liveness entirely and simply have readiness with the following settings:

        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 30
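
In the deployment manifest, that probe sits under the container spec; a sketch assuming the argocd-server container from this PR's manifests (command flags abbreviated):

```yaml
containers:
- name: argocd-server
  image: argoproj/argocd-server:latest
  command: ["/argocd-server", "--staticassets", "/shared/app"]
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 3
    periodSeconds: 30
```

With only a readinessProbe, a failing /healthz takes the pod out of service endpoints but never restarts the container, which is the behavior being asked for here.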

@jessesuen
Member

per the following output based on a service created with an HTTPS-only manifest:

Can you paste the kubectl get pod argocd-server output with this in place? The readiness column is the most relevant.

@merenbach
Contributor Author

@jessesuen Here's a Bourne shell script I threw together to do an e2e test of this feature:

#! /bin/sh -x

build_containers() {
	docker pull argoproj/argocd-ui

	USERNAME=$1
	for CONTAINER in argocd-application-controller argocd-repo-server argocd-server argocd-ui
	do
		docker tag "argoproj/${CONTAINER}" "${USERNAME}/${CONTAINER}"
		docker push "${USERNAME}/${CONTAINER}"
	done
}

apply_manifest() {
	cat manifests/components/*.yaml | sed "s/argoproj\/\([^:]*\):.*$/andrewdm\/\1:latest/g" | tee | kubectl -n argocd apply -f -
}

fullstatus() {
	kubectl -n argocd get pods
	kubectl -n argocd get deployments
	kubectl -n argocd get services
}

## Note the following manifest changes made by hand before this script is run:
# $ git diff
# diff --git a/manifests/components/04e_argocd-server-service.yaml b/manifests/components/04e_argocd-server-service.yaml
# index 3df6cbc..9b2daa9 100644
# --- a/manifests/components/04e_argocd-server-service.yaml
# +++ b/manifests/components/04e_argocd-server-service.yaml
# @@ -4,11 +4,12 @@ kind: Service
#  metadata:
#    name: argocd-server
#  spec:
# +  type: NodePort
#    ports:
# -  - name: http
# -    protocol: TCP
# -    port: 80
# -    targetPort: 8080
# +  # - name: http
# +  #   protocol: TCP
# +  #   port: 80
# +  #   targetPort: 8080
#    - name: https
#      protocol: TCP
#      port: 443

fullstatus

build_containers andrewdm
apply_manifest
kubectl -n argocd rollout status deployment/argocd-server
kubectl -n argocd get pods -l app=argocd-server
# can get pod name with: kubectl -n argocd get pods --no-headers -l app=argocd-server -o jsonpath='{.items[0].metadata.name}'

fullstatus

kubectl proxy -n argocd argocd-server &
sleep 5

curl -w "\n" -s 127.0.0.1:8001/healthz && echo 'Reached health endpoint successfully' || echo 'An error occurred'

pkill -f 'kubectl proxy'

Here's the output:

+ fullstatus
+ kubectl -n argocd get pods
No resources found.
+ kubectl -n argocd get deployments
No resources found.
+ kubectl -n argocd get services
No resources found.
+ build_containers andrewdm
+ docker pull argoproj/argocd-ui
Using default tag: latest
latest: Pulling from argoproj/argocd-ui
Digest: sha256:fdf1dae7a1d7a233788ac463fd6422dfa6a4d1b9b7c91c12397708b5a79c246a
Status: Image is up to date for argoproj/argocd-ui:latest
+ USERNAME=andrewdm
+ for CONTAINER in argocd-application-controller argocd-repo-server argocd-server argocd-ui
+ docker tag argoproj/argocd-application-controller andrewdm/argocd-application-controller
+ docker push andrewdm/argocd-application-controller
The push refers to repository [docker.io/andrewdm/argocd-application-controller]
166f2e176f83: Layer already exists 
ded1db22cea2: Layer already exists 
55b09646bfa3: Layer already exists 
7367f69251d8: Layer already exists 
8568818b1f7f: Layer already exists 
latest: digest: sha256:cc8a86b1ad5755c3de7593892b2bf4bbc970ec8f6b1265ec4146d2fdf8b95301 size: 1377
+ for CONTAINER in argocd-application-controller argocd-repo-server argocd-server argocd-ui
+ docker tag argoproj/argocd-repo-server andrewdm/argocd-repo-server
+ docker push andrewdm/argocd-repo-server
The push refers to repository [docker.io/andrewdm/argocd-repo-server]
166f2e176f83: Layer already exists 
ded1db22cea2: Layer already exists 
55b09646bfa3: Layer already exists 
7367f69251d8: Layer already exists 
8568818b1f7f: Layer already exists 
latest: digest: sha256:cc8a86b1ad5755c3de7593892b2bf4bbc970ec8f6b1265ec4146d2fdf8b95301 size: 1377
+ for CONTAINER in argocd-application-controller argocd-repo-server argocd-server argocd-ui
+ docker tag argoproj/argocd-server andrewdm/argocd-server
+ docker push andrewdm/argocd-server
The push refers to repository [docker.io/andrewdm/argocd-server]
166f2e176f83: Layer already exists 
ded1db22cea2: Layer already exists 
55b09646bfa3: Layer already exists 
7367f69251d8: Layer already exists 
8568818b1f7f: Layer already exists 
latest: digest: sha256:cc8a86b1ad5755c3de7593892b2bf4bbc970ec8f6b1265ec4146d2fdf8b95301 size: 1377
+ for CONTAINER in argocd-application-controller argocd-repo-server argocd-server argocd-ui
+ docker tag argoproj/argocd-ui andrewdm/argocd-ui
+ docker push andrewdm/argocd-ui
The push refers to repository [docker.io/andrewdm/argocd-ui]
51953d49ea9a: Layer already exists 
717b092b8c86: Layer already exists 
latest: digest: sha256:fdf1dae7a1d7a233788ac463fd6422dfa6a4d1b9b7c91c12397708b5a79c246a size: 740
+ apply_manifest
+ sed 's/argoproj\/\([^:]*\):.*$/andrewdm\/\1:latest/g'
+ tee
+ kubectl -n argocd apply -f -
+ cat manifests/components/01a_application-crd.yaml manifests/components/01b_appproject-crd.yaml manifests/components/02a_argocd-cm.yaml manifests/components/02b_argocd-secret.yaml manifests/components/02c_argocd-rbac-cm.yaml manifests/components/03a_application-controller-sa.yaml manifests/components/03b_application-controller-role.yaml manifests/components/03c_application-controller-rolebinding.yaml manifests/components/03d_application-controller-deployment.yaml manifests/components/04a_argocd-server-sa.yaml manifests/components/04b_argocd-server-role.yaml manifests/components/04c_argocd-server-rolebinding.yaml manifests/components/04d_argocd-server-deployment.yaml manifests/components/04e_argocd-server-service.yaml manifests/components/05a_argocd-repo-server-deployment.yaml manifests/components/05b_argocd-repo-server-service.yaml
customresourcedefinition.apiextensions.k8s.io/applications.argoproj.io configured
customresourcedefinition.apiextensions.k8s.io/appprojects.argoproj.io configured
configmap/argocd-cm configured
secret/argocd-secret unchanged
configmap/argocd-rbac-cm unchanged
serviceaccount/application-controller unchanged
role.rbac.authorization.k8s.io/application-controller-role unchanged
rolebinding.rbac.authorization.k8s.io/application-controller-role-binding unchanged
deployment.apps/application-controller created
serviceaccount/argocd-server unchanged
role.rbac.authorization.k8s.io/argocd-server-role unchanged
rolebinding.rbac.authorization.k8s.io/argocd-server-role-binding unchanged
deployment.apps/argocd-server created
service/argocd-server created
deployment.apps/argocd-repo-server created
service/argocd-repo-server created
+ kubectl -n argocd rollout status deployment/argocd-server
Waiting for deployment "argocd-server" rollout to finish: 0 of 1 updated replicas are available...
deployment "argocd-server" successfully rolled out
+ kubectl -n argocd get pods -l app=argocd-server
NAME                             READY     STATUS    RESTARTS   AGE
argocd-server-8578f88df7-7p66r   2/2       Running   0          9s
+ fullstatus
+ kubectl -n argocd get pods
NAME                                      READY     STATUS    RESTARTS   AGE
application-controller-7dcdddf77f-zp49l   1/1       Running   0          10s
argocd-repo-server-74f5fcdf7-89wgs        1/1       Running   0          10s
argocd-server-8578f88df7-7p66r            2/2       Running   0          10s
+ kubectl -n argocd get deployments
NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
application-controller   1         1         1            1           10s
argocd-repo-server       1         1         1            1           10s
argocd-server            1         1         1            1           10s
+ kubectl -n argocd get services
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
argocd-repo-server   ClusterIP   10.99.168.16    <none>        8081/TCP        10s
argocd-server        NodePort    10.102.208.29   <none>        443:32508/TCP   10s
+ sleep 5
+ kubectl proxy -n argocd argocd-server
Starting to serve on 127.0.0.1:8001
+ curl -w '\n' -s 127.0.0.1:8001/healthz
ok
+ echo 'Reached health endpoint successfully'
Reached health endpoint successfully
+ pkill -f 'kubectl proxy'

@merenbach
Contributor Author

@jessesuen also adding a pod describe:

$ kubectl -n argocd describe pods -l app=argocd-server
Name:           argocd-server-8578f88df7-7p66r
Namespace:      argocd
Node:           minikube/10.0.2.15
Start Time:     Wed, 22 Aug 2018 15:38:08 -0700
Labels:         app=argocd-server
                pod-template-hash=4134944893
Annotations:    <none>
Status:         Running
IP:             172.17.0.5
Controlled By:  ReplicaSet/argocd-server-8578f88df7
Init Containers:
  copyutil:
    Container ID:  docker://8500f5a9a9288983f318a175072e28bb90dbd7789e053a2e18d03f63e2fc28f3
    Image:         andrewdm/argocd-server:latest
    Image ID:      docker-pullable://andrewdm/argocd-application-controller@sha256:cc8a86b1ad5755c3de7593892b2bf4bbc970ec8f6b1265ec4146d2fdf8b95301
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /argocd-util
      /shared
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 22 Aug 2018 15:38:12 -0700
      Finished:     Wed, 22 Aug 2018 15:38:12 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /shared from static-files (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argocd-server-token-dx8tw (ro)
  ui:
    Container ID:  docker://60642e037245165caf2ea04162b60227d8b52006510a6ea411151f9bef7aa774
    Image:         andrewdm/argocd-ui:latest
    Image ID:      docker-pullable://andrewdm/argocd-ui@sha256:fdf1dae7a1d7a233788ac463fd6422dfa6a4d1b9b7c91c12397708b5a79c246a
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /app
      /shared
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 22 Aug 2018 15:38:14 -0700
      Finished:     Wed, 22 Aug 2018 15:38:15 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /shared from static-files (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argocd-server-token-dx8tw (ro)
Containers:
  argocd-server:
    Container ID:  docker://1a112368f84e7fd5b34350e0acbe9fe397e4ee5f662e52afbc783427f3628f25
    Image:         andrewdm/argocd-server:latest
    Image ID:      docker-pullable://andrewdm/argocd-application-controller@sha256:cc8a86b1ad5755c3de7593892b2bf4bbc970ec8f6b1265ec4146d2fdf8b95301
    Port:          <none>
    Host Port:     <none>
    Command:
      /argocd-server
      --staticassets
      /shared/app
      --repo-server
      argocd-repo-server:8081
    State:          Running
      Started:      Wed, 22 Aug 2018 15:38:17 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /shared from static-files (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argocd-server-token-dx8tw (ro)
  dex:
    Container ID:  docker://ea38e5bc8c96437e252589087b4f9f747479ff726344fd108883447940c1180e
    Image:         quay.io/coreos/dex:v2.10.0
    Image ID:      docker-pullable://quay.io/coreos/dex@sha256:218f898d8f0cbbb190c76404bb13d599ac64c64384a999472e2278ed4e34496f
    Port:          <none>
    Host Port:     <none>
    Command:
      /shared/argocd-util
      rundex
    State:          Running
      Started:      Wed, 22 Aug 2018 15:38:17 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /shared from static-files (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argocd-server-token-dx8tw (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  static-files:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  argocd-server-token-dx8tw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argocd-server-token-dx8tw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From               Message
  ----    ------                 ----  ----               -------
  Normal  Scheduled              4m    default-scheduler  Successfully assigned argocd-server-8578f88df7-7p66r to minikube
  Normal  SuccessfulMountVolume  4m    kubelet, minikube  MountVolume.SetUp succeeded for volume "static-files"
  Normal  SuccessfulMountVolume  4m    kubelet, minikube  MountVolume.SetUp succeeded for volume "argocd-server-token-dx8tw"
  Normal  Pulling                4m    kubelet, minikube  pulling image "andrewdm/argocd-server:latest"
  Normal  Pulled                 4m    kubelet, minikube  Successfully pulled image "andrewdm/argocd-server:latest"
  Normal  Created                4m    kubelet, minikube  Created container
  Normal  Started                4m    kubelet, minikube  Started container
  Normal  Pulling                4m    kubelet, minikube  pulling image "andrewdm/argocd-ui:latest"
  Normal  Pulled                 4m    kubelet, minikube  Successfully pulled image "andrewdm/argocd-ui:latest"
  Normal  Created                4m    kubelet, minikube  Created container
  Normal  Started                4m    kubelet, minikube  Started container
  Normal  Pulling                4m    kubelet, minikube  pulling image "andrewdm/argocd-server:latest"
  Normal  Pulled                 4m    kubelet, minikube  Successfully pulled image "andrewdm/argocd-server:latest"
  Normal  Created                4m    kubelet, minikube  Created container
  Normal  Started                4m    kubelet, minikube  Started container
  Normal  Pulled                 4m    kubelet, minikube  Container image "quay.io/coreos/dex:v2.10.0" already present on machine
  Normal  Created                4m    kubelet, minikube  Created container
  Normal  Started                4m    kubelet, minikube  Started container

@merenbach merenbach merged commit 8d9e4fa into argoproj:master Aug 23, 2018
merenbach added a commit that referenced this pull request Aug 23, 2018
@mukulikak mukulikak added this to the 0.8 milestone Aug 27, 2018
@merenbach merenbach deleted the 520-add-health-probes branch October 24, 2018 21:08
leoluz added a commit to leoluz/argo-cd that referenced this pull request Mar 13, 2025