discoveryserver: readiness probe shouldn't pass when permissions are incorrect #1583

Closed
codefromthecrypt opened this issue Feb 21, 2024 · 11 comments · Fixed by #1584

@codefromthecrypt
Contributor

codefromthecrypt commented Feb 21, 2024

Describe the bug

something like #1580 should be detected by a readiness probe, but it wasn't.

Subject to opinion, but I think ready, health, or both should fail if permissions are wrong; otherwise you need to look at the logs to notice that something is amiss.

Sample

I used a helm chart by @andreasfritz which also doesn't add the required permission, yet the readiness probe seems to pass (correct me if I'm wrong).

@codefromthecrypt
Contributor Author

ps the following helm test passes despite errors in the logs

apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "spring-cloud-kubernetes-discoveryserver.fullname" . }}-test-connection"
  labels:
    {{- include "spring-cloud-kubernetes-discoveryserver.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
spec:
  containers:
    - name: wget
      image: busybox
      command: ['wget', '-qO', '-'] # for debugging, print the whole response to the console
      args: ['{{ include "spring-cloud-kubernetes-discoveryserver.fullname" . }}:{{ .Values.service.port }}/apps']
  restartPolicy: Never

again #1580 fixed this, but there should have been something that failed and maybe folks closer to the project can figure a fail-fast!

2024-02-21T09:14:35.056Z  WARN 1 --- [           main] o.s.c.k.client.KubernetesClientPodUtils  : error reading pod, with error : {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"spring-cloud-kubernetes-discoveryserver-8b8f89db5-zfnpd\" is forbidden: User \"system:serviceaccount:default:spring-cloud-kubernetes-discoveryserver\" cannot get resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"spring-cloud-kubernetes-discoveryserver-8b8f89db5-zfnpd","kind":"pods"},"code":403}

2024-02-21T09:14:35.057Z  WARN 1 --- [           main] o.s.c.k.client.KubernetesClientPodUtils  : Failed to get pod with name:[spring-cloud-kubernetes-discoveryserver-8b8f89db5-zfnpd]. You should look into this if things aren't working as you expect. Are you missing serviceaccount permissions?

io.kubernetes.client.openapi.ApiException: 
        at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:989) ~[client-java-api-19.0.0.jar:na]
        at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:905) ~[client-java-api-19.0.0.jar:na]
        at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPodWithHttpInfo(CoreV1Api.java:26769) ~[client-java-api-19.0.0.jar:na]
        at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPod(CoreV1Api.java:26747) ~[client-java-api-19.0.0.jar:na]
        at org.springframework.cloud.kubernetes.client.KubernetesClientPodUtils.internalGetPod(KubernetesClientPodUtils.java:87) ~[spring-cloud-kubernetes-client-autoconfig-3.1.0.jar:3.1.0]
        at org.springframework.cloud.kubernetes.commons.LazilyInstantiate.get(LazilyInstantiate.java:47) ~[spring-cloud-kubernetes-commons-3.1.0.jar:3.1.0]
        at org.springframework.cloud.kubernetes.client.KubernetesClientPodUtils.isInsideKubernetes(KubernetesClientPodUtils.java:80) ~[spring-cloud-kubernetes-client-autoconfig-3.1.0.jar:3.1.0]
        at org.springframework.cloud.kubernetes.commons.discovery.KubernetesDiscoveryClientHealthIndicatorInitializer.postConstruct(KubernetesDiscoveryClientHealthIndicatorInitializer.java:49) ~[spring-cloud-kubernetes-commons-3.1.0.jar:3.1.0]
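
For reference, the 403 above says the service account spring-cloud-kubernetes-discoveryserver in the default namespace cannot get pods in the core API group. A minimal sketch of RBAC objects that would grant exactly that follows. The Role and RoleBinding name discoveryserver-pod-reader is hypothetical, and this is not necessarily the change made in #1580.

```yaml
# Hypothetical Role: grants only "get" on pods in the core ("") API group.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: discoveryserver-pod-reader   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]
---
# Binds the Role to the service account named in the log message.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: discoveryserver-pod-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: discoveryserver-pod-reader
subjects:
  - kind: ServiceAccount
    name: spring-cloud-kubernetes-discoveryserver
    namespace: default
```

Keeping the rule to the single verb from the error message avoids widening the service account's access beyond what the log shows is being requested.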

@codefromthecrypt
Contributor Author

codefromthecrypt commented Feb 22, 2024

so I looked into this and basically the API doesn't use this pod data. It is grabbed just in case there is a consumer of RegisteredEventSource.

What fails and makes the large error should be reconsidered. For example, if this binary has no consumers of the data, don't ask for the pod (and avoid the permissions or scary error if they aren't there). This seems better from both an experience and a security pov.

cc @iocanel not because you intended this, but because some old version of a file in question has your author tag ;) p.s. hi again!

	@PostConstruct
	private void postConstruct() {
		LOG.debug(() -> "publishing InstanceRegisteredEvent");
		InstanceRegisteredEvent<RegisteredEventSource> instanceRegisteredEvent = new InstanceRegisteredEvent<>(
				new RegisteredEventSource("kubernetes", podUtils.isInsideKubernetes(), podUtils.currentPod().get()),
				null);
		this.applicationEventPublisher.publishEvent(instanceRegisteredEvent);
	}

@wind57
Contributor

wind57 commented Feb 22, 2024

hello and thank you for raising this issue, let's see if I can understand it completely.

What you want, when you deploy discoveryserver, is to have a proper readiness probe in place. Let me make it maybe even simpler, to see if we are on the same page:

  • I deploy some manifests in a cluster, specifically the ones from the documentation (please notice that I have removed pods from the Role's resources on purpose):
---
apiVersion: v1
kind: List
items:
  - apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: spring-cloud-kubernetes-discoveryserver
      name: spring-cloud-kubernetes-discoveryserver
    spec:
      ports:
        - name: http
          port: 80
          targetPort: 8761
      selector:
        app: spring-cloud-kubernetes-discoveryserver
      type: ClusterIP
  - apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app: spring-cloud-kubernetes-discoveryserver
      name: spring-cloud-kubernetes-discoveryserver
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      labels:
        app: spring-cloud-kubernetes-discoveryserver
      name: spring-cloud-kubernetes-discoveryserver:view
    roleRef:
      kind: Role
      apiGroup: rbac.authorization.k8s.io
      name: namespace-reader
    subjects:
      - kind: ServiceAccount
        name: spring-cloud-kubernetes-discoveryserver
  - apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: default
      name: namespace-reader
    rules:
      - apiGroups: ["", "extensions", "apps"]
        resources: ["services", "endpoints"]
        verbs: ["get", "list", "watch"]
  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spring-cloud-kubernetes-discoveryserver-deployment
    spec:
      selector:
        matchLabels:
          app: spring-cloud-kubernetes-discoveryserver
      template:
        metadata:
          labels:
            app: spring-cloud-kubernetes-discoveryserver
        spec:
          serviceAccountName: spring-cloud-kubernetes-discoveryserver
          containers:
          - name: spring-cloud-kubernetes-discoveryserver
            image: springcloud/spring-cloud-kubernetes-discoveryserver:3.1.1-SNAPSHOT
            imagePullPolicy: IfNotPresent
            readinessProbe:
              httpGet:
                port: 8761
                path: /actuator/health/readiness
            livenessProbe:
              httpGet:
                port: 8761
                path: /actuator/health/liveness
            ports:
            - containerPort: 8761
  • If I run kubectl get pods -o wide, find the IP of the pod in question, and then issue curl 10.244.0.5:8761/actuator/health, I see UP, even though there are permission-related errors in the logs.
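
The manual curl check can also be scripted as a one-shot pod, roughly as sketched below. The pod name is hypothetical, and the URL assumes the Service from the manifests above (port 80 forwarding to 8761). busybox wget exits non-zero on a non-2xx response, so the pod would end up in a Failed state if the endpoint answered 503. As described in this issue, the endpoint returned UP despite the missing permission, so this check only helps once the health indicator reflects that failure.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: discoveryserver-health-check   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox
      # Print the response body; wget fails the container on a non-2xx status.
      command: ['wget', '-qO', '-']
      args: ['http://spring-cloud-kubernetes-discoveryserver:80/actuator/health/readiness']
```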

Now, there are a couple of things here.

  • The first one, the thing you are talking about with RegisteredEventSource is kind of irrelevant here. It is relevant that you see it in logs, but you only see it there because this comes as a dependency that discovery-server uses. It does not hurt in the happy path scenario (without any errors), but unfortunately, it does muddy the waters in this case. This is easy to disable (and does not mess with anything to my knowledge of the current code):
            env:
              - name: SPRING_CLOUD_DISCOVERY_CLIENT_HEALTHINDICATOR_ENABLED
                value: "FALSE"

(this can be added to the documentation and that sample you recently worked on).
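
If the setting is managed through application configuration rather than the container environment, the same switch would presumably look like the fragment below. This assumes the environment variable above maps, through Spring Boot's relaxed binding, to the spring-cloud-commons property spring.cloud.discovery.client.health-indicator.enabled; treat it as a sketch rather than a verified configuration for discovery-server.

```yaml
# application.yaml (assumed equivalent of the env var above)
spring:
  cloud:
    discovery:
      client:
        health-indicator:
          enabled: false
```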

  • The second one is that we might want to add a unit-test that covers this exact scenario that you have here.

  • The third one, is that this is a bug: we delegate to the same code in a few places. There is a piece of code that says: "if there is an Exception, capture it, log it, return null". This all makes sense except for this use-case that you present here.

To me, this is indeed a bug, but one that can be fixed.

@ryanjbaxter if you agree, please add the proper label and I will present a fix soon.

@ryanjbaxter
Contributor

So the issue is without the pod resource permissions when the instance registered event is fired we get the permissions issue?

@codefromthecrypt
Contributor Author

thanks for the analysis. I think there are a lot of projects that silently handle errors that don't affect functionality (for example, by not logging them at WARN level). If there's a way to make this error not visible, I wouldn't notice it. For now, the yaml adds a permission that is only needed to make this log warning go away. Any way to roll back that permission, and to have a clean boot by default without doing anything special, is positive!

@codefromthecrypt
Contributor Author

> So the issue is without the pod resource permissions when the instance registered event is fired we get the permissions issue?

yep exactly this. several stack traces, but the actual data isn't used.

@ryanjbaxter
Contributor

I guess I am curious as to why we would publish an instance registered event when starting the discovery server, something to look at.

anyways we should not be producing the error so I agree it’s something we should address

@codefromthecrypt
Contributor Author

> I guess I am curious as to why we would publish an instance registered event when starting the discovery server, something to look at.

I went down the rabbit hole on this, and there are consumers of this but only in tests. Maybe someone added it for a custom server 🤷

@wind57
Contributor

wind57 commented Feb 23, 2024

> So the issue is without the pod resource permissions when the instance registered event is fired we get the permissions issue?

That is one problem, yes. We do not need such an event in case of discovery-server, so the first fix is to disable it from being produced.

The second problem is how we treat a "failure" in the health indicator; that is the actual bug that needs to be addressed. (I hope it will all make sense in the PR I am working on.)

> I went down the rabbit hole on this, and there are consumers of this but only in tests. Maybe someone added it for a custom server 🤷

As said, this comes from a dependency that we use. Now, that dependency might be included in other source code, that in turn could have spring-cloud-commons in its classpath and that is when InstanceRegisteredEvent matters. I don't want to burden you with the details, but to make it simpler: this InstanceRegisteredEvent is needed, just not in discovery-server.

@codefromthecrypt
Contributor Author

another glitch you can see on this is that it is not using the bean for the api client, which means it isn't instrumented or configured. this leads to visibility gaps also. I really think this should be disabled by default as the benefit is too small. Something else can enable it and accept the problems it has.

my 2p

@codefromthecrypt
Contributor Author

@wind57 I happened upon this type K8sPodLabelsAndAnnotationsSupplier which appears to read pod apis. I haven't traced it yet, but maybe this ends up requiring pod perms, too?

@ryanjbaxter ryanjbaxter linked a pull request Mar 8, 2024 that will close this issue
@ryanjbaxter ryanjbaxter added this to the 3.1.1 milestone Mar 8, 2024
ryanjbaxter pushed a commit that referenced this issue Mar 8, 2024
@github-project-automation github-project-automation bot moved this to Done in 2023.0.1 Mar 8, 2024