Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kustomization marked as ready when health check resource is not ready #387

Closed
makkes opened this issue Jul 5, 2021 · 14 comments
Closed
Assignees

Comments

@makkes
Copy link
Member

makkes commented Jul 5, 2021

I'm observing kustomize-controller marking a Kustomization resources as Ready under the following circumstances:

  1. The health checks for the Kustomization are configured like this:
    healthChecks:
      - apiVersion: helm.toolkit.fluxcd.io/v2beta1
        kind: HelmRelease
        name: any
        namespace: flux-system
    
  2. helm-controller hasn't started reconciling the any HelmRelease so that it has no status field. This is easy to reproduce by scaling the helm-controller deployment down to 0 replicas.

I've seen this happen on cluster with heavy load so that the helm-controller hasn't started reconciling the any HelmRelease, yet.

The reason for this is that kstatus reports the HelmRelease as Current, even though it is actually not.

This is problematic in the case where other Kustomizations depend on a Kustomization that has health checks for resources that expose a proper kstatus-compatible API (like status field, observedGeneration etc.) because the dependencies would be applied even though the health checks for the original Kustomization they depend on would get un-ready at a later stage (e.g. due to the HelmRelease not being ready, yet).

@makkes
Copy link
Member Author

makkes commented Jul 5, 2021

A little more context:

Since HelmRelease is not one of the explicitly supported types of kstatus, kstatus applies generic rules (see here for details). Since the HelmRelease in question does not have a status field set, yet (because the helm-controller hasn't touched it, yet), kstatus cannot know the status and infers a Current status by that.

@hiddeco
Copy link
Member

hiddeco commented Jul 5, 2021

The problem essentially boils down to the Ready condition only being set on the object right after creation, when the reconciler encounters it for the first time (which in your case is slowed down by a lot due to the cluster being busy).

Wondering if one way to "work around" this would be to use a mutating (admission) webhook that always sets this condition (Ready==False) for newly created resources where it applies, so that they can not exist without a negative status being reported for them by default.

@makkes makkes changed the title Kustomization marked as ready when health check resources is not ready Kustomization marked as ready when health check resource is not ready Jul 5, 2021
@stefanprodan
Copy link
Member

We could set a default value in the CRD for observed generation to something like -1.

makkes pushed a commit to makkes/helm-controller that referenced this issue Jul 7, 2021
This resolves an issue with kustomize-controller marking a
Kustomization as healthy even when the helm-controller hasn't even
looked at the HelmRelease targeted by the Kustomization's
healthChecks, yet. Setting `observedGeneration` to -1 by default will
cause kstatus to report a status of `InProgress` instead of `Ready`.

see fluxcd/kustomize-controller#387 for
details on the issues this is solving.

Signed-off-by: Max Jonas Werner <[email protected]>
@makkes
Copy link
Member Author

makkes commented Jul 7, 2021

We could set a default value in the CRD for observed generation to something like -1.

That's a great solution. I issued a PR on the helm-controller repo.

makkes pushed a commit to makkes/helm-controller that referenced this issue Jul 8, 2021
This resolves an issue with kustomize-controller marking a
Kustomization as healthy even when the helm-controller hasn't even
looked at the HelmRelease targeted by the Kustomization's
healthChecks, yet. Setting `observedGeneration` to -1 by default will
cause kstatus to report a status of `InProgress` instead of `Ready`.

see fluxcd/kustomize-controller#387 for
details on the issues this is solving.

Signed-off-by: Max Jonas Werner <[email protected]>
makkes pushed a commit to makkes/helm-controller that referenced this issue Jul 29, 2021
This resolves an issue with kustomize-controller marking a
Kustomization as healthy even when the helm-controller hasn't even
looked at the HelmRelease targeted by the Kustomization's
healthChecks, yet. Setting `observedGeneration` to -1 by default will
cause kstatus to report a status of `InProgress` instead of `Ready`.

see fluxcd/kustomize-controller#387 for
details on the issues this is solving.

Signed-off-by: Max Jonas Werner <[email protected]>
hiddeco pushed a commit to makkes/helm-controller that referenced this issue Aug 5, 2021
This resolves an issue with kustomize-controller marking a
Kustomization as healthy even when the helm-controller hasn't even
looked at the HelmRelease targeted by the Kustomization's
healthChecks, yet. Setting `observedGeneration` to -1 by default will
cause kstatus to report a status of `InProgress` instead of `Ready`.

see fluxcd/kustomize-controller#387 for
details on the issues this is solving.

Signed-off-by: Max Jonas Werner <[email protected]>
@steven-sheehy
Copy link

Is this considered complete now that helm-controller fix for observed generation is released or is there more to do?

@hiddeco
Copy link
Member

hiddeco commented Aug 6, 2021

It would be wise to make the same patch to all our controllers that have an observedGeneration, but for the HelmRelease itself the issue should be resolved.

@makkes
Copy link
Member Author

makkes commented Aug 7, 2021

It would be wise to make the same patch to all our controllers that have an observedGeneration, but for the HelmRelease itself the issue should be resolved.

I'll do that on Monday if that's ok for you!?

@hiddeco
Copy link
Member

hiddeco commented Aug 7, 2021

Would be splendid @makkes, thanks 😊

@makkes
Copy link
Member Author

makkes commented Aug 8, 2021

It would be wise to make the same patch to all our controllers that have an observedGeneration, but for the HelmRelease itself the issue should be resolved.

@stefanprodan
Copy link
Member

@makkes is there anything left or can we close this?

@makkes
Copy link
Member Author

makkes commented Oct 26, 2021

If we want to apply the setting of -1 in the observedGeneration to every Flux CRD, then we still need to patch

  • image-automation-controller (ImageUpdateAutomation)
  • image-reflector-controller (ImageRepository, ImagePolicy)
  • source-controller (HelmRepository, HelmChart, GitRepository, Bucket)

The immediate issue is resolved, though so I created a follow-up issue and will close this one.

@makkes makkes closed this as completed Oct 26, 2021
@squaremo
Copy link
Member

It seems like this is going to be a problem for every CRD, and possible some other built-in types that don't get special treatment by kstatus. Could it be better addressed upstream? E.g., by kstatus treating observedGeneration of 0 as unready.

@squaremo
Copy link
Member

kstatus treating observedGeneration of 0 as unready.

I understand this better now, having gone to look at the kstatus code. My suggestion above won't help, because .status.observedGeneration needs to be present in the JSON representation (unstructured.Unstructured) of an object for kstatus to take it into account (https://github.com/kubernetes-sigs/cli-utils/blob/master/pkg/kstatus/status/generic.go#L73). Most CRDs (in fluxcd/, but also e.g., in cluster-api) omit .status and .status.observedGeneration when those are empty; and in general, objects aren't serialised from Go values anyway.

@makkes
Copy link
Member Author

makkes commented Jan 28, 2022

Thanks for looking into this more closely, @squaremo. I also spent a considerable amount wading through kstatus code last year and still believe this should eventually be fixed upstream. I'll see if there's anything we can do about it as IMHO the solution we came up with here of setting observedGeneration to -1 by default feels hacky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants