fix: tlspolicy enforced condition when certificate/issuer ready condition is missing #715

KevFan · 2024-06-26T11:20:12Z

Description

Closes: #704

fix: tlspolicy enforced condition when certificate/issuer ready condition is missing
- Previous using IsStatusConditionFalse will return true if the Ready status condition is missing which caused TLSPolicy to incorrectly set the Enforced condition to true
feat: TLSPolicy watch for owned Certificates
feat: watch for issuer/clusterIssuer status changes for TLSPolicy
- TLSPolicy should watch for status changes to ClusterIssuer/Issuer to react to whether the policy is correctly enforced or not. Also previously TLSPolicy integration tests are a bit flaky due to an certain order of events must happen / resources must be ready in order to pass the test
regression: do not wait for gateways to be ready for tlspolicy
- There is a regression issue where TLSPolicy should not wait for the gateways to be ready as the controller may be creating the neceesary resources in order for the gateway to be in the ready state
fix: get all gateways events for tlspolicy
- Previously the gateway event for TLSPolicy is filtered out if the gateway is not in a programmed state in the DAG, but is a valid use case for this policy. This caused the test TLSPolicy controller with multiple https listener [It] should delete tls certificate when listener is removed to be flaky. Added this filter option to the DAG to work as expected for TLSPolicy.

Verification

Passing tests should be sufficient but if you want to manually verrify:

Deploy this branch

make local-setup

Apply TLSPolicy

kubectl apply -n istio-system -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: istio-ingressgateway
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    hostname: '*.toys.io'
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: toys
        kind: Secret
    allowedRoutes:
      namespaces:
        from: All    
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: kuadrant.io/v1alpha1
kind: TLSPolicy
metadata:
  name: gw-tls
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: istio-ingressgateway
  issuerRef:
    group: cert-manager.io
    kind: Issuer
    name: selfsigned-issuer
EOF

Watch TLSPolicy status

watch -n 0.1 "kubectl get tlspolicy -A -o yaml | yq '.items[0]'"

Verified enforced status is:

    - lastTransitionTime: "2024-07-01T08:59:46Z"
      message: TLSPolicy has been successfully enforced
      reason: Enforced
      status: "True"
      type: Enforced

Delete certificate secret

kubectl delete secrets toys -n istio-system

Verify TLSPolicy enforced condition quickly tranistions to false and back to true

api/v1alpha1/dnspolicy_types.go

pkg/library/reconcilers/fetcher.go

mikenairn · 2024-07-01T09:56:52Z

Why is it necessary to filter out Gateways based on the "programmed" status here or here? I'm not familiar with DAG, or how it is being used, but would it not be easier for all events to make it to each policy controller and they decide what to do?

KevFan · 2024-07-01T11:34:46Z

Why is it necessary to filter out Gateways based on the "programmed" status here or here? I'm not familiar with DAG, or how it is being used, but would it not be easier for all events to make it each policy controller and they decide what to do?

The intention for filtering here, at least for RateLimit / Auth, is that it only cares about programmed gateways, since if it's in an unready state, it should not do any of the configuration since no traffic will be recieved anyway. The policy will be marked as Accepted: False from this error.

The DAG (a view of the cluster of things we care about, that was first used at the reconciler level) does the same filtering because of the same assumption that it should only care about programmed Gateways. But this filtering now also affects events since it was introduced at the gateway event mapper level from this change. I believe it's also with the same intention, no point to reconciling on policies on a gateway event where the Gateway is unready at least for RateLimit / Auth. But thinking on this, an event where the gateway goes from a Programmed: true -> Programmed: false state will be ignored, and all the policies under this gateway will have the incorrect statuses, so we probably should not ignore them at the event level 🤔

In summary I think it's necessary at least here to mark policies as Accepted: false for at least Auth & RateLimit (dont know if this should be the case for TLS and DNS) and probably shouldn't do this filtering at the event level.

Not sure does anyone else want to give an opinion on this though 🤔

mikenairn · 2024-07-01T11:50:39Z

But thinking on this, an event where the gateway goes from a Programmed: true -> Programmed: false state will be ignored, and all the policies under this gateway will have the incorrect statuses, so we probably should not ignore them at the event level 🤔

Exactly, i don't think any policy should be ignoring events, they should all go down to the policy controller and it decides what to do.

Any assumption made about only caring about gateways with a status "Programmed: true" is wrong anyway as we have at least one policy of our own that does care about Gateways not programmed (TLSPolicy) , and other future polices may do also. So could we just remove these checks?

@eguzki @guicassolato?

maleck13 · 2024-07-01T13:47:38Z

Tend to agree with @mikenairn each policy controller should decide what it wants to do with a given gateway state. TLS is a good example. A gateway wont get into a programmed state unless TLSPolicy has setup the secret for it

KevFan · 2024-07-03T09:59:08Z

@mikenairn @maleck13 Not sure about completely removing the checks since both checks are primarly used within the reconcilers anyway and there's more instances of wanting to filter them out than not 🤔 The changes in this PR allows passing an option defined by the policy type whether it should do this filtering or not, with the DAG defaulting to always perform this filtering unless otherwise specified because of this majority (can default to not filter if people prefer).

Otherwise I think completely removing the checks and doing a refactor to perform this at each controller level elsewhere seems a bit out of scope for this PR

eguzki · 2024-07-03T13:30:00Z

I am a bit out of context. But I will try to contribute a bit, hopefully not confusing things even more.

i don't think any policy should be ignoring events, they should all go down to the policy controller and it decides what to do.

I would not state like that. Policy controllers setup watchers, mappers and predicates to catch only relevant events for the resources (or subresources) they are reconciling. Ignoring not relevant events to save NO-OP reconcilliation triggers.

Any assumption made about only caring about gateways with a status "Programmed: true" is wrong anyway as we have at least one policy of our own that does care about Gateways not programmed (TLSPolicy) , and other future polices may do also. So could we just remove these checks?

I agree, but @KevFan implemented in a way you can opt out.

The changes in this PR allows passing an option defined by the policy type whether it should do this filtering or not, with the DAG defaulting to always perform this filtering unless otherwise specified because of this majority (can default to not filter if people prefer).

Have you considered having the topology component always add all the gateways to the graph and then have the topology index layer (the component wrapping the topology component), a new method called GetProgrammedGateways() that uses an precomputed index of programmed gateways?

Yet another option would be that each controller decides which gateways are feeded into the topology. Nothing stops you from doing a pre-filtering step to filter out not programmed gateways and feed the topology with only programmed gateways.

Between, your implemented option, the index layer and the pre-filtering approach, I would pick the index layer approach. But all three are good enough for me. The reason to pick the index layer is because it is somewhat weird to feed the topology with N gateways and when I requests gateways out of it I get fewer (or even none) instances of gateways.

KevFan · 2024-07-04T09:57:58Z

Between, your implemented option, the index layer and the pre-filtering approach, I would pick the index layer approach. But all three are good enough for me. The reason to pick the index layer is because it is somewhat weird to feed the topology with N gateways and when I requests gateways out of it I get fewer (or even none) instances of gateways.

Index layer approach sounds good to me, however I prefer the current implementation since it works with both the FetchTargetRefObject and the DAG. The other approaches seems a bit out of scope for this PR at least that focuses on mainly the TLSPolicy. I'd rather opt for a follow up PR for this if we want it since it has a greater affect on all the areas that uses the DAG 🤔

controllers/tlspolicy_controller.go

dlaw4608

Verification Steps worked as expected, The TLS policy is created and enforced correctly.

…tion is missing Previous using IsStatusConditionFalse will return true if the Ready status condition is missing which caused TLSPolicy to incorrectly set the Enforced condition to true

…oller

KevFan · 2024-07-08T13:45:35Z

@mikenairn are you okay with the changes otherwise or is there anything else outstanding? Would appreciate a review from you or someone more familiar with the TLSPolicy to ensure the changes makes sense 🤔

mikenairn

@mikenairn are you okay with the changes otherwise or is there anything else outstanding? Would appreciate a review from you or someone more familiar with the TLSPolicy to ensure the changes makes sense 🤔

Left a few comments/questions, but generally i think it looks fine. My initial questions were really around the need for the Programmed status checks at all, but if they are deemed necessary, then the approach here makes sense.

Have we tried it when cert manager isn't installed though?

api/v1alpha1/dnspolicy_types.go

pkg/library/reconcilers/fetcher.go

tests/common/tlspolicy/tlspolicy_controller_test.go

pkg/library/mappers/gateway.go

controllers/tlspolicy_controller.go

…ager is installed

KevFan · 2024-07-09T10:21:49Z

Have we tried it when cert manager isn't installed though?

I've tried it when cert manager isn't installed with the following steps:

Deploy cluster

make local-env-setup

Delete any of the cert manager CRDs

kubectl delete crd certificates.cert-manager.io
kubectl delete crd clusterissuers.cert-manager.io
kubectl delete crd issuers.cert-manager.io

CRDs can be re-installed using:

make install-cert-manager

Run operater

make run

TLS Policy should be disabled with a similar log line to:

2024-07-09T11:12:02+01:00       ERROR   kuadrant-operator.tlspolicy     CertManager CRD was not installed       {"group": "cert-manager.io", "kind": "Certificate", "version": "v1"}
github.com/kuadrant/kuadrant-operator/pkg/library/gatewayapi.IsCertManagerInstalled
        /Users/chfan/dev/kuadrant-operator/pkg/library/gatewayapi/utils.go:162
github.com/kuadrant/kuadrant-operator/controllers.(*TLSPolicyReconciler).SetupWithManager
        /Users/chfan/dev/kuadrant-operator/controllers/tlspolicy_controller.go:205
main.main
        /Users/chfan/dev/kuadrant-operator/main.go:212
runtime.main
        /Users/chfan/sdk/go1.21.10/src/runtime/proc.go:267
2024-07-09T11:12:02+01:00       INFO    kuadrant-operator.tlspolicy     TLSPolicy controller disabled. CertManager was not found

KevFan self-assigned this Jun 26, 2024

KevFan force-pushed the ssues/704 branch from be88fae to b716990 Compare June 27, 2024 15:44

KevFan mentioned this pull request Jun 28, 2024

Reconcile out of order gateway resource creation. #638

Merged

KevFan added kind/bug Something isn't working kind/enhancement New feature or request size/medium labels Jun 28, 2024

KevFan force-pushed the ssues/704 branch 2 times, most recently from 910eb98 to 21e0a1c Compare June 28, 2024 16:47

KevFan commented Jul 1, 2024

View reviewed changes

api/v1alpha1/dnspolicy_types.go Show resolved Hide resolved

KevFan commented Jul 1, 2024

View reviewed changes

pkg/library/reconcilers/fetcher.go Show resolved Hide resolved

KevFan marked this pull request as ready for review July 1, 2024 09:34

KevFan requested a review from a team as a code owner July 1, 2024 09:34

KevFan requested a review from mikenairn July 1, 2024 09:34

mikenairn mentioned this pull request Jul 4, 2024

remove cert-manager as dependency #727

Merged

eguzki mentioned this pull request Jul 4, 2024

[state-of-the-world] common policy status controller #730

Open

eguzki reviewed Jul 4, 2024

View reviewed changes

controllers/tlspolicy_controller.go Show resolved Hide resolved

dlaw4608 approved these changes Jul 4, 2024

View reviewed changes

KevFan added 6 commits July 8, 2024 14:40

fix: tlspolicy enforced condition when certificate/issuer ready condi…

75d02dc

…tion is missing Previous using IsStatusConditionFalse will return true if the Ready status condition is missing which caused TLSPolicy to incorrectly set the Enforced condition to true

feat: TLSPolicy watch for owned Certificates

af197e1

regression: do not wait for gateways to be ready for tlspolicy

4ea94de

feat: watch for issuer/clusterIssuer status changes for TLSPolicy

8d15814

tests: harden and cleanup tests

01bbed9

fix: get all gateways events for tlspolicy

4dbe491

KevFan added 3 commits July 8, 2024 14:40

tests: unit test for list error for issuer mapper

581dc01

refactor: do not only just received programmed gateway events

d831395

tls policy: check cert-manager API is available before starting contr…

6444c64

…oller

KevFan force-pushed the ssues/704 branch from fa9c760 to 6444c64 Compare July 8, 2024 13:40

mikenairn reviewed Jul 8, 2024

View reviewed changes

refactor: account for ClusterIssuer & Issuer for checking if Cert Man…

c5473d1

…ager is installed

KevFan requested a review from mikenairn July 10, 2024 14:18

mikenairn approved these changes Jul 10, 2024

View reviewed changes

KevFan merged commit f5116f2 into Kuadrant:main Jul 10, 2024
18 checks passed

eguzki mentioned this pull request Jul 19, 2024

enhanced gateway api topology #751

Merged

mikenairn mentioned this pull request Jul 26, 2024

tlspolicy controller reads many resources that it does not watch. #745

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: tlspolicy enforced condition when certificate/issuer ready condition is missing #715

fix: tlspolicy enforced condition when certificate/issuer ready condition is missing #715

KevFan commented Jun 26, 2024 •

edited

Loading

mikenairn commented Jul 1, 2024 •

edited

Loading

KevFan commented Jul 1, 2024 •

edited

Loading

mikenairn commented Jul 1, 2024

maleck13 commented Jul 1, 2024

KevFan commented Jul 3, 2024

eguzki commented Jul 3, 2024

KevFan commented Jul 4, 2024

dlaw4608 left a comment

KevFan commented Jul 8, 2024

mikenairn left a comment

KevFan commented Jul 9, 2024

fix: tlspolicy enforced condition when certificate/issuer ready condition is missing #715

fix: tlspolicy enforced condition when certificate/issuer ready condition is missing #715

Conversation

KevFan commented Jun 26, 2024 • edited Loading

Description

Verification

mikenairn commented Jul 1, 2024 • edited Loading

KevFan commented Jul 1, 2024 • edited Loading

mikenairn commented Jul 1, 2024

maleck13 commented Jul 1, 2024

KevFan commented Jul 3, 2024

eguzki commented Jul 3, 2024

KevFan commented Jul 4, 2024

dlaw4608 left a comment

Choose a reason for hiding this comment

KevFan commented Jul 8, 2024

mikenairn left a comment

Choose a reason for hiding this comment

KevFan commented Jul 9, 2024

KevFan commented Jun 26, 2024 •

edited

Loading

mikenairn commented Jul 1, 2024 •

edited

Loading

KevFan commented Jul 1, 2024 •

edited

Loading