NE-2183: Openshift conditions on Gateway API status #1871
|
@rikatz: This pull request references NE-2183 which is a valid jira issue. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/assign |
|
/assign |
|
/retest |
|
The error is valid: the metadata does not yet list real approvers and reviewers. Once I get some reviews and approval, I can fix it. |
enhancements/ingress/add-dns-and-loadbalancer-conditions-to-managed-gateway.md
|
Forgot to add a description to the review. I was about to look at openshift/cluster-ingress-operator#1294 but then recalled that there is an EP. So I started from the EP. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| * Adding these conditions to user-managed Gateway resources outside the | ||
| `openshift-ingress` namespace | ||
| * Modifying or changing existing IngressController condition behavior or semantics | ||
| * Introducing custom condition types beyond DNS and LoadBalancer at this time |
So this means we don't mark IngressController as Degraded if there are problems with the Gateway?
that's right, this proposal is just about adding conditions to the Gateway resource, not changing anything else
| IngressController by: | ||
| 1. Creating a shared `pkg/resources/status` package with condition computation | ||
| functions | ||
| 2. Refactoring existing IngressController status code to use this shared package |
Does this mean we mark the IngressController as Degraded if DNSReady and/or LoadBalancerReady are false?
No, this means we are moving the condition functions, previously used only by IngressController, to a common place where Gateway API can reuse them. It is about the condition calculation functions (given a DNSRecord, LoadBalancer service, etc., what should the Gateway resource conditions be), but we don't touch IngressController behavior.
| 3. Cloud Provider API provisions the LoadBalancer successfully | ||
| 4. LoadBalancer service status is updated with external IP/hostname | ||
| 5. Cluster Ingress Operator detects the Gateway resource and begins reconciliation | ||
| 6. Cluster Ingress Operator initiates DNS record provisioning through its own dns controller |
nit: There is a Gateway API DNS record creator controller alongside the cluster ingress one.
right, and it is on Cluster Ingress Operator. Am I missing something more explicit here? (like the package name?)
|
@rikatz: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| zone, quota exceeded, provider API error) | ||
| 4. Cluster Ingress Operator DNS controller reports failure status in the | ||
| DNSRecord resource | ||
| 5. Gateway Status Controller updates Gateway condition `DNSManaged=True` |
Is there any reason DNSManaged=False? Maybe reserved for the future, if another DNS management system is selected, such as ExternalDNS?
From CIO code:
// In case there is no managed DNS zone configured, return a single condition
// that DNSManaged=False because no zone is configured
Any other specific cases (like IngressController endpointpublishingstrategy) are not verified for Gateway API, so we intentionally skip them.
But yes, DNSManaged can be False when the DNSConfig doesn't specify a proper public or private zone, even on Gateway API.
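The rule described in this reply can be sketched as a small pure function. This is only an illustration under stated assumptions: the `Condition` and `DNSZones` types, the `ComputeDNSManaged` name, and the reason strings are hypothetical stand-ins, not the actual CIO code.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for a status condition.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// DNSZones is a hypothetical, simplified view of the cluster DNS config.
// An empty ID means the zone is not configured.
type DNSZones struct {
	PublicZoneID  string
	PrivateZoneID string
}

// ComputeDNSManaged reports DNSManaged=False only when neither a public nor
// a private zone is configured. Reason strings are assumptions.
func ComputeDNSManaged(z DNSZones) Condition {
	if z.PublicZoneID == "" && z.PrivateZoneID == "" {
		return Condition{
			Type:    "DNSManaged",
			Status:  "False",
			Reason:  "NoDNSZones",
			Message: "no public or private DNS zone is configured",
		}
	}
	return Condition{
		Type:    "DNSManaged",
		Status:  "True",
		Reason:  "Normal",
		Message: "at least one DNS zone is configured",
	}
}

func main() {
	fmt.Println(ComputeDNSManaged(DNSZones{}).Status)
	fmt.Println(ComputeDNSManaged(DNSZones{PublicZoneID: "example-zone"}).Status)
}
```

Keeping the check free of API calls is what would let both the IngressController and Gateway code paths share it.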
| 5. Gateway Status Controller updates Gateway condition `DNSManaged=True` | ||
| (DNS should be managed, configuration is correct) | ||
| 6. Gateway Status Controller updates Gateway condition `DNSReady=False` with | ||
| reason `FailedZones` and detailed error message from DNS provider |
Suggested change:
- reason `FailedZones` and detailed error message from DNS provider
+ reason and error message as detailed in the section on Implementation Details.
hum, I think it is fine to keep it here. The implementation details are more about "How" we will implement, but it does make sense IMO keeping the conditions that will be used on the failure flow.
| * DNS conditions apply regardless of platform if DNS records are being managed | ||
|
|
||
| **MicroShift:** | ||
| * MicroShift typically does not use Gateway API or cloud LoadBalancer services |
I remember hearing about a MicroShift integration, handled by MicroShift, so maybe remove the first line?
| * No impact on MicroShift resource consumption or configuration | ||
|
|
||
| **Resource Impact:** | ||
| * Minimal CPU/memory impact: only adds condition updates during reconciliation |
It doesn't watch DNS records? During reconciliation of what object?
it watches, but these watches are negligible from a resource impact perspective IMO. I am not sure it is worth mentioning it here (also this is related to SNO deployments, so it is not different from other resource impacts considered above)
|
|
||
| **Resource Impact:** | ||
| * Minimal CPU/memory impact: only adds condition updates during reconciliation | ||
| * No additional controllers or processes required |
There is a new gateway-status controller mentioned below.
changed to "* A new gateway-status controller is created on existing Cluster Ingress Operator"
| namespace that have the OpenShift Gateway Class as their `.spec.gatewayClassName` controller | ||
| * Associated DNSRecord and Service resources are discovered using the | ||
| `gateway.networking.k8s.io/gateway-name` label | ||
| * Only the first matching DNSRecord and Service in the same namespace are used |
What's the reasoning behind only the first matching DNSRecord being used? Why not check all DNSRecords for the Gateway and report if one or more are in a failure status?
well:
- For services we know that Istio will provision just one Service (of type LoadBalancer) for the Gateway
- For DNS, it follows the same logic as CIO, which receives only the "wildcard DNS record". Maybe this assumption is wrong for Gateway API, and we should compute the DNS condition from all of the provisioned DNSRecords (we do watch all of the DNSRecords related to the Gateway).
I will fix the EP here, as we need to watch all the DNSRecords from the Gateway, good catch!
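A minimal sketch of the aggregation discussed in this thread, where any failing DNSRecord makes the Gateway-level condition False. The reasons `RecordNotFound` and `FailedZones` come from the EP text quoted in this review; the `RecordStatus` type, the `ComputeDNSReady` name, and the `NoFailedZones` reason are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Condition is a hypothetical, simplified stand-in for a status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// RecordStatus is a hypothetical summary of one DNSRecord owned by a Gateway.
type RecordStatus struct {
	Name      string
	Published bool   // true when every zone reports the record as published
	Message   string // provider error, if any
}

// ComputeDNSReady aggregates all DNSRecords of a Gateway into one condition.
func ComputeDNSReady(records []RecordStatus) Condition {
	if len(records) == 0 {
		return Condition{Type: "DNSReady", Status: "False", Reason: "RecordNotFound",
			Message: "no DNSRecord found for the Gateway"}
	}
	var failed []string
	for _, r := range records {
		if !r.Published {
			failed = append(failed, fmt.Sprintf("%s: %s", r.Name, r.Message))
		}
	}
	if len(failed) > 0 {
		sort.Strings(failed) // stable message avoids spurious status updates
		return Condition{Type: "DNSReady", Status: "False", Reason: "FailedZones",
			Message: strings.Join(failed, "; ")}
	}
	return Condition{Type: "DNSReady", Status: "True", Reason: "NoFailedZones",
		Message: "all DNSRecords are published"}
}

func main() {
	c := ComputeDNSReady([]RecordStatus{
		{Name: "gw-a", Published: true},
		{Name: "gw-b", Published: false, Message: "zone not found"},
	})
	fmt.Println(c.Status, c.Reason, c.Message)
}
```

Sorting the failure list keeps the message deterministic, so an unchanged set of failures would not trigger a status write on every reconcile.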
|
|
||
| *DNSReady Condition:* | ||
| * Set to `Unknown` when DNSManagementPolicy is `Unmanaged` (OpenShift doesn't manage DNS, so status is unknown) | ||
| * Set to `False` with reason `RecordNotFound` when the associated DNSRecord resource cannot be found |
Is this more precise?
Suggested change:
- * Set to `False` with reason `RecordNotFound` when the associated DNSRecord resource cannot be found
+ * Set to `False` with reason `RecordNotFound` when one or more of the associated DNSRecord resources cannot be found
|
|
||
| **Condition Lifecycle:** | ||
| * Conditions are added when a Gateway is reconciled in the `openshift-ingress` namespace | ||
| * Conditions are updated in-place using `condutils.SetStatusCondition()` to preserve transition times |
Why preserve transition times? Transition time should be updated if the condition changes, at least if it changes from true to false or vice versa.
| * Maximum of 8 total conditions are maintained per Gateway to prevent unbounded growth | ||
|
|
||
| **Permissions:** | ||
| * The cluster-ingress-operator service account is granted RBAC permissions to: |
I would like to see it say:
Suggested change:
- * The cluster-ingress-operator service account is granted RBAC permissions to:
+ * The cluster-ingress-operator service account uses existing RBAC permissions to:
| * No additional controllers or processes required | ||
| * Negligible increase in etcd storage for condition status (~1KB per Gateway) | ||
|
|
||
| ### Implementation Details/Notes/Constraints |
One more implementation detail - how about making sure the status gets added to the must-gather troubleshooting document?
| **Not applicable to all environments:** | ||
| * The LoadBalancer condition is only meaningful on cloud platforms or platforms with | ||
| `LoadBalancer` support. | ||
| * Users on bare metal may see persistent `False` or `Unknown` status which could |
We should make sure that either the messages are clearly indicating that status isn't supported (e.g. "Bare metal clusters don't measure Gateway status"), or that they are at worst Unknown, not False.
So, what we have discussed so far: load balancers are not supported on bare metal, and we also "do not provide" the Load Balancer managed condition on Gateway API, so we will simply feed the "LoadBalancerReady" condition. This condition will reflect the current behavior of the CCM / bare metal provisioner. Say you are on bare metal:
- If you have MetalLB, it will work fine.
- If you don't have a load balancer controller, the LoadBalancerReady condition will be False, with a reason indicating that the LoadBalancer is pending. That means there is no LoadBalancer controller in your environment, and it matches the behavior of CIO.
IMO, since we don't yet have a clear definition for bare metal load balancers, this is the most meaningful information we can give users without being misleading. WDYT?
(also it should be considered that this is the "Drawbacks" section, meaning this may be a known drawback)
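The bare metal behavior described in this thread could look roughly like the sketch below: the condition depends only on whether the Service has been assigned an address, so a cluster without a load balancer controller stays at False with a "pending" reason. The `ServiceView` type, the function name, and all reason strings are hypothetical.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for a status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// ServiceView is a hypothetical summary of the Gateway's LoadBalancer Service.
type ServiceView struct {
	// Addresses holds the IPs or hostnames from status.loadBalancer.ingress.
	Addresses []string
}

// ComputeLoadBalancerReady derives the condition from the Service alone,
// without any platform-specific checks. Reason strings are assumptions.
func ComputeLoadBalancerReady(svc *ServiceView) Condition {
	switch {
	case svc == nil:
		return Condition{Type: "LoadBalancerReady", Status: "False", Reason: "ServiceNotFound",
			Message: "no LoadBalancer Service found for the Gateway"}
	case len(svc.Addresses) == 0:
		return Condition{Type: "LoadBalancerReady", Status: "False", Reason: "LoadBalancerPending",
			Message: "the LoadBalancer Service has no address yet; a load balancer controller may be missing"}
	default:
		return Condition{Type: "LoadBalancerReady", Status: "True", Reason: "LoadBalancerProvisioned",
			Message: "the LoadBalancer Service has an address"}
	}
}

func main() {
	// Bare metal without a load balancer controller: the Service stays pending.
	fmt.Println(ComputeLoadBalancerReady(&ServiceView{}).Reason)
	// Bare metal with MetalLB, or a cloud platform: an address is assigned.
	fmt.Println(ComputeLoadBalancerReady(&ServiceView{Addresses: []string{"192.0.2.10"}}).Status)
}
```

A message that names the likely cause (no load balancer controller) is one way to address the concern that a persistent False could mislead bare metal users.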
| * Test `ComputeGatewayAPIDNSStatus` wrapper correctly converts internal conditions to Gateway API conditions | ||
| * Test `ComputeGatewayAPILoadBalancerStatus` wrapper correctly converts internal conditions to Gateway API conditions | ||
| * Test condition computation with DNSManagementPolicy set to Managed vs Unmanaged | ||
| * Test ObservedGeneration is correctly set on conditions |
Maybe also: Test transition times?
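To make the wrapper tests listed in this hunk more concrete, here is a rough sketch of what converting internal conditions into Gateway-style conditions with `ObservedGeneration` might involve. The types and the `ToGatewayConditions` name are hypothetical stand-ins. The real wrappers named in the EP (`ComputeGatewayAPIDNSStatus`, `ComputeGatewayAPILoadBalancerStatus`) may differ.

```go
package main

import "fmt"

// OperatorCondition is a hypothetical stand-in for the operator's internal
// condition type, which has no ObservedGeneration field.
type OperatorCondition struct {
	Type, Status, Reason, Message string
}

// GatewayCondition is a hypothetical stand-in for a metav1.Condition-style
// condition as used in Gateway status.
type GatewayCondition struct {
	Type, Status, Reason, Message string
	ObservedGeneration            int64
}

// ToGatewayConditions copies each internal condition and stamps it with the
// Gateway generation that was observed during this reconcile.
func ToGatewayConditions(in []OperatorCondition, generation int64) []GatewayCondition {
	out := make([]GatewayCondition, 0, len(in))
	for _, c := range in {
		reason := c.Reason
		if reason == "" {
			// metav1.Condition requires a non-empty reason; this fallback
			// value is an assumption.
			reason = "Unknown"
		}
		out = append(out, GatewayCondition{
			Type:               c.Type,
			Status:             c.Status,
			Reason:             reason,
			Message:            c.Message,
			ObservedGeneration: generation,
		})
	}
	return out
}

func main() {
	got := ToGatewayConditions([]OperatorCondition{
		{Type: "DNSReady", Status: "True", Reason: "NoFailedZones"},
		{Type: "LoadBalancerReady", Status: "False"},
	}, 3)
	for _, c := range got {
		fmt.Printf("%s=%s reason=%s generation=%d\n", c.Type, c.Status, c.Reason, c.ObservedGeneration)
	}
}
```

A unit test for transition times would then sit one layer above this conversion, at the point where the converted conditions are merged into the Gateway's existing status.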
| * On the same test, verify the condition count is consistent with Istio and Openshift | ||
| added conditions | ||
| * Create Gateway out of `openshift-ingress` and verify that no Openshift condition is added | ||
| * Create Gateway with wrong DNS Domain and verify that Openshift conditions reflect the failue |
How about:
Create Gateway with multiple listeners, only one of which can get a successful DNS record. I expect the dns ready status to be False.
| * No CSI, CRI, or CNI changes are involved | ||
|
|
||
| **Compatibility:** | ||
| * Feature works with Gateway API v1 (both support custom conditions) |
nit: What does the "both" refer to here?
Hallucination, apparently. Let me remove the word.
| * Negligible impact on API throughput: condition updates happen during normal reconciliation | ||
| * No new API calls introduced; only status updates to existing Gateway resources | ||
| * Expected number of managed Gateways in `openshift-ingress` namespace: typically 1-10 per cluster | ||
| * Condition updates are rate-limited to prevent excessive writes |
The rate-limiting is automatic, no coding needed, right?
yeah, it is part of the maximum reconciliation that we define, and client-go throttling (or it should).
I am not sure where claude got this, so I would be happy to also remove this line if we feel it may be misleading.
|
|
||
| **Detecting Issues:** | ||
|
|
||
| *Symptom: Gateway conditions show `DNSManaged=False`* |
This is not necessarily an error condition. Some users may choose to not have DNS be managed.
|
@rikatz Overall I think this looks great. Do you think we should use the generator for other EP/KEP/GEPs? I have a few nits and questions, and one major question: can you discuss the decision not to propagate condition status up to the ingress controller status? I can see pros and cons for both, but we should document that decision. Thanks. |
This enhancement proposal adds the IngressController conditions (LoadBalancerManaged, LoadBalancerReady, DNSManaged and DNSReady) to Gateway API resources that are created with the OpenShift Gateway Class in the openshift-ingress namespace.
This proposal was partially generated with the help of Claude/AI